hes-dimport

hes-dimport is a elasticsearch plugin which generate shards files of index in mapreduce.

This plugin is applied to offline import index, each time will generate an index, so that you have to define the template to organize these indexes together.

Support Integer and Long and String and Double and Float and Short data types.

Compared to bulk's load API elasticsearch, it's advantages are below:

In a node that improves the write performance of 30%, when there is a large scale of Hadoop clusters, the overall write throughput is linearly increased.
The throughput is linear increase when scale-out hadoop clusters.
The index is generated once, and can be pushed to shard primary and multiple shards replicas.
Make the elasticsearch read and write separate,do not affect the performance of the read.
When there is a large volume of data written, you can use the relatively inexpensive hardware to extend the cluster, when you need a higher query qps and low latency, you can use a better HW.
Based on available of MapReduce job, the write is with HA.

How to use

package plugin

wget https://github.com/kewn21/eagle-hes.git
cd eagle-hes
mvn package

install to elasticsearch

./bin/plugin install eagle-hes-dimport-plugin-0.0.1.zip

mapreduce job

hadoop jar HesDimportJob {-Dindex.name={index_name}} {-Dindex.type={index_type}} [-Dautogenerate.id={true|false}] [-Dfile={meta_file_path}] {input_file_path} [delimiter]

metadata

You can describe the data format in the original data, or define in the metafile

original data with metadata

Each column is separated by the delimiter, when property autogenerate.id is set to false, the first column is for id.
Other columns be separated by ,, the column name and value and type be together with :.

metadata format

The definition of each column be separated by ,, the column name and type be together with :.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
eagle-hes-dimport-plugin		eagle-hes-dimport-plugin
eagle-hes-dimport		eagle-hes-dimport
.gitignore		.gitignore
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hes-dimport

How to use

package plugin

install to elasticsearch

mapreduce job

metadata

original data with metadata

metadata format

About

Releases

Packages

Languages

kewn21/eagle-hes

Folders and files

Latest commit

History

Repository files navigation

hes-dimport

How to use

package plugin

install to elasticsearch

mapreduce job

metadata

original data with metadata

metadata format

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages