GitHub - kewn21/eagle-hes: high performance for elasticsearch import


kewn21/eagle-hes

hes-dimport

hes-dimport is an Elasticsearch plugin that generates the shard files of an index in MapReduce.

The plugin is intended for offline index import. Each run generates a new index, so you need to define an index template to group these indexes together.

Supported data types: Integer, Long, String, Double, Float, and Short.

Compared to Elasticsearch's bulk load API, its advantages are:

  • On a single node, write performance improves by 30%.
  • Write throughput increases linearly as the Hadoop cluster scales out.
  • The index is generated once and can be pushed to the primary shard and to multiple replica shards.
  • Elasticsearch reads and writes are separated, so writes do not affect read performance.
  • For large write volumes, you can extend the cluster with relatively inexpensive hardware; when you need higher query QPS and lower latency, you can use better hardware.
  • Writes are highly available, because they rely on the availability of the MapReduce job.

How to use

package plugin

git clone https://github.com/kewn21/eagle-hes.git
cd eagle-hes
mvn package

install to elasticsearch

./bin/plugin install eagle-hes-dimport-plugin-0.0.1.zip

mapreduce job

hadoop jar HesDimportJob {-Dindex.name={index_name}} {-Dindex.type={index_type}} [-Dautogenerate.id={true|false}] [-Dfile={meta_file_path}] {input_file_path} [delimiter]
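
As a minimal sketch, the argument order above can be assembled programmatically. It assumes that arguments in {} are required and arguments in [] are optional, which the usage line suggests but does not state. The function name build_command and its parameter names are hypothetical; only the flags themselves come from the usage line.

```python
from typing import List, Optional


def build_command(
    index_name: str,
    index_type: str,
    input_path: str,
    autogenerate_id: Optional[bool] = None,
    meta_file: Optional[str] = None,
    delimiter: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the job, following the usage line.

    Assumption: {} marks required arguments and [] marks optional ones.
    """
    cmd = [
        "hadoop", "jar", "HesDimportJob",
        f"-Dindex.name={index_name}",
        f"-Dindex.type={index_type}",
    ]
    # Optional properties are added only when the caller sets them.
    if autogenerate_id is not None:
        cmd.append(f"-Dautogenerate.id={'true' if autogenerate_id else 'false'}")
    if meta_file is not None:
        cmd.append(f"-Dfile={meta_file}")
    # Positional arguments come last: input path, then optional delimiter.
    cmd.append(input_path)
    if delimiter is not None:
        cmd.append(delimiter)
    return cmd
```

The returned list can be passed to subprocess.run on a machine where hadoop is on the PATH.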

metadata

You can describe the data format inline in the original data, or define it in a separate metadata file.

original data with metadata

  • Columns are separated by the delimiter. When the property autogenerate.id is set to false, the first column is the id.
  • The other columns are separated by a comma (,), and within each column the name, value, and type are joined with a colon (:).
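
To make the inline format concrete, here is a rough parsing sketch. It rests on several assumptions not confirmed by the text: that the id is split from the rest of the row by the delimiter, that each remaining column is written as name:value:type in that order, and that type names match the supported types case-insensitively. The function parse_row, the tab default delimiter, and the sample field names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Converters for the data types listed as supported.
CONVERTERS = {
    "integer": int, "long": int, "short": int,
    "double": float, "float": float,
    "string": str,
}


def parse_row(
    line: str, delimiter: str = "\t", autogenerate_id: bool = False
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Parse one self-describing row into (id, fields).

    Assumed layout (hypothetical):
        <id><delimiter>name:value:type,name:value:type,...
    When autogenerate_id is True, the row has no leading id column.
    """
    if autogenerate_id:
        doc_id, body = None, line
    else:
        doc_id, body = line.split(delimiter, 1)
    fields: Dict[str, Any] = {}
    for column in body.split(","):
        # Split the name from the left and the type from the right,
        # so a value that itself contains ":" is kept intact.
        name, rest = column.split(":", 1)
        value, type_name = rest.rsplit(":", 1)
        fields[name] = CONVERTERS[type_name.lower()](value)
    return doc_id, fields
```

For example, under these assumptions parse_row("42\tage:30:integer,name:alice:string") yields the id "42" and the fields age=30 and name="alice".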

metadata format

  • Column definitions are separated by a comma (,), and within each definition the column name and type are joined with a colon (:).
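
A metadata definition in this format could be read as follows. This is only a sketch: it assumes each definition is written as name:type, and the function parse_metadata and the sample column names are hypothetical.

```python
from typing import List, Tuple


def parse_metadata(definition: str) -> List[Tuple[str, str]]:
    """Parse 'name:type,name:type,...' into an ordered list of (name, type).

    Assumption: the name comes before the type in each definition.
    """
    columns = []
    for item in definition.strip().split(","):
        name, type_name = item.split(":", 1)
        columns.append((name.strip(), type_name.strip()))
    return columns
```

For example, parse_metadata("age:Integer,name:String") returns the pairs ("age", "Integer") and ("name", "String"), preserving column order.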
