CN105959363A

CN105959363A - Big data cluster deployment method capable of adapting to hardware configuration

Info

Publication number: CN105959363A
Application number: CN201610264394.0A
Authority: CN
Inventors: 唐明; 常梦楠; 任红雷
Original assignee: China Electronic Technology Cyber Security Co Ltd
Current assignee: China Electronic Technology Cyber Security Co Ltd
Priority date: 2016-04-26
Filing date: 2016-04-26
Publication date: 2016-09-21

Abstract

The invention discloses a big data cluster deployment method capable of adapting to hardware configuration. Each puppet-agent sends a synchronization request to the puppet-master; after the puppet-agent passes authentication, it detects the server hardware configuration information and sends it to the puppet-master; the puppet-master generates pseudo-code (a catalog) and sends it to the puppet-agents; each puppet-agent executes the pseudo-code content belonging to its own node, which completes the automated deployment of the cluster and ends the synchronization. The method realizes automated batch deployment of a big data environment: every stage, from hardware information detection through software installation to configuration file generation, is implemented programmatically, so tedious manual work such as installing software and editing configuration files is not required. The method reduces the workload of building and deploying a big data cluster, saves deployment time, and facilitates subsequent unified management and operation and maintenance.

Description

A big data cluster deployment method that adapts to hardware configuration

Technical field

The present invention relates to a big data cluster deployment method that adapts to hardware configuration.

Background technology

A big data cluster includes batch data processing and real-time stream processing frameworks as well as systems for authentication and access control. It involves many components, such as hadoop, hive, kafka, storm, spark, zookeeper, kerberos and sentry, and there may be interdependencies between different components. Deploying a big data cluster mainly consists of installing these components and configuring their dependencies.

Puppet is a cluster configuration management tool. It typically uses a master-agent architecture and uses its built-in puppet modeling language to describe the desired end state and dependencies of cluster resources, including user accounts, specific files and directories, software packages and services, in order to automate cluster deployment and configuration.

Because servers play different roles in a distributed cluster, and different clusters use different server brands, the hardware configuration, such as the number of CPU cores, the amount of memory and the file system layout, must be taken into account when a cluster is deployed and installed. The traditional manual deployment method requires repeated manual modification, which is inefficient and error-prone.

Summary of the invention

In order to overcome the shortcomings of the prior art, the invention provides a big data cluster deployment method that adapts to hardware configuration.

The technical solution adopted by the present invention is a big data cluster deployment method that adapts to hardware configuration, comprising the following steps:

Step 1: the puppet-agent sends a synchronization request to the puppet-master;

Step 2: the puppet-master authenticates the puppet-agent;

Step 3: the puppet-agent detects the server hardware configuration information and sends it to the puppet-master;

Step 4: the puppet-master combines the information sent by the puppet-agent with the configuration content of the corresponding node in the manifest, parses it, generates pseudo-code (a catalog) and sends it to the puppet-agent;

Step 5: the puppet-agent receives the pseudo-code, and each puppet-agent executes the pseudo-code content belonging to its own node, which completes the automated deployment of the cluster and ends the synchronization.

Compared with the prior art, the positive effect of the present invention is as follows. The invention realizes automated batch deployment of a big data environment: everything from hardware information detection to software installation and configuration file generation is implemented programmatically, with no need for the tedious work of manually installing software and changing configuration files. It is an automated big data cluster deployment method that adapts to hardware configuration. The method reduces the workload of building and deploying a big data cluster, saves the time cost of cluster deployment, and facilitates subsequent unified management and operation and maintenance.

Brief description of the drawings

Examples of the present invention will be described with reference to the accompanying drawings, wherein:

Fig. 1 is a flow diagram of a big data cluster deployment method that adapts to hardware configuration;

Fig. 2 is a flow diagram of a method for obtaining the hardware configuration through custom fact variables in puppet.

Detailed description of the invention

A big data cluster deployment method that adapts to hardware configuration, as shown in Fig. 1, comprises the following steps:

Step 1: the puppet-agent sends a synchronization request to the puppet-master;

Step 2: the puppet-master authenticates the puppet-agent:

If this is the first synchronization request, the puppet-master must authenticate the puppet-agent. This embodiment uses automatic registration, in which authentication completes automatically after the synchronization request is sent. Automatic registration requires deleting the existing certificates on both the puppet master side and the agent side, and then editing puppet's configuration file autosign.conf on the puppet-master so that its content is: *. After that, authentication completes automatically once a synchronization request is sent.
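
The effect of the single-entry autosign.conf can be illustrated with a small Ruby sketch. It is a simplified model, not Puppet's own matching code: the function name autosign_match? and the host names are hypothetical, and only three pattern forms (a bare *, a leading *. glob and an exact name) are assumed.

```ruby
# Simplified, hypothetical model of an autosign allowlist check.
# patterns holds the non-empty lines of autosign.conf.
def autosign_match?(certname, patterns)
  patterns.any? do |pattern|
    if pattern == '*'
      true                                   # the embodiment's setting: accept every agent
    elsif pattern.start_with?('*.')
      certname.end_with?(pattern[1..-1])     # e.g. "*.cluster.local"
    else
      certname == pattern                    # exact certificate name
    end
  end
end

# With the content "*" every requesting agent is signed automatically.
puts autosign_match?('datanode07.cluster.local', ['*'])
```

Because a bare * accepts every requester, this setting is generally best suited to a trusted, isolated deployment network.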

Step 3: the puppet-agent detects the server hardware configuration information and sends it to the puppet-master:

The puppet-agent calls facter, which detects hardware information about the host and saves it as fact variables; this information is then sent to the puppet-master. The fact variables include custom ones defined on the puppet-master side. Taking the hadoop module as an example, the file hadoop.rb that defines the fact variables is saved on the puppet-master before synchronization. Before the puppet-agent calls facter to detect the hardware, the master sends this file to the agent's puppet directory /var/lib/puppet/lib/facter/. Successful execution of this step relies on a method for obtaining the hardware configuration through custom fact variables in puppet. The flow of this method is shown in Fig. 2, and the specific steps are as follows:

1) Create a .rb file in the hadoop module of puppet, located at /etc/puppet/modules/hadoop/lib/facter/hadoop.rb, and use the facter tool to obtain the server hardware configuration information and save it as variables. Take the variable hadoop_hdfs_data_dirs defined in hadoop.rb as an example. The file defines hadoop_hdfs_data_dirs so that, if directories of the form /data[0-9]* exist in the file system, the value of hadoop_hdfs_data_dirs is /data[0-9]*/hadoop/dfs/data; otherwise the value of hadoop_hdfs_data_dirs is /data/hadoop/dfs/data.

In this embodiment, the value of hadoop_hdfs_data_dirs on the cluster's namenode node is /data/hadoop/dfs/data, and the value of hadoop_hdfs_data_dirs on every datanode node is /data9/hadoop/dfs/data,/data4/hadoop/dfs/data,/data8/hadoop/dfs/data,/data10/hadoop/dfs/data,/data7/hadoop/dfs/data,/data3/hadoop/dfs/data,/data2/hadoop/dfs/data,/data5/hadoop/dfs/data,/data1/hadoop/dfs/data,/data6/hadoop/dfs/data. It can be seen that custom fact variables achieve the effect of adapting to the server hardware configuration.
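
The definition described in step 1) might look roughly like the following Ruby sketch. The original listing of hadoop.rb is not reproduced in this text, so the helper name hdfs_data_dirs, the reading of /data[0-9]* as "data followed by at least one digit", and the numeric sorting of the result are assumptions made for illustration. Facter.add and setcode are the standard Facter calls for declaring a custom fact.

```ruby
# Hypothetical helper: derives the HDFS data directories from the names of
# the entries found directly under the file system root.
def hdfs_data_dirs(root_entries)
  # Keep entries such as "data1" or "data10". A bare "data" is treated as
  # the fallback case (an assumption about the intended pattern).
  mounts = root_entries.select { |name| name =~ /\Adata[0-9]+\z/ }
  return '/data/hadoop/dfs/data' if mounts.empty?

  # Numeric sorting is an added assumption; the embodiment lists its
  # directories in no particular order.
  mounts.sort_by { |name| name[/[0-9]+/].to_i }
        .map { |name| "/#{name}/hadoop/dfs/data" }
        .join(',')
end

# Register the fact only when Facter is loaded, so that the file can also
# be run on its own for testing.
if defined?(Facter)
  Facter.add(:hadoop_hdfs_data_dirs) do
    setcode do
      entries = Dir.glob('/data*').map { |path| File.basename(path) }
      hdfs_data_dirs(entries)
    end
  end
end
```

Keeping the directory logic in a plain function separates it from the fact registration, which makes it possible to check the namenode and datanode cases without a running agent.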

2) Use the above custom variable in the hadoop module. The purpose of defining the variable hadoop_hdfs_data_dirs is to allow the value of dfs.datanode.data.dir in hadoop's configuration file hdfs-site.xml to be generated dynamically according to the hardware configuration. The defined variable is therefore used in the template file of this configuration file, hdfs-site.xml.erb, located at /etc/puppet/modules/hadoop/templates/hdfs-site.xml.erb. The template references hadoop_hdfs_data_dirs at the point where the value of dfs.datanode.data.dir is written.
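
How a template of this kind consumes the fact can be sketched with Ruby's standard ERB library. The template fragment below is a guess at the relevant part of hdfs-site.xml.erb, not the original file, and the TemplateScope class is a hypothetical stand-in for the scope that Puppet provides when it renders a template. The sketch assumes a Puppet version in which variables appear inside templates as instance variables such as @hadoop_hdfs_data_dirs.

```ruby
require 'erb'

# Assumed fragment of hdfs-site.xml.erb: the fact value is written into the
# dfs.datanode.data.dir property.
HDFS_SITE_TEMPLATE = <<~XML
  <configuration>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value><%= @hadoop_hdfs_data_dirs %></value>
    </property>
  </configuration>
XML

# Hypothetical stand-in for the scope Puppet supplies during rendering.
class TemplateScope
  def initialize(facts)
    @hadoop_hdfs_data_dirs = facts.fetch('hadoop_hdfs_data_dirs')
  end

  # Render the template with this object's instance variables in view.
  def render(template)
    ERB.new(template).result(binding)
  end
end

facts = { 'hadoop_hdfs_data_dirs' => '/data1/hadoop/dfs/data,/data2/hadoop/dfs/data' }
puts TemplateScope.new(facts).render(HDFS_SITE_TEMPLATE)
```

Rendering the same template with the facts of a different node yields a different hdfs-site.xml, which is what lets one template serve nodes with different disk layouts.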

3) Add pluginsync to puppet.conf on all nodes of the cluster to support the custom fact variable mechanism above. On the puppet-master side, add pluginsync = true to the [main] section of puppet.conf; on the puppet-agent side, add pluginsync = true to the [agent] section of puppet.conf.

Step 4: the puppet-master combines the information sent by the puppet-agent with the configuration content of the corresponding node in the manifest, parses it, generates pseudo-code (the catalog) and sends it to the puppet-agent:

The puppet-master finds the corresponding node configuration in the manifest according to the host name of the puppet-agent and parses the node configuration content, which is written in the puppet modeling language. At the same time, the fact variables are replaced with the actual values sent by the puppet-agent. The result of the parsing is puppet's built-in pseudo-code (the catalog), which is then sent to the corresponding puppet-agent.
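
The lookup-and-substitute behaviour of this step can be modelled with a toy Ruby sketch. It is not how Puppet compiles a catalog internally: a real manifest is written in the Puppet language, and the MANIFEST hash, the host names, the resource titles and the function compile_catalog are all hypothetical names chosen for illustration.

```ruby
# Toy stand-in for a manifest: node definitions keyed by host name, with a
# default entry for hosts that have no definition of their own.
MANIFEST = {
  'namenode01' => [
    { type: 'package', title: 'hadoop-namenode' },
    { type: 'file', title: 'hdfs-site.xml', content: 'data dirs: %{hadoop_hdfs_data_dirs}' }
  ],
  'default' => [
    { type: 'package', title: 'hadoop-datanode' },
    { type: 'file', title: 'hdfs-site.xml', content: 'data dirs: %{hadoop_hdfs_data_dirs}' }
  ]
}.freeze

# Select the node definition by host name, then replace fact placeholders
# with the values that the agent reported.
def compile_catalog(hostname, facts)
  node = MANIFEST.fetch(hostname) { MANIFEST.fetch('default') }
  node.map do |resource|
    next resource unless resource.key?(:content)

    resource.merge(content: format(resource[:content], facts))
  end
end

catalog = compile_catalog('datanode07', { hadoop_hdfs_data_dirs: '/data1/hadoop/dfs/data' })
catalog.each { |resource| puts resource.inspect }
```

The returned list plays the role of the catalog: it contains only the resources for the requesting node, with every fact placeholder already resolved to that node's value.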

Step 5: the puppet-agent receives the pseudo-code, and each puppet-agent executes the pseudo-code content belonging to its own node while configuration files are transferred, which completes the automated deployment of the cluster and ends the synchronization.

During execution, the puppet-agent checks whether any File resources (files) are present; if so, it requests them from the puppet-master to complete the file transfer. For each puppet-agent, the template file hdfs-site.xml.erb on the puppet-master side generates the configuration file hdfs-site.xml from the fact variable values detected and sent by that puppet-agent, and the file is sent to the specified location on the corresponding puppet-agent. Synchronization ends once execution is finished; when all puppet-agents have finished synchronizing, the deployment of the big data cluster is complete.

In the above steps, the puppet-master communicates with all puppet-agents over ssl connections, so authentication is required when a connection is first requested in order to ensure the security of the data transmitted afterwards.

The present invention mainly uses puppet and facter to implement big data cluster deployment programmatically. No manual modification according to the hardware configuration is needed, which achieves the goal of adapting to the hardware configuration and saves the manpower and time cost of project implementation. It should be noted that the above embodiment uses only the hadoop module as an example to illustrate the invention. Actual project deployments also involve other modules, such as hive, hbase and storm; combining the installation and dependency configuration of these modules completes the build and deployment of the big data cluster.

Claims

1. A big data cluster deployment method that adapts to hardware configuration, characterized in that it comprises the following steps:

Step 1: the puppet-agent sends a synchronization request to the puppet-master;

Step 2: the puppet-master authenticates the puppet-agent;

Step 3: the puppet-agent detects the server hardware configuration information and sends it to the puppet-master;

Step 4: the puppet-master combines the information sent by the puppet-agent with the configuration content of the corresponding node in the manifest, parses it, generates pseudo-code (a catalog) and sends it to the puppet-agent;

Step 5: the puppet-agent receives the pseudo-code, and each puppet-agent executes the pseudo-code content belonging to its own node, which completes the automated deployment of the cluster and ends the synchronization.

2. The big data cluster deployment method that adapts to hardware configuration according to claim 1, characterized in that: the authentication uses automatic registration; for automatic registration, the existing certificates on the puppet master side and agent side are deleted, and then puppet's configuration file autosign.conf on the puppet-master is modified so that its content is "*".

3. The big data cluster deployment method that adapts to hardware configuration according to claim 1, characterized in that: the puppet-agent detects the server hardware configuration information by calling facter and then saves the hardware configuration information as fact variables.

4. The big data cluster deployment method that adapts to hardware configuration according to claim 3, characterized in that the method of customizing the fact variables comprises the following steps:

1) create a .rb file in the hadoop module of puppet, located at /etc/puppet/modules/hadoop/lib/facter/hadoop.rb, and use the facter tool to obtain the server hardware configuration information and save it as variables;

2) use the above variables in the hadoop module;

3) add pluginsync to puppet.conf on all nodes of the cluster.

5. The big data cluster deployment method that adapts to hardware configuration according to claim 4, characterized in that the method of adding pluginsync to puppet.conf on all nodes of the cluster is: on the puppet-master side, add "pluginsync = true" to the [main] section of puppet.conf; on the puppet-agent side, add "pluginsync = true" to the [agent] section of puppet.conf.

6. The big data cluster deployment method that adapts to hardware configuration according to claim 1, characterized in that the method by which the puppet-master in step 4 parses the configuration in combination with the information sent by the puppet-agent is: the puppet-master finds the corresponding node configuration in the manifest according to the host name of the puppet-agent and parses the node configuration content written in the puppet modeling language; at the same time, the fact variables are replaced with the actual values sent by the puppet-agent.

7. The big data cluster deployment method that adapts to hardware configuration according to claim 1, characterized in that: when the puppet-agent in step 5 executes the pseudo-code content belonging to its own node, it checks whether any File resources (files) are present; if so, it requests them from the puppet-master to complete the file transfer.

8. The big data cluster deployment method that adapts to hardware configuration according to claim 1, characterized in that: all communication between the puppet-master and the puppet-agents uses ssl connections.