Hadoop Ansible Playbook

Ansible Playbook that installs a CDH4 Hadoop cluster (running on Java 7, supported from CDH 4.4), with Ganglia, Fluentd, ElasticSearch and Kibana 3 for monitoring and centralized log indexing.

Hire/Follow @analytically. NEW: Deploys Hive Metastore and Facebook Presto!

Requirements

  • Ansible 1.4 or later
  • 6 + 1 Ubuntu 12.04 LTS, 13.04 or 13.10 hosts - see ubuntu-netboot-tftp if you need automated server installation
  • Mandrill API key for sending emails
  • ansibler user in sudo group without sudo password prompt (see Bootstrapping section below)

Cloudera (CDH4) Hadoop Roles

If you're assembling your own Hadoop playbook, these roles are available for you to reuse:

Configure

Customize the following files:

Required:

Optional:

Role Vars

When specifying or reusing roles, you can override their vars, e.g.:

- { role: postfix_mandrill, postfix_domain: example.com, mandrill_username: joe, mandrill_api_key: 123 }

Adding hosts

Edit the hosts file and list hosts per group (see Inventory for more examples):

[datanodes]
hslave010
hslave[090:252]
hadoop-slave-[a:f].example.com
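
The bracketed entries above are Ansible inventory range patterns: both ends are inclusive, and a leading zero in the start value is kept in the generated names. The sketch below is a simplified re-implementation for illustration only, not Ansible's own parser; the function name `expand_pattern` is hypothetical. It shows how many hosts each line of the example stands for.

```python
import re
from typing import List

# Matches a single [start:end] range such as [090:252] or [a:f].
_RANGE = re.compile(r"\[(\w+):(\w+)\]")


def expand_pattern(pattern: str) -> List[str]:
    """Expand inventory-style range patterns into explicit host names.

    Simplified illustration: handles numeric ranges (keeping zero padding)
    and single-letter ranges, with both ends inclusive.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # plain host name, nothing to expand

    start, end = match.group(1), match.group(2)
    head, tail = pattern[:match.start()], pattern[match.end():]

    if start.isdigit() and end.isdigit():
        # Keep the padding width when the start value has a leading zero.
        width = len(start) if start.startswith("0") else 0
        items = [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    elif len(start) == 1 and len(end) == 1:
        items = [chr(c) for c in range(ord(start), ord(end) + 1)]
    else:
        raise ValueError("unsupported range in pattern: " + pattern)

    # Expand any further ranges in the remainder of the pattern.
    return [head + item + rest for item in items for rest in expand_pattern(tail)]


if __name__ == "__main__":
    for line in ["hslave010", "hslave[090:252]", "hadoop-slave-[a:f].example.com"]:
        hosts = expand_pattern(line)
        print(line, "->", len(hosts), "host(s), first:", hosts[0], "last:", hosts[-1])
```

Under these assumptions the three example lines describe 1, 163 and 6 datanodes respectively.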

Make sure that the zookeepers and journalnodes groups contain at least 3 hosts and have an odd number of hosts.
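
The odd-number rule follows from majority quorums: ZooKeeper and the HDFS JournalNodes both need more than half of their members available, so a group of 4 survives only one failure, the same as a group of 3. A minimal sketch of that check, assuming you already have the host list for a group in hand (the function name `tolerated_failures` is hypothetical and not part of this playbook):

```python
from typing import Sequence


def tolerated_failures(group: str, hosts: Sequence[str]) -> int:
    """Validate a quorum group and return how many host failures it survives.

    Raises ValueError if the group has fewer than 3 hosts or an even
    number of hosts, mirroring the rule stated above.
    """
    count = len(hosts)
    if count < 3:
        raise ValueError("%s needs at least 3 hosts, got %d" % (group, count))
    if count % 2 == 0:
        raise ValueError("%s needs an odd number of hosts, got %d" % (group, count))
    # A majority quorum of n members stays available with (n - 1) // 2 failures.
    return (count - 1) // 2


if __name__ == "__main__":
    # Example host names are placeholders.
    print(tolerated_failures("zookeepers", ["zk1", "zk2", "zk3"]))
    print(tolerated_failures("journalnodes", ["jn1", "jn2", "jn3", "jn4", "jn5"]))
```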

Ganglia nodes

Since we're using unicast mode for Ganglia (which significantly reduces chatter), you may have to wait 60 seconds after a node starts before it shows up in the web interface.

Installing Hadoop

To run Ansible:

./site.sh

To install only one component, e.g. ZooKeeper, pass its tag as an argument (available tags: apache, bonding, configuration, elasticsearch, fluentd, ganglia, hadoop, hbase, hive, java, kibana, ntp, presto, rsyslog, tdagent, zookeeper):

./site.sh zookeeper
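
If you drive the playbook from another tool, it can help to reject a mistyped tag before anything runs. The helper below is a hypothetical wrapper, not part of this repository. It only builds the argument list for the two invocations shown above (no tag, or exactly one tag) and makes no assumption about whether site.sh accepts several tags at once.

```python
from typing import List, Optional

# Tags listed in this README.
AVAILABLE_TAGS = frozenset({
    "apache", "bonding", "configuration", "elasticsearch", "fluentd",
    "ganglia", "hadoop", "hbase", "hive", "java", "kibana", "ntp",
    "presto", "rsyslog", "tdagent", "zookeeper",
})


def build_site_command(tag: Optional[str] = None) -> List[str]:
    """Return the argv for ./site.sh, optionally limited to one known tag."""
    if tag is None:
        return ["./site.sh"]
    if tag not in AVAILABLE_TAGS:
        raise ValueError("unknown tag %r; expected one of: %s"
                         % (tag, ", ".join(sorted(AVAILABLE_TAGS))))
    return ["./site.sh", tag]


if __name__ == "__main__":
    print(build_site_command())
    print(build_site_command("zookeeper"))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.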

What else is installed?

Performance testing

Instructions on how to test the performance of your CDH4 cluster.

  • SSH into one of the machines.
  • Change to the hdfs user: sudo su - hdfs
  • Set HADOOP_MAPRED_HOME: export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  • cd /usr/lib/hadoop-mapreduce
TeraGen and TeraSort
  • Run TeraGen: hadoop jar hadoop-mapreduce-examples.jar teragen -Dmapred.map.tasks=1000 10000000000 /tera/in
  • Run TeraSort: hadoop jar hadoop-mapreduce-examples.jar terasort /tera/in /tera/out
DFSIO
  • hadoop jar hadoop-mapreduce-client-jobclient-2.0.0-cdh4.5.0-tests.jar TestDFSIO -write
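
To size the TeraGen run above: its numeric argument is a row count, and each TeraGen row is 100 bytes, so 10000000000 rows come to 1 TB of input before HDFS replication. The sketch below works through that arithmetic; it assumes the job actually uses the 1000 map tasks requested via -Dmapred.map.tasks, and the function name `teragen_size` is hypothetical.

```python
ROW_BYTES = 100  # size of one TeraGen row


def teragen_size(rows: int, map_tasks: int) -> dict:
    """Estimate the data volume of a TeraGen run (before HDFS replication)."""
    total_bytes = rows * ROW_BYTES
    return {
        "total_bytes": total_bytes,
        "rows_per_map": rows // map_tasks,
        "bytes_per_map": total_bytes // map_tasks,
    }


if __name__ == "__main__":
    size = teragen_size(rows=10_000_000_000, map_tasks=1000)
    print("total: %.1f TB" % (size["total_bytes"] / 1e12))
    print("per map task: %.1f GB" % (size["bytes_per_map"] / 1e9))
```

Make sure the datanodes have room for this volume multiplied by your replication factor, plus the TeraSort output in /tera/out.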

Bootstrapping

Paste your public SSH RSA key in bootstrap/ansible_rsa.pub and run bootstrap.sh to bootstrap the nodes specified in bootstrap/hosts. See bootstrap/bootstrap.yml for more information.

What about Pig, Flume, etc?

You can manually install additional components after running this playbook. Follow the official CDH4 Installation Guide.

Screenshots

[Screenshots: zookeeper, hmaster01, ganglia, kibana]

License

Licensed under the Apache License, Version 2.0.

Copyright 2013 Mathias Bogaert.
