This repo is forked from lresende/docker-yarn-cluster and updated to make it run without bugs on my MacBook. I made these changes:
- Update the Java 8 and Hadoop 2.7.4 download URLs; the original URLs are no longer valid.
- Update `bootstrap.sh`: we don't need to run `start-dfs.sh` and `start-yarn.sh` on the slaves. As long as the master node knows the slaves, `start-dfs.sh` and `start-yarn.sh` will start `NameNode` and `ResourceManager` on the master node, and `DataNode` and `NodeManager` on each slave node. Ref: this post.
- Use a `docker-compose.yml` configuration file. One big issue is how to make the containers know each other. By "know" I mean that Hadoop and YARN know each container's IP address. So I create a custom network, set a static IP address for each container in that network, and, with `extra_hosts`, add the hostname-to-IP mappings to each container's `/etc/hosts`.
- Add `wordcount.sh` to verify that the cluster is set up correctly.
- Build the image with tag `yarn-cluster`:
  ```
  ./build
  ```
- Start the cluster:
  ```
  docker-compose up
  ```
  The cluster will have a `namenode` container as master, and `datanode1`, `datanode2` and `datanode3` containers as slaves.
- Open a shell in the namenode container:
  ```
  docker exec -it namenode sh
  ```
  and run `jps`. It will show something like this (the PIDs will differ):
  ```
  561 Jps
  489 ResourceManager
  125 NameNode
  333 SecondaryNameNode
  ```
- Open a shell in any datanode container, for example datanode1:
  ```
  docker exec -it datanode1 sh
  ```
  and run `jps`. It will show something like:
  ```
  48 DataNode
  152 NodeManager
  285 Jps
  ```
- Open a shell in any container, for example:
  ```
  docker exec -it datanode3 sh
  ```
  and run `./wordcount.sh`. It will eventually print this output:
  ```
  Docker 2
  Hadoop 1
  Hello 3
  World 1
  in 1
  ```
- To add or remove datanodes, we only need to update 2 files: `slaves` and `docker-compose.yml`. Then restart the containers:
  ```
  # Stop and remove containers
  docker-compose down
  # Create and start containers
  docker-compose up
  ```
- For example, to add 2 more datanodes, add `datanode4` and `datanode5` to `slaves`, one node per line. In `docker-compose.yml`, define them the same way as `datanode3`, give each a new `ipv4_address` and different `ports`, then update every container's `extra_hosts` with the new `hostname:ip` pairs.
- The beautiful thing about this is that I don't need to rebuild the image at all! All Docker container configuration is in `docker-compose.yml`, and the `slaves` file is mounted into the namenode container as `/var/tmp/slaves`. So it is fast and simple.
- To access the YARN ResourceManager web UI, first find the Docker Machine's IP:
  ```
  docker-machine ip
  ```
  then go to `http://192.168.99.100:8088/` (192.168.99.100 is the IP address returned by `docker-machine ip`).
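Instead of reading the `jps` listings from the namenode and datanode checks by eye, a small helper can compare them with the daemons each node is expected to run. This is a hypothetical sketch: `check_daemons` is an illustrative name, not something the repo provides, and the usage below assumes `jps` is on the `PATH` of a non-interactive `docker exec`.

```shell
# Hypothetical helper (not part of this repo): reads `jps` output on stdin and
# prints "missing: <name>" for each expected daemon (given as arguments) that
# is not listed. Returns non-zero if any daemon is missing.
check_daemons() {
  running=$(awk '{ print $2 }')    # jps prints "<pid> <MainClassName>" per line
  status=0
  for daemon in "$@"; do
    # -x: whole-line match, -F: treat the name as a fixed string
    if ! printf '%s\n' "$running" | grep -qxF "$daemon"; then
      printf 'missing: %s\n' "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

For example, `docker exec namenode jps | check_daemons NameNode SecondaryNameNode ResourceManager` and `docker exec datanode1 jps | check_daemons DataNode NodeManager` print nothing and return 0 when the expected daemons are up.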
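To see what result `wordcount.sh` should produce, the counting performed by the WordCount job can be reproduced locally with standard text tools. This is only an illustration, not how the repo's script works: it does not touch Hadoop, `local_wordcount` is a hypothetical name, and the input lines in the example are hypothetical, chosen so that the counts match the `wordcount.sh` output shown earlier.

```shell
# Local, single-machine sketch of what WordCount computes (no Hadoop involved):
# split the input into words, group identical words, count each group.
local_wordcount() {
  tr -s '[:space:]' '\n' |         # "map" step: one word per line
    grep -v '^$' |                 # drop the empty line that leading whitespace leaves
    LC_ALL=C sort |                # group words in byte order (uppercase before lowercase)
    uniq -c |                      # "reduce" step: count each distinct word
    awk '{ print $2, $1 }'         # print "word count"
}
```

For example, `printf 'Hello Docker\nHello Hadoop\nHello World in Docker\n' | local_wordcount` prints `Docker 2`, `Hadoop 1`, `Hello 3`, `World 1` and `in 1`, one per line, in the same order as the cluster output.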
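When adding datanodes, the `slaves` entries and the `extra_hosts` pairs follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is hypothetical: the function names are illustrative, and the addressing scheme (namenode at `<prefix>.2`, `datanodeK` at `<prefix>.(2+K)`) is an assumption that must be adapted to the addresses actually used in `docker-compose.yml`.

```shell
# Hypothetical generators (not part of this repo) for the two files that change
# when datanodes are added. ASSUMPTION: namenode is at <prefix>.2 and
# datanodeK is at <prefix>.(2+K); adjust to match docker-compose.yml.

# Print the slaves file content: one datanode hostname per line.
gen_slaves() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'datanode%d\n' "$i"
    i=$((i + 1))
  done
}

# Print an extra_hosts block with one "hostname:ip" pair per node.
gen_extra_hosts() {
  prefix=$1 count=$2
  printf 'extra_hosts:\n'
  printf '  - "namenode:%s.2"\n' "$prefix"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '  - "datanode%d:%s.%d"\n' "$i" "$prefix" $((2 + i))
    i=$((i + 1))
  done
}
```

For five datanodes, `gen_slaves 5 > slaves` rewrites the `slaves` file, and `gen_extra_hosts 172.18.0 5` prints a block to paste (indented) under each service in `docker-compose.yml`. The `172.18.0` prefix is a placeholder; each new service still needs its own `ipv4_address` and `ports` entries.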