Description

Building an HA Hadoop cluster with an active/passive NameNode pair.

The NameNode is the controller for HDFS reads and writes. In HA mode the active NameNode writes its edit log to the JournalNodes; the standby reads this log and applies the edits to its own copy of the namespace.

https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-common/ClusterSetup.html

Solution

Create a new Hadoop cluster named lab1: 3 KVMs, each running an instance of ZooKeeper and a JournalNode. hadoop-n1 will be the active NameNode, hadoop-n2 the standby NameNode.

All hosts will run DataNodes, with a replication factor of 2.

A git repo hosting the Ansible deployment is to come.

ansible-playbook hadoop.yml --diff -i inventory -u root -e "bootstrap=True"
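A minimal inventory sketch for the playbook invocation above. The hostnames are this lab's; the group names are assumptions and would need to match whatever the playbook actually expects:

```ini
# inventory - group names are hypothetical, adjust to the playbook
[hadoop]
hadoop-n1.home.lan
hadoop-n2.home.lan
hadoop-n3.home.lan

[namenodes]
hadoop-n1.home.lan
hadoop-n2.home.lan
```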

Configuration

core-site.xml : Config for HA HDFS cluster naming.

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://lab1</value>
    </property>
</configuration>

hdfs-site.xml : Config for HDFS: the active/passive NameNode pair, a 3-node JournalNode/ZooKeeper quorum (the minimum for surviving one node failure), and a replication factor of 2 for storage.

<configuration>
	<property>
		<name>dfs.nameservices</name>
		<value>lab1</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<property>
		<name>dfs.ha.namenodes.lab1</name>
		<value>nn1,nn2</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.lab1.nn1</name>
		<value>hadoop-n1.home.lan:8020</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.lab1.nn2</name>
		<value>hadoop-n2.home.lan:8020</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.lab1.nn1</name>
		<value>hadoop-n1.home.lan:50070</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.lab1.nn2</name>
		<value>hadoop-n2.home.lan:50070</value>
	</property>
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://hadoop-n1.home.lan:8485;hadoop-n2.home.lan:8485;hadoop-n3.home.lan:8485/lab1</value>
	</property>

	<property>
		<name>dfs.journalnode.edits.dir</name>
	 	<value>/data1/hadoop/hdfs/jn</value>
	</property>

	<property>
		<name>dfs.client.failover.proxy.provider.lab1</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
<!--
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
	</property>
-->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>

	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:///data1/hadoop/hdfs/nn</value>
	</property>

	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:///data1/hadoop/hdfs/dn</value>
	</property>

	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>

	<property>
		<name>ha.zookeeper.quorum</name>
		<value>hadoop-n1.home.lan:2181,hadoop-n2.home.lan:2181,hadoop-n3.home.lan:2181</value>
	</property>
</configuration>
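In Hadoop 2.x the start-dfs.sh / start-yarn.sh scripts read etc/hadoop/slaves to find the worker hosts, one hostname per line. For this lab, where every node runs a DataNode, the file would look like:

```
hadoop-n1.home.lan
hadoop-n2.home.lan
hadoop-n3.home.lan
```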

Initialise a new HDFS cluster

v2

Note: ZooKeeper must be running on all 3 nodes.

Start the JournalNode daemon on all 3 servers:

hadoop-daemon.sh start journalnode
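Optionally confirm the daemon is up on each node; jps ships with the JDK and lists running Java processes:

```shell
jps | grep JournalNode   # should print a PID followed by "JournalNode" on each host
```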

Initialise and start the NameNode on hadoop-n1 (active):

hdfs namenode -format
hadoop-daemon.sh start namenode

Initialise and start the NameNode on hadoop-n2 (standby):

hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
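With dfs.ha.automatic-failover.enabled set to true, the ZKFC state in ZooKeeper also has to be initialised once, from either NameNode host, before the failover controllers come up:

```shell
hdfs zkfc -formatZK
```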

Start all DFS services. This starts the ZKFailoverController on each NameNode host; the NameNodes will then settle into their active/standby roles.

start-dfs.sh

Confirm the status of the NameNodes

Active:

http://hadoop-n1.home.lan:50070/dfshealth.html#tab-overview

Standby:

http://hadoop-n2.home.lan:50070/dfshealth.html#tab-overview
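The NameNode roles can also be checked from the CLI, using the nn1/nn2 IDs defined in hdfs-site.xml, and automatic failover can be exercised by stopping the active NameNode:

```shell
hdfs haadmin -getServiceState nn1   # expect: active
hdfs haadmin -getServiceState nn2   # expect: standby

# On hadoop-n1, stop the active NameNode, then confirm nn2 was promoted:
hadoop-daemon.sh stop namenode
hdfs haadmin -getServiceState nn2   # expect: active
```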

Confirm presence of datanodes:

http://hadoop-n1.home.lan:50070/dfshealth.html#tab-datanode
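Or from the CLI:

```shell
hdfs dfsadmin -report   # lists each live DataNode with its capacity and usage
```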

Review status of a datanode:

http://hadoop-n1.home.lan:50075/datanode.html

Upload a file via Utilities » Browse the file system:

http://hadoop-n1.home.lan:50070/explorer.html#/
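The same can be done from the CLI (the paths below are just examples):

```shell
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /test
hdfs dfs -put /tmp/hello.txt /test/
hdfs dfs -ls /test
```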

TODO Yarn

Start YARN services:

[root@hadoop-n1 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-root-resourcemanager-hadoop-n1.home.lan.out
localhost: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-hadoop-n1.home.lan.out

Check the YARN scheduler status:

http://hadoop-n1.home.lan:8088/cluster
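Registered NodeManagers can also be listed from the CLI:

```shell
yarn node -list   # one line per registered NodeManager, with its state
```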
