Hadoop Yarn Configuration on Cluster

Once your Apache Hadoop installation is complete and you are able to run HDFS commands, the next step is to configure YARN on the cluster. This post explains how to set up the YARN master on the Hadoop cluster and run a MapReduce example. Before you proceed, make sure Apache Hadoop is installed and the Hadoop cluster is up and running; if you do not have a setup yet, follow the link below to set up your cluster and come back to this page.

Apache Hadoop Multi Node Cluster Setup on Ubuntu

By default YARN comes with the Hadoop distribution, so there is no need for an additional installation; you only need to configure Hadoop to use YARN and set some memory/core properties.

1. Configure yarn-site.xml

In the yarn-site.xml file, configure the default node manager memory and the YARN scheduler minimum and maximum memory allocations.
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1536</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>1536</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
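Each of these property blocks must sit inside the top-level <configuration> element of yarn-site.xml. Briefly: yarn.nodemanager.resource.memory-mb caps how much memory YARN may allocate to containers on each node, the scheduler minimum/maximum bound the size of individual container requests, and disabling the virtual-memory check avoids containers being killed for exceeding virtual memory on small machines. If your file is still the empty template, the overall shape is (a minimal sketch):
<?xml version="1.0"?>
<configuration>
    <!-- paste the four property blocks from above here -->
</configuration>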

2. Configure mapred-site.xml file

Add the following properties to the mapred-site.xml file.
<property>
	<name>yarn.app.mapreduce.am.resource.mb</name>
	<value>512</value>
</property>
<property>
	<name>mapreduce.map.memory.mb</name>
	<value>256</value>
</property>
<property>
	<name>mapreduce.reduce.memory.mb</name>
	<value>256</value>
</property>
<property>
	<name>yarn.app.mapreduce.am.env</name>
	<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
<property>
	<name>mapreduce.map.env</name>
	<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
<property>
	<name>mapreduce.reduce.env</name>
	<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
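As with yarn-site.xml, these properties go inside the <configuration> element. MapReduce also needs mapreduce.framework.name set to yarn in order to submit jobs to YARN; if you did not already add it during the Hadoop installation steps, include it as well:
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>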

3. Configure Data Nodes

Copy the yarn-site.xml and mapred-site.xml files to all your data nodes (this cluster has 3). Below is an example of copying to datanode1 using the scp command; repeat it for each of your data nodes, or use the loop shown after the commands.
scp hadoop/etc/hadoop/yarn-site.xml datanode1:/home/ubuntu/hadoop/etc/hadoop/
scp hadoop/etc/hadoop/mapred-site.xml datanode1:/home/ubuntu/hadoop/etc/hadoop/
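With more than one data node, a small shell loop avoids repeating the commands by hand (a sketch, assuming hosts datanode1 through datanode3 with the same directory layout on each):
for node in datanode1 datanode2 datanode3; do
  scp hadoop/etc/hadoop/yarn-site.xml hadoop/etc/hadoop/mapred-site.xml "$node":/home/ubuntu/hadoop/etc/hadoop/
done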

4. Start YARN from the Name Node (node-master)

start-yarn.sh
You should see the following lines:
ubuntu@namenode:~$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
Running jps on the namenode should list the following.
ubuntu@namenode:~$ jps
11281 Jps
10740 SecondaryNameNode
10442 NameNode
10972 ResourceManager
Note that SecondaryNameNode and NameNode were started earlier by start-dfs.sh. The start-yarn.sh command starts the ResourceManager on the namenode and a NodeManager on each data node. Now run the jps command on any data node and confirm NodeManager is running.
ubuntu@datanode1:~$ jps
10273 DataNode
10648 Jps
10463 NodeManager
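You can also verify from the name node that every NodeManager has registered with the ResourceManager; with three data nodes it should report three nodes in the RUNNING state.
yarn node -list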

5. Stop Yarn

To stop YARN, run the following command on the name node. Afterward, jps should no longer list ResourceManager on the name node or NodeManager on the data nodes (the HDFS daemons keep running until you also run stop-dfs.sh).
stop-yarn.sh

6. Yarn UI

Start YARN again in case you have stopped it. Now open your favorite browser and enter http://192.168.1.100:8088/cluster (replace 192.168.1.100 with your namenode IP).
[Screenshot: YARN Hadoop Cluster UI]
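If you are working on a headless machine without a browser, the same cluster information is exposed through the ResourceManager REST API (replace the IP with your namenode's address):
curl http://192.168.1.100:8088/ws/v1/cluster/info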

7. Run a MapReduce Example

yarn jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount "books/*" output
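This assumes text files already exist under a books directory in HDFS and writes the word counts to an output directory, which must not already exist (remove a previous one with hdfs dfs -rm -r output). If you have no input yet, a minimal sketch to upload some and inspect the result after the job finishes (alice.txt is a hypothetical local text file; any text will do):
hdfs dfs -mkdir -p books
hdfs dfs -put alice.txt books/
# after the wordcount job above completes:
hdfs dfs -cat output/*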

8. Check Job Status on Yarn UI

You should see an entry with an application ID similar to “application_1547102810368_0001” and a status of “FINISHED”.
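The same status is available from the command line, which is useful when the UI is unreachable:
yarn application -list -appStates FINISHED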

9. Yarn logs

To look at the YARN logs, get your job's application ID from the Yarn UI and run the below command.
yarn logs -applicationId application_1547102810368_0001
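Note that yarn logs only returns output when log aggregation is enabled; if the command reports that aggregation is not enabled, add the property below to yarn-site.xml on every node and restart YARN (logs of jobs started after the change will then be aggregated to HDFS):
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>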
