This post explains how to set up and run Spark applications on a Hadoop cluster with the YARN cluster manager, and then walks through running a sample Spark job on YARN.
Spark Install and Setup
1. Download the latest Apache Spark version from the official downloads page
2. Once the download is complete, extract the archive's contents using tar, a file archiving tool, and rename the folder to spark
tar -xzf spark-2.4.0-bin-hadoop2.7.tgz
mv spark-2.4.0-bin-hadoop2.7 spark
3. Add the Spark environment variables to your .bashrc or .profile file. Open the file in a vi editor and add the variables below.
vi ~/.bashrc

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/home/ubuntu/spark
export PATH=$PATH:$SPARK_HOME/bin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
Now load the environment variables into the current session by running the command below.

source ~/.bashrc
If you added them to the .profile file instead, restart your session by logging out and logging back in.
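Once the variables are loaded, a quick sanity check (assuming the paths exported above):

```shell
# The variable should now be visible in the session
echo "$SPARK_HOME"        # expected: /home/ubuntu/spark

# And spark-submit should resolve from $SPARK_HOME/bin
spark-submit --version    # should print the Spark version banner
```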
4. Finally, edit $SPARK_HOME/conf/spark-defaults.conf and set spark.master to yarn
spark.master yarn
spark.driver.memory 512m
spark.yarn.am.memory 512m
spark.executor.memory 512m
With this, the Spark setup with YARN is complete. Now let's run a sample job that ships with the Spark binary distribution.
5. Run a sample Spark job
spark-submit --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 10
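In client mode the driver runs on the machine you submit from. The same job can also be submitted in cluster mode, where YARN hosts the driver inside an application container; a sketch using the same example jar:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 10
```

In cluster mode the output of SparkPi lands in the YARN application logs rather than on your console, so retrieve it with yarn logs -applicationId <application id>.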
Spark History server
1. Configure history server
Edit the $SPARK_HOME/conf/spark-defaults.conf file and add the properties below.
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode:9000/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://namenode:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
2. Run history server
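The event-log directory configured above must exist in HDFS before the server starts; a minimal sketch (the /spark-logs path is the one from spark-defaults.conf):

```shell
# Create the event-log directory in HDFS
hdfs dfs -mkdir -p /spark-logs

# Start the history server daemon
$SPARK_HOME/sbin/start-history-server.sh
```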
As per the configuration, the history server runs on port 18080.
3. Run the Spark job again, then open the history server UI at http://<history-server-host>:18080 to check the logs and the status of the job.
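You can also query the history server's REST API to confirm it sees the completed application (host assumed to be localhost here; the port comes from spark.history.ui.port above):

```shell
# Lists the applications known to the history server, as JSON
curl http://localhost:18080/api/v1/applications
```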
Now that you have Spark jobs running on the cluster, you can explore more Spark examples from the Spark GitHub project.