Apache Spark 3.5 Installation on Windows


In this article, I will explain step-by-step how to install Apache Spark 3.5 on Windows 7, 10, 11, and later versions, and how to start the history server and monitor your jobs using the Web UI.


Install Java 8 or Later

To install Apache Spark 3.5 on Windows 10 or 11, you need Java/JDK 8, 11, 17, or a later version, which you can download from oracle.com, https://openjdk.org/, or https://jdk.java.net/.

After downloading, double-click the downloaded .exe file to install it on your Windows system; choose a custom directory or keep the default location.

Note: This article explains installing Apache Spark 3.5 with Java 17; the same steps also work for Java 8, 11, and 13.
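
Once the JDK is installed, you can verify it from a Command Prompt (the version string printed will vary with the JDK you installed):


java -version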

Apache Spark Installation on Windows

Apache Spark is distributed as compressed tar/zip files; hence, installing it on Windows is straightforward: you only need to download and extract the file. Download Apache Spark by accessing the Spark Download page and selecting the link from "Download Spark" (point 3 on the page).

If you want to use a different version of Spark & Hadoop, select the one you want from the drop-down; the link at point 3 changes to the selected version and provides you with an updated download link.


After downloading, extract the binary using 7zip or any zip utility and copy the extracted directory spark-3.5.0-bin-hadoop3 to c:\apps\opt\spark-3.5.0-bin-hadoop3.
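
If you prefer the command line, recent Windows 10 and 11 builds also ship with a tar utility that can extract the archive. A minimal sketch, assuming the downloaded .tgz file is in the current directory:


rem Create the target directory and extract the downloaded archive into it
mkdir C:\apps\opt
tar -xf spark-3.5.0-bin-hadoop3.tgz -C C:\apps\opt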

Spark Environment Variables

After installing Java and Apache Spark on Windows, set the JAVA_HOME, SPARK_HOME, HADOOP_HOME, and PATH environment variables. If you know how to set environment variables on Windows, add the following (the JAVA_HOME path below is for Java 8; point it to whichever JDK you installed).


JAVA_HOME = C:\Program Files\Java\jdk1.8.0_201
SPARK_HOME  = C:\apps\opt\spark-3.5.0-bin-hadoop3
HADOOP_HOME = C:\apps\opt\spark-3.5.0-bin-hadoop3

PATH=%PATH%;%SPARK_HOME%\bin;%JAVA_HOME%\bin
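
Alternatively, you can set the same variables from a Command Prompt using setx. This is a sketch assuming the paths above; note that setx changes only take effect in newly opened Command Prompt windows, and setx can truncate very long PATH values, so the GUI steps below are safer for editing PATH:


rem Set user environment variables (open a new Command Prompt afterwards)
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_201"
setx SPARK_HOME "C:\apps\opt\spark-3.5.0-bin-hadoop3"
setx HADOOP_HOME "C:\apps\opt\spark-3.5.0-bin-hadoop3"
rem Appending this way copies the current full PATH into the user PATH
setx PATH "%PATH%;C:\apps\opt\spark-3.5.0-bin-hadoop3\bin;C:\Program Files\Java\jdk1.8.0_201\bin"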

Follow the steps below if you are unsure how to add or edit environment variables on Windows.

1. Open the System Environment Variables window and select Environment Variables.

2. On the Environment Variables screen, add SPARK_HOME, HADOOP_HOME, and JAVA_HOME by selecting the New option.

3. This opens the New User Variable window, where you can enter the variable name and value. Add the respective paths to these variables.

4. Now edit the PATH variable.

5. Add the Spark, Java, and Hadoop bin locations by selecting the New option.

Spark with winutils.exe on Windows

Many beginners think Apache Spark needs a Hadoop cluster installed to run, but that's not true; Spark can run on AWS using S3 and on Azure using blob storage without Hadoop, HDFS, etc.

To run Apache Spark on Windows, you need winutils.exe, which implements POSIX-like file access operations on Windows using the Windows API.

winutils.exe enables Spark to use Windows-specific services, including running shell commands in a Windows environment.

Download winutils.exe for Hadoop 3.3 and copy it to the %SPARK_HOME%\bin folder. Winutils differs for each Hadoop version; hence, download the right one based on the Hadoop version your Spark distribution was built for.
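
For example, assuming winutils.exe was saved to your Downloads folder, you can copy it from a Command Prompt as follows (a sketch; adjust the source path to wherever you saved the file):


rem Adjust the source path to wherever you saved winutils.exe
copy "%USERPROFILE%\Downloads\winutils.exe" "%SPARK_HOME%\bin"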

Apache Spark shell

spark-shell is a CLI utility that comes with the Apache Spark distribution. To run the Apache Spark shell, open a Command Prompt, go to the %SPARK_HOME%\bin directory, and type the spark-shell command. It may take a minute or two for the Spark instance to initialize; once it does, you will see the Spark banner and a scala> prompt (ignore any error you see at the end).
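
For example:


cd %SPARK_HOME%\bin
spark-shell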


spark-shell also creates a Spark context Web UI. By default, it can be accessed at http://localhost:4040; if that port is already in use, Spark tries 4041, 4042, and so on.

On the spark-shell command line, you can run any Spark statements, such as creating an RDD or getting the Spark version.


scala> spark.version
res2: String = 3.5.0

scala> val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9,10))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala>
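
Since spark-shell imports spark.implicits._ for you, you can also convert the RDD above into a DataFrame and query it. A small sketch, continuing from the rdd created earlier:


scala> val df = rdd.toDF("num")
df: org.apache.spark.sql.DataFrame = [num: int]

scala> df.filter($"num" > 5).show()
+---+
|num|
+---+
|  6|
|  7|
|  8|
|  9|
| 10|
+---+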

This completes the installation of Apache Spark on Windows 10, 11, or any later version.

Where to go Next?

You can continue with the sections below to learn how to monitor your jobs using the Spark Web UI and how to enable the Spark history server, or follow the links as next steps.

Web UI on Windows

Apache Spark provides a suite of Web UIs (Jobs, Stages, Tasks, Storage, Environment, Executors, and SQL) to monitor the status of your Spark application, the resource consumption of the Spark cluster, and Spark configurations. In the Spark Web UI, you can see how the operations are executed.


History Server

The history server keeps a log of all Spark applications you submit via spark-submit or spark-shell. You can enable Spark to collect these logs by adding the below configs to the spark-defaults.conf file, located in the %SPARK_HOME%\conf directory (if the file does not exist, copy spark-defaults.conf.template to spark-defaults.conf).


spark.eventLog.enabled true
spark.history.fs.logDirectory file:///c:/logs/path
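
Make sure the log directory exists before starting Spark or the history server; a sketch, assuming the path configured above:


rem Create the event log directory referenced in spark-defaults.conf
mkdir c:\logs\path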

After setting the above properties, start the history server by running the below command.


%SPARK_HOME%\bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

By default, the history server listens on port 18080, and you can access it from the browser at http://localhost:18080/.


By clicking on each App ID, you will get the details of the application in the Spark Web UI.

Conclusion

In summary, you have learned how to install Apache Spark on Windows, run sample statements in spark-shell, monitor jobs in the Spark Web UI, and start the history server.

If you have any issues setting things up, please message me in the comments section, and I will try to respond with a solution.

Happy Learning !!


Prabha

Prabha is an accomplished data engineer with a wealth of experience in architecting, developing, and optimizing data pipelines and infrastructure. With a strong foundation in software engineering and a deep understanding of data systems, Prabha excels in building scalable solutions that handle diverse and large datasets efficiently. At SparkByExamples.com, Prabha writes about her experience with Spark, PySpark, Python, and Pandas.


This Post Has 3 Comments

  1. Anonymous

    Download wunutils.exe for Hadoop 2.7 -> Please correct the meaning of winutils.

  2. Karthik

    Hi,

    The Spark Setup on Hadoop Cluster with Yarn page is not loading completely. It would be a great help if you could share a step-by-step approach to install Spark on a Hadoop cluster with YARN.

    1. NNK

      Hi Karthik, I have fixed it. Could you please check? Thanks for letting me know.