In this article, I will explain how to set up and run a PySpark application in the Spyder IDE. Spyder is a popular IDE for writing and running Python applications, and you can use it to run PySpark applications during development.
Install Java 8 or a later version
PySpark uses the Py4J library, which lets Python code dynamically access Java objects in a Java Virtual Machine (JVM) while a PySpark application runs. Hence, you need Java installed. Download Java 8 or a later version from Oracle and install it on your system.
After installation, set the JAVA_HOME and PATH environment variables.
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_201
PATH = %PATH%;C:\Program Files\Java\jdk1.8.0_201\bin
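Before moving on, you can confirm that JAVA_HOME actually points at a JDK. The following is a small sketch; the helper names java_executable, check_java and java_version are hypothetical, not part of any library. It looks for the java launcher under JAVA_HOME\bin and can ask that launcher for its version.

```python
import os
import subprocess


def java_executable(java_home: str) -> str:
    """Return where the java launcher should be inside JAVA_HOME."""
    # Windows uses java.exe; Linux and macOS use a plain "java" file.
    name = "java.exe" if os.name == "nt" else "java"
    return os.path.join(java_home, "bin", name)


def check_java(env=None) -> str:
    """Return the java launcher path, or raise RuntimeError if JAVA_HOME is wrong."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        raise RuntimeError("JAVA_HOME is not set")
    launcher = java_executable(java_home)
    if not os.path.isfile(launcher):
        raise RuntimeError(f"No java launcher at {launcher}; check JAVA_HOME")
    return launcher


def java_version(launcher: str) -> str:
    """Run `java -version`, which writes its version banner to stderr."""
    result = subprocess.run(
        [launcher, "-version"], capture_output=True, text=True, check=True
    )
    return result.stderr.strip()
```

For example, run print(java_version(check_java())) in a new Python session; a missing or misspelled JAVA_HOME raises RuntimeError instead of printing a version.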
Install Apache Spark
Go to the Apache Spark download page and locate the “Download Spark” link. If you want a different version of Spark or Hadoop, choose it from the drop-down menus; the “Download Spark” link updates dynamically to point at the versions you selected.
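If you prefer to script the download, Spark releases are also kept in the Apache release archive. The sketch below assumes the archive's naming scheme spark-&lt;version&gt;/spark-&lt;version&gt;-bin-&lt;package&gt;.tgz, which is how spark-3.0.0-bin-hadoop2.7.tgz is laid out there. The helper names are hypothetical, and for other releases the package suffix (hadoop2.7 here) must match the one shown on the download page.

```python
import urllib.request

# Assumption: the Apache release archive keeps every Spark release under this path.
ARCHIVE_BASE = "https://archive.apache.org/dist/spark"


def spark_download_url(spark_version: str, package: str = "hadoop2.7") -> str:
    """Build the archive URL of a Spark binary package (a .tgz file)."""
    filename = f"spark-{spark_version}-bin-{package}.tgz"
    return f"{ARCHIVE_BASE}/spark-{spark_version}/{filename}"


def download_spark(spark_version: str, package: str, dest_file: str) -> str:
    """Download the package to dest_file and return its local path."""
    local_path, _headers = urllib.request.urlretrieve(
        spark_download_url(spark_version, package), dest_file
    )
    return local_path
```

For the version used in this article, spark_download_url("3.0.0") gives https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz.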
After downloading, untar the .tgz file with an archive utility and copy the extracted folder spark-3.0.0-bin-hadoop2.7 to C:\apps.
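Python's standard tarfile module can also do the untar step. A minimal sketch, assuming the archive contains a single top-level folder (as the Spark binary packages do) and using a hypothetical helper named extract_spark that extracts straight into C:\apps by default:

```python
import os
import tarfile


def extract_spark(archive_path: str, target_dir: str = r"C:\apps") -> str:
    """Untar a Spark .tgz into target_dir and return the extracted folder's path."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Assumes a single top-level folder such as spark-3.0.0-bin-hadoop2.7.
        roots = {name.split("/")[0] for name in tar.getnames()}
        if len(roots) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(roots)}")
        (top_level,) = roots
        # Newer Pythons offer extraction filters; "data" rejects unsafe paths.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return os.path.join(target_dir, top_level)
```

The returned path (for example C:\apps\spark-3.0.0-bin-hadoop2.7) is the value to use for SPARK_HOME.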
Set the following environment variables.
SPARK_HOME = C:\apps\spark-3.0.0-bin-hadoop2.7
HADOOP_HOME = C:\apps\spark-3.0.0-bin-hadoop2.7
PATH = %PATH%;C:\apps\spark-3.0.0-bin-hadoop2.7\bin
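To catch typos in these variables, a quick check like the following can help. It is a sketch with a hypothetical function check_spark_env that assumes the layout used above (launcher scripts in %SPARK_HOME%\bin, which on Windows are .cmd files such as pyspark.cmd). It only checks that HADOOP_HOME is set, not what it contains. Run it from a new command prompt or Python session, since already-open ones keep the old environment.

```python
import os


def check_spark_env(env=None) -> list:
    """Return a list of problems with SPARK_HOME, HADOOP_HOME and PATH (empty if none)."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]
    problems = []
    spark_bin = os.path.join(spark_home, "bin")
    # On Windows the launcher is pyspark.cmd; elsewhere it is the pyspark shell script.
    launcher = "pyspark.cmd" if os.name == "nt" else "pyspark"
    if not os.path.isfile(os.path.join(spark_bin, launcher)):
        problems.append(f"{launcher} not found in {spark_bin}")
    if not env.get("HADOOP_HOME"):
        problems.append("HADOOP_HOME is not set")

    # Compare PATH entries case- and separator-insensitively (relevant on Windows).
    def norm(p):
        return os.path.normcase(os.path.normpath(p))

    path_entries = {norm(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    if norm(spark_bin) not in path_entries:
        problems.append(f"{spark_bin} is not on PATH")
    return problems
```

An empty list from check_spark_env() means all three variables look consistent.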
PySpark shell
Now open a command prompt and type pyspark to start the PySpark shell.
Run PySpark application from Spyder IDE
To develop PySpark applications, you need an Integrated Development Environment (IDE), and there are many to choose from. I chose the Spyder IDE; if you haven’t installed it yet, do so before continuing.
Now, set the following environment variable.
PYTHONPATH = %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9-src.zip;%PYTHONPATH%
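If you would rather not change PYTHONPATH system-wide, the same two entries can be added to sys.path at the top of a script, before importing pyspark. The sketch below uses a hypothetical helper, add_pyspark_to_path, and assumes SPARK_HOME is set or passed in. It globs for the py4j zip instead of hard-coding py4j-0.10.9-src.zip, because the bundled py4j version differs between Spark releases.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home=None) -> list:
    """Put SPARK_HOME/python and its py4j zip at the front of sys.path."""
    spark_home = spark_home or os.environ["SPARK_HOME"]
    python_dir = os.path.join(spark_home, "python")
    # Glob instead of hard-coding py4j-0.10.9-src.zip: the bundled py4j
    # version changes between Spark releases.
    zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if len(zips) != 1:
        raise RuntimeError(f"expected one py4j-*-src.zip in {python_dir}, found {zips}")
    entries = [python_dir, zips[0]]
    # Insert in reverse so python_dir ends up first, like the PYTHONPATH order.
    for entry in reversed(entries):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return entries
```

Call add_pyspark_to_path() and then import from pyspark as usual. The third-party findspark package (findspark.init()) automates the same idea.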
Now open Spyder, create a new file with the simple PySpark program below, and run it. You should see 5 in the output.
Happy Learning!!