
In this article, I will explain how to set up and run a PySpark application in the Spyder IDE. Spyder is a popular IDE for writing and running Python applications, and you can use it to run PySpark applications during the development phase.


Install Java 8 or later version

PySpark uses the Py4J library, a Java library that lets Python dynamically interface with JVM objects when running a PySpark application. Hence, you need Java installed. Download Java 8 or a later version from Oracle and install it on your system.

After installation, set the JAVA_HOME and PATH environment variables.

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_201
PATH = %PATH%;C:\Program Files\Java\jdk1.8.0_201\bin
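On Windows, these variables can be set persistently from a command prompt with the built-in setx command. This is a sketch using the example JDK path from above; adjust it to match your installed version:

```shell
REM Set JAVA_HOME for the current user (path is the example from above; adjust as needed)
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_201"
REM Append the JDK bin folder to the user PATH
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_201\bin"
```

Open a new command prompt afterwards (setx does not affect the current session) and run java -version to confirm Java is visible.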

Install Apache Spark

Access the Apache Spark download page and locate the “Download Spark” link (point 3). If you wish to use different versions of Spark and Hadoop, choose them from the dropdown menus; the download link in point 3 updates automatically to reflect your selections.

(Screenshot: PySpark installation)

After downloading, extract the .tgz archive (for example, with 7-Zip or the tar command) and copy the extracted folder spark-3.0.0-bin-hadoop2.7 to c:\apps.
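On Windows 10 and later, the built-in tar command can do the extraction from a command prompt. The commands below assume the archive was saved to your Downloads folder and use the file name matching the version above:

```shell
REM Extract the downloaded Spark archive and move the resulting folder to c:\apps
cd %USERPROFILE%\Downloads
tar -xzf spark-3.0.0-bin-hadoop2.7.tgz
move spark-3.0.0-bin-hadoop2.7 c:\apps\
```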

Set the following environment variables. Also append %SPARK_HOME%\bin to PATH so that the pyspark command is available from the command prompt.

SPARK_HOME  = C:\apps\spark-3.0.0-bin-hadoop2.7
HADOOP_HOME = C:\apps\spark-3.0.0-bin-hadoop2.7
PATH = %PATH%;%SPARK_HOME%\bin

PySpark shell

Now open the command prompt and type pyspark to run the PySpark shell.

(Screenshot: PySpark shell)

Run PySpark application from Spyder IDE

To develop PySpark applications, you need an Integrated Development Environment (IDE), and there are numerous options available. I’ve chosen the Spyder IDE. If you haven’t installed Spyder yet, do so before continuing with your PySpark development tasks.

Now, set the following environment variable.


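The variable in question is typically PYTHONPATH, pointed at Spark’s bundled Python libraries so that the interpreter Spyder uses can import pyspark. The py4j archive name below is an assumption; check the actual file name in %SPARK_HOME%\python\lib on your system:

```
PYTHONPATH = %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9-src.zip;%PYTHONPATH%
```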
Now open the Spyder IDE, create a new file with the simple PySpark program below, and run it. You should see 5 in the output.

(Screenshot: PySpark application running on Spyder IDE)

Happy Learning !!
