Python: No module named ‘pyspark’ Error

How do you resolve the ‘No module named ‘pyspark’’ error in a Jupyter notebook or any Python editor? In Python, when you try to import the PySpark library without installing it, or without properly setting the environment variables, you get the ‘No module named ‘pyspark’’ error.

ModuleNotFoundError: No module named 'pyspark'

1. Install PySpark to resolve No module named ‘pyspark’ Error

Note that PySpark doesn’t come with the Python installation, hence it is not available by default. To use it, you first need to install PySpark using the pip command (or conda, if you are using Anaconda).

$ pip install pyspark
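If you manage packages with Anaconda, as mentioned above, the equivalent conda command (using the community conda-forge channel) is:

```shell
# Install PySpark into the active conda environment from conda-forge
conda install -c conda-forge pyspark
```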

Even after successfully installing Spark/PySpark on Linux, Windows, or macOS, you may still have issues importing PySpark libraries in Python. Below I have explained some possible ways to resolve the import issue.

Note: Do not use the plain Python shell or the python command to run PySpark programs.
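Instead, run PySpark programs through the spark-submit script that ships with the Spark distribution (the script name below is a hypothetical placeholder):

```shell
# Submit a PySpark script to a local Spark instance
# (my_app.py is a placeholder for your own script)
spark-submit --master "local[1]" my_app.py
```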

2. Using findspark

If even after installing PySpark you are still getting “No module named pyspark” in Python, it could be due to an environment variables issue; you can solve this by installing and importing findspark.

The findspark library searches for the PySpark installation on the machine and adds the PySpark installation path to sys.path at runtime so that you can import PySpark modules. To use it, first install findspark using the pip command.

pip install findspark 

Now run the below commands in sequence on Jupyter Notebook or in Python script.

import findspark
findspark.init()

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]").appName("").getOrCreate()
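Under the hood, findspark.init() is roughly equivalent to locating SPARK_HOME and prepending its Python directories to sys.path. The following is a minimal stdlib-only sketch of that mechanism (not the actual findspark implementation; the function name is my own):

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Prepend Spark's Python sources (and the bundled py4j zip) to sys.path,
    mimicking what findspark.init() does. Returns the list of paths added."""
    python_dir = os.path.join(spark_home, "python")
    # Spark distributions bundle py4j as a zip under python/lib;
    # the exact version in the filename varies by Spark release.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    paths = [python_dir] + py4j_zips
    for p in reversed(paths):
        if p not in sys.path:
            sys.path.insert(0, p)
    return paths
```

After calling add_spark_to_path() with your SPARK_HOME, `import pyspark` resolves against the Spark distribution's bundled sources.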

3. Setting Environment Variables

To set the PySpark environment variables, first get the PySpark installation directory path by running the pip show command.

pip show pyspark
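Alternatively, you can locate the installation from Python itself with importlib.util.find_spec, which returns None when a module is not importable (locate_module below is a small helper of my own, not a Spark API):

```python
import importlib.util


def locate_module(name):
    """Return the filesystem location of an installed module, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


# Prints the path to pyspark's __init__.py if installed, else None
print(locate_module("pyspark"))
```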

Now set SPARK_HOME and PYTHONPATH according to your installation. I run my PySpark programs on Linux, Mac, and Windows, hence I will show the configurations I have for each. After setting these, you should no longer see “No module named pyspark” while importing PySpark in Python.

3.1 Linux (Ubuntu)

export SPARK_HOME=/Users/prabha/apps/spark-2.4.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH

Put these in the .bashrc file and reload it using source ~/.bashrc

3.2 Mac OS

On Mac I have Spark version 2.4.0, hence the below variables.

export SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.0
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/build:$PYTHONPATH

Put these in the .bashrc file and reload it using source ~/.bashrc
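To sanity-check the variable before starting Python, a small helper like the one below (a hypothetical convenience function, not part of Spark) can verify that SPARK_HOME is set and points at a directory containing the python/ sources that PySpark needs:

```python
import os


def spark_home_looks_valid(env=None):
    """Return True if SPARK_HOME is set and contains the python/ sources
    that need to be on sys.path for `import pyspark` to work."""
    if env is None:
        env = os.environ
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return False
    return os.path.isdir(os.path.join(spark_home, "python"))
```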

3.3 Windows PySpark environment

For my Windows environment, I have the PySpark version spark-3.0.0-bin-hadoop2.7, so below are my environment variables. Set these on the Windows environment variables screen.

set SPARK_HOME=C:\apps\opt\spark-3.0.0-bin-hadoop2.7
set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%

If you have a different Spark version, use the version accordingly.


In summary, you can resolve the ‘No module named ‘pyspark’’ error by installing PySpark with pip or conda, setting the right environment variables (SPARK_HOME and PYTHONPATH), or installing and using the findspark module.

Happy Learning !!

Naveen (NNK)

I am Naveen (NNK), working as a Principal Engineer. I am a seasoned Apache Spark engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love designing, optimizing, and managing Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passionate about sharing my knowledge of Apache Spark, Hive, PySpark, R, etc.

