How do you resolve the Python "No module named 'findspark'" error in a Jupyter notebook or any other Python editor while working with PySpark? In Python, when you try to import the findspark module without installing it or without properly setting the environment variables, you get a No module named 'findspark' error.

The findspark module does not come with the Python or PySpark installation, as it is part of neither distribution; hence, you need to install the module and set the appropriate environment variables before using it.
```python
>>> import findspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'findspark'
```
The error message No module named ‘findspark’ means that the Python interpreter is unable to find the findspark module, which is commonly used to locate a Spark installation and make PySpark importable.
Here are some steps you can take to resolve this error:
1. Using Pip to Install findspark Module
To resolve the No module named ‘findspark’ error, check whether the findspark module is installed; if not, install it using pip.

- Open your command prompt or terminal and check whether the module is installed by running `pip show findspark`.
- If the module is not installed, install it with `pip install findspark`.
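Inside a Jupyter notebook it can be convenient to run the install only when the import would actually fail. The sketch below is one way to do that; `pip_install_if_missing` is a hypothetical helper name, not part of any library, and using `sys.executable` ensures pip installs into the same environment the notebook kernel runs in.

```python
import importlib.util
import subprocess
import sys

def pip_install_if_missing(module_name):
    """Run 'pip install' for a module only when the current
    interpreter cannot already import it; returns True if an
    install was attempted, False if the module was found."""
    if importlib.util.find_spec(module_name) is not None:
        return False  # already importable, nothing to do
    # sys.executable targets the kernel's own environment
    subprocess.check_call([sys.executable, "-m", "pip", "install", module_name])
    return True
```

Calling `pip_install_if_missing("findspark")` at the top of a notebook then installs findspark only on first use.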
2. Check Spark Installation Path
Check that you have set the correct path for Spark. If Spark is installed on your system, make sure findspark points at its installation directory. To resolve the error No module named ‘findspark’, set the path with the findspark.init() function, as follows:
```python
# Import findspark module and point it at the Spark installation
import findspark
findspark.init("/path/to/spark")
```
Replace “/path/to/spark” with the actual path where Spark is installed on your system.
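For intuition, findspark.init() essentially exports SPARK_HOME and prepends Spark's bundled Python sources (including the py4j zip) to sys.path. The sketch below reproduces that behavior with the standard library only; `init_spark_path` is a hypothetical name for illustration, not part of the findspark API.

```python
import glob
import os
import sys

def init_spark_path(spark_home):
    """Roughly what findspark.init(spark_home) does: export SPARK_HOME
    and put Spark's Python sources plus the bundled py4j zip on sys.path."""
    os.environ["SPARK_HOME"] = spark_home
    python_dir = os.path.join(spark_home, "python")
    # Spark ships py4j as a zip under python/lib, e.g. py4j-0.10.9-src.zip
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    sys.path[:0] = [python_dir] + py4j_zips
```

After this runs, `import pyspark` succeeds because Spark's own Python sources are on the interpreter's search path.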
3. Setting Environment Variables
Check whether you have set the SPARK_HOME environment variable. After installing Spark/PySpark, you need to set the SPARK_HOME environment variable to Spark's installation directory. Run `pip show pyspark` to find the installation path and use that value for SPARK_HOME.
```bash
# Show pyspark installation path
pip show pyspark
```

```bash
# On Linux/macOS
export SPARK_HOME=/path/to/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
```
Replace “/path/to/spark” with the actual path where Spark is installed on your system. Put these lines in your .bashrc file and reload it by running `source ~/.bashrc`.
For my Windows environment, I have the PySpark version spark-3.0.0-bin-hadoop2.7, so below are my environment variables.

```bash
:: On Windows
set SPARK_HOME=C:\path\to\spark
set HADOOP_HOME=%SPARK_HOME%
set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9-src.zip;%PYTHONPATH%
```
Replace “C:\path\to\spark” with the actual path where Spark is installed on your system.
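Before calling findspark.init() with no arguments, it can help to verify the variable from inside Python. A minimal sketch, assuming the conventional layout where Spark keeps its Python sources under $SPARK_HOME/python; `check_spark_home` is a hypothetical helper name.

```python
import os

def check_spark_home(env=None):
    """Return SPARK_HOME if it is set, printing a hint when it is
    missing or does not look like a Spark installation."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        print("SPARK_HOME is not set; findspark.init() needs an explicit path")
        return None
    # A real Spark install keeps its Python sources under python/
    if not os.path.isdir(os.path.join(spark_home, "python")):
        print("Warning: " + spark_home + " has no python/ subdirectory")
    return spark_home
```

Running `check_spark_home()` in a fresh notebook quickly tells you whether the environment variable actually reached the kernel.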
4. Other Methods to Fix “No module named ‘findspark’”
If none of the above steps work to resolve the error No module named ‘findspark’, try uninstalling and reinstalling the findspark module using pip.

- Uninstall the module with `pip uninstall findspark`.
- Then reinstall it with `pip install findspark`.
Hopefully, one of these steps will help you resolve the error.
In summary, the “No module named ‘findspark’” error occurs when the findspark module is not installed or cannot be found in your Python environment. To resolve it, install the findspark module using pip and point it at the Spark installation directory, either by passing the path to the findspark.init() function or by setting the SPARK_HOME environment variable.
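Putting the steps together, here is one possible end-to-end sketch. The `start_spark` helper name and the fallback messages are assumptions for illustration; the broad exception handler simply keeps the flow readable when Spark itself is absent.

```python
def start_spark(spark_home=None):
    """Sketch of the full recovery flow: import findspark, initialize
    Spark, then import pyspark; degrade gracefully when pieces are missing."""
    try:
        import findspark
    except ImportError:
        return "findspark missing: run 'pip install findspark' first"
    try:
        # With no argument, findspark.init() falls back to SPARK_HOME
        findspark.init(spark_home)
        import pyspark
        return "pyspark " + pyspark.__version__ + " ready"
    except Exception as exc:  # Spark not found or misconfigured
        return "Spark not found: " + str(exc)

print(start_spark())
```

Whatever the state of the environment, the returned message points at the next step to take.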