Somehow I ended up with both Python 3.4 and Python 2.7 installed on my Linux cluster, and while running a PySpark application I was getting Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions. After spending some time searching on Google, I found a solution, and in this article I would like to show how to resolve this error.
Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions
How to Change the Spark/PySpark Driver Python Version?
Regardless of which Spark/PySpark version you are using, the driver and all workers should run the same Python version. Run the which python
command to find the Python installation directory.
which python
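For example, running the command on the driver node and on a worker node might reveal the mismatch. The paths shown below are only hypothetical output to illustrate the problem; your cluster will report its own interpreter locations.
# On the driver node (hypothetical output):
which python        # /usr/bin/python2.7
# On a worker node (hypothetical output):
which python        # /usr/bin/python3.4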
To fix this, set the environment variables PYSPARK_PYTHON
and PYSPARK_DRIVER_PYTHON
in the ~/.bashrc
file to the Python installation path.
export PYSPARK_PYTHON=/python-path
export PYSPARK_DRIVER_PYTHON=/python-path
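As a concrete sketch, if the Python 2.7 interpreter you want both the driver and the workers to use lives at /usr/bin/python2.7 (an assumed path; replace it with the output of which python on your cluster), the entries would look like this:
# Assumed interpreter path; adjust to your own installation
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7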
After adding these environment variables to ~/.bashrc
, reload the file using the source
command.
source ~/.bashrc
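Once the file is reloaded, you can do a quick sanity check that both variables point to the same interpreter before starting your PySpark application; the reported version should match on the driver and on every worker node.
echo $PYSPARK_PYTHON
echo $PYSPARK_DRIVER_PYTHON
# Both of the following should print the same Python version
$PYSPARK_PYTHON --version
$PYSPARK_DRIVER_PYTHON --version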
You can also try adding these environment variables to the <SPARK_HOME>/conf/spark-env.sh
file.
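As a sketch, the entries in spark-env.sh mirror the ones in ~/.bashrc; /usr/bin/python2.7 below is again an assumed path that you should replace with your own interpreter location, and the file needs to be updated on every node so the workers pick it up as well.
# <SPARK_HOME>/conf/spark-env.sh (assumed interpreter path)
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7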
Hope this resolves your issue of having different Python versions between the Spark/PySpark driver and worker nodes. If it doesn't, check the references below for more solutions to this issue.
Happy Learning !!
Related Articles
- How to Spark Submit Python | PySpark File (.py)?
- SOLVED: py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM
- Python set difference()
- Spark printSchema() Example
- Python set union() Function
- Ways to Install Pyspark for Python