Somehow I ended up with both Python 3.4 and 2.7 installed on my Linux cluster, and while running a PySpark application I was getting the exception shown below. I spent some time searching for it on Google and found a solution; in this article I would like to show how to resolve this error.
Exception: Python in worker has different version 3.4 than that in driver 2.7,
PySpark cannot run with different minor versions
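To make the message concrete, here is a minimal sketch of the kind of comparison behind it: only the major.minor part of the version is compared, so 3.4.3 and 3.4.10 are compatible while 3.4 and 2.7 are not. The helper names minor_version and check_versions are hypothetical and written only for illustration; this is not PySpark's own code.

```python
def minor_version(version_info):
    """Format a (major, minor, ...) tuple as 'major.minor', e.g. (3, 4, 10) -> '3.4'."""
    return "%d.%d" % tuple(version_info[:2])


def check_versions(worker_info, driver_info):
    """Raise if worker and driver differ in major.minor; patch level is ignored."""
    worker = minor_version(worker_info)
    driver = minor_version(driver_info)
    if worker != driver:
        raise RuntimeError(
            "Python in worker has different version %s than that in driver %s, "
            "PySpark cannot run with different minor versions" % (worker, driver)
        )
    return worker


# Same minor version, different patch level: accepted.
check_versions((3, 4, 10), (3, 4, 3))

# The situation from this article: worker on 3.4, driver on 2.7.
try:
    check_versions((3, 4, 0), (2, 7, 18))
except RuntimeError as err:
    print(err)
```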
1. How to Change the Spark/PySpark Driver Python Version?
Regardless of which version of Spark/PySpark you are using, the driver and all workers must run the same Python version (the major and minor numbers have to match). Run the which python command to get the Python installation path.
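If you prefer to check this from Python itself, the sketch below reports which interpreter is running and what python resolves to on the PATH. The function name describe_interpreter is a hypothetical helper; run the script on the driver machine and on each worker node and compare the results.

```python
import shutil
import sys


def describe_interpreter():
    """Return (path, 'major.minor') for the interpreter running this script."""
    version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return sys.executable, version


path, version = describe_interpreter()
print("Running interpreter:", path, "(Python " + version + ")")

# shutil.which mirrors the shell's `which`; it returns None when nothing matches.
print("python on PATH:", shutil.which("python"))
```

If the major.minor value differs between nodes, that is the mismatch the exception complains about.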
To fix this, set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables in your ~/.bashrc file to the Python installation path.
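As an alternative to editing ~/.bashrc, the worker interpreter can also be chosen from inside the script, as long as this happens before the SparkSession or SparkContext is created. This is a minimal sketch that assumes client mode and that the same interpreter path exists on every worker node; pin_pyspark_python is a hypothetical helper name, while PYSPARK_PYTHON is the environment variable PySpark reads to pick the Python executable.

```python
import os
import sys


def pin_pyspark_python(python_path=sys.executable):
    """Tell PySpark to launch workers with the given interpreter.

    Assumption: python_path is valid on every worker node, not just locally.
    """
    # The driver is already running under this interpreter, so only the
    # worker-side setting needs to be changed here.
    os.environ["PYSPARK_PYTHON"] = python_path
    return python_path


chosen = pin_pyspark_python()
print("PYSPARK_PYTHON set to:", chosen)

# Create the SparkSession / SparkContext only after this point, otherwise
# the setting is read too late to have any effect.
```

Setting the variable in ~/.bashrc remains the more general fix, since it also covers the pyspark shell and spark-submit.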
After adding these environment variables to ~/.bashrc, reload the file by running source ~/.bashrc.
You can also try adding these environment variables to the $SPARK_HOME/conf/spark-env.sh file.
I hope this resolves the Python version mismatch between your Spark/PySpark driver and worker nodes. If it doesn’t, follow the reference below for more solutions to this issue.
Happy Learning !!
- How to Spark Submit Python | PySpark File (.py)?
- SOLVED: py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM
- Ways to Install Pyspark for Python