Spark Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

Somehow I ended up with both Python 3.4 and Python 2.7 installed on my Linux cluster, and while running a PySpark application I kept getting Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions. After some searching on Google I found a solution; in this article I'll show how to resolve this error.


Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

How to Change the Spark/PySpark Driver Python Version?

Regardless of which Spark/PySpark version you are using, the driver and all workers must run the same Python version. Run the which python command to find the Python installation path.


which python
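
If both interpreters are installed, it is worth checking the version each one reports before changing anything. A quick sketch (the paths and version numbers in the comments are examples from this setup; yours may differ):


which python         # e.g. /usr/bin/python
python --version     # the driver-side default, e.g. Python 2.7.x
which python3        # e.g. /usr/bin/python3
python3 --version    # e.g. Python 3.4.x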

To fix this, set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in your ~/.bashrc file to the Python installation path.


export PYSPARK_PYTHON=/python-path
export PYSPARK_DRIVER_PYTHON=/python-path
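
For example, if which python3 returned /usr/bin/python3 (an assumed path used here for illustration; substitute the path from your own cluster), both variables would point at that interpreter:


export PYSPARK_PYTHON=/usr/bin/python3          # assumed path; use your own
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3   # must match the worker version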

After adding these environment variables to ~/.bashrc, reload the file with the source command.


source ~/.bashrc
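
You can confirm the variables were picked up in the current shell before launching PySpark again:


echo $PYSPARK_PYTHON          # should print the path you set
echo $PYSPARK_DRIVER_PYTHON   # should print the same path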

You can also set these environment variables in the <SPARK_HOME>/conf/spark-env.sh file, which Spark sources on startup.
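
A minimal sketch of the spark-env.sh entries, assuming the same /usr/bin/python3 path as above; since this file is read on every node, the same change needs to be applied cluster-wide:


# <SPARK_HOME>/conf/spark-env.sh
export PYSPARK_PYTHON=/usr/bin/python3          # assumed path; use your own
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3

If you are on Spark 2.1 or later, the equivalent spark.pyspark.python and spark.pyspark.driver.python properties can also be passed at submit time (app.py is a hypothetical application name):


spark-submit \
  --conf spark.pyspark.python=/usr/bin/python3 \
  --conf spark.pyspark.driver.python=/usr/bin/python3 \
  app.py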

Hope this resolves your issue of having different Python versions between the Spark/PySpark driver and the worker nodes.

Happy Learning !!
