Problem: When running a PySpark application through spark-submit, from Spyder, or even from the PySpark shell, I get the error Exception: Java gateway process exited before sending the driver its port number.
Solution: PySpark Exception: Java gateway process exited before sending the driver its port number
To run PySpark (Spark with Python) you need Java installed on your Mac, Linux, or Windows machine. If Java is not installed, if the JAVA_HOME environment variable is not set to the Java installation path, or if PYSPARK_SUBMIT_ARGS is missing or malformed, you get Exception: Java gateway process exited before sending the driver its port number.
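The causes listed above can be checked from Python before starting Spark. The following is a minimal sketch, not an official PySpark utility: the function name diagnose_java_gateway is hypothetical, and the check that PYSPARK_SUBMIT_ARGS ends with pyspark-shell is an assumption based on the example value used in this article.

```python
import os
import shutil


def diagnose_java_gateway(env=None):
    """Return a list of likely causes of the Java gateway error (empty if none found)."""
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")

    # Look for a java executable on the PATH of the environment being checked.
    if shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("no java executable found on PATH")

    # Assumption: when the variable is set, its value should end with pyspark-shell.
    submit_args = env.get("PYSPARK_SUBMIT_ARGS")
    if submit_args and not submit_args.strip().endswith("pyspark-shell"):
        problems.append("PYSPARK_SUBMIT_ARGS does not end with pyspark-shell")

    return problems


if __name__ == "__main__":
    for problem in diagnose_java_gateway():
        print("Possible cause:", problem)
```

An empty list means none of these common causes were detected. It does not guarantee that Spark will start.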
Set PYSPARK_SUBMIT_ARGS
Set PYSPARK_SUBMIT_ARGS with a master. In many cases this resolves Exception: Java gateway process exited before sending the driver its port number. Note that the value ends with pyspark-shell.
export PYSPARK_SUBMIT_ARGS="--master local[3] pyspark-shell"
To make the setting permanent, open the file with vi ~/.bashrc, add the above line, and reload it using source ~/.bashrc
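IDEs such as Spyder do not always read ~/.bashrc, so it can help to set the variable from Python before the Spark session is created. This is a sketch under that assumption. The helper names ensure_submit_args and start_session and the application name gateway-check are hypothetical. The default value is the one from the export line above.

```python
import os


def ensure_submit_args(default="--master local[3] pyspark-shell", env=None):
    """Set PYSPARK_SUBMIT_ARGS if it is missing and return the effective value."""
    env = os.environ if env is None else env
    return env.setdefault("PYSPARK_SUBMIT_ARGS", default)


def start_session(app_name="gateway-check"):
    """Start a SparkSession once the variable is in place (needs pyspark and Java)."""
    ensure_submit_args()
    # Imported here so this file can be loaded even when pyspark is not installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()
```

Calling start_session() at the top of a script leaves any value you already exported untouched and only fills in the default when the variable is absent.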
If the issue is still not resolved, check your Java installation and the JAVA_HOME environment variable.
Install OpenJDK
Why do you need Java to run PySpark?
Spark is written in Scala. Because of its industry adoption, its Python API, PySpark, was released later, built on Py4J. Py4J is a Java library integrated into PySpark that lets Python dynamically interface with JVM objects. Hence, to run PySpark you need Java installed along with Python and Apache Spark.
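To make the Py4J bridge concrete, the sketch below asks the JVM behind an existing SparkSession for its Java version. It assumes a classic (non Spark Connect) session and relies on _jvm, an internal PySpark attribute rather than a public API, so treat it as an illustration only. The function name jvm_java_version is hypothetical.

```python
def jvm_java_version(spark):
    """Ask the JVM behind a SparkSession for its Java version via Py4J."""
    # _jvm is PySpark's internal Py4J view of the JVM; it is not a public API.
    jvm = spark.sparkContext._jvm
    # This attribute chain is resolved by Py4J into a call to
    # java.lang.System.getProperty inside the JVM process.
    return jvm.java.lang.System.getProperty("java.version")
```

With a running session named spark, print(jvm_java_version(spark)) shows which Java installation Spark actually picked up.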
Use the commands below to install OpenJDK or Oracle JDK on Ubuntu Linux.
# To install OpenJDK
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-11-jdk
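# To install Oracle JDK version 8
# Note: the webupd8team/java PPA has been discontinued, so this may fail on current Ubuntu releases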
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
Set JAVA_HOME Environment Variable
Now export JAVA_HOME, pointing it to the Java installation directory.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
To make the setting permanent, open the file with vi ~/.bashrc, add the above line, and reload it using source ~/.bashrc
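After reloading, it is worth confirming that JAVA_HOME really contains a Java launcher. The sketch below assumes the usual JDK layout, with the launcher at bin/java under the installation directory. The helper names java_binary and java_version_banner are hypothetical.

```python
import os
import subprocess


def java_binary(java_home):
    """Return the path to the java launcher under java_home, or None if absent."""
    name = "java.exe" if os.name == "nt" else "java"
    candidate = os.path.join(java_home, "bin", name)
    return candidate if os.path.isfile(candidate) else None


def java_version_banner(java_home):
    """Run `java -version` from java_home and return its output text."""
    binary = java_binary(java_home)
    if binary is None:
        raise FileNotFoundError(f"no java launcher under {java_home}")
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [binary, "-version"], capture_output=True, text=True, check=True
    )
    return result.stderr.strip()
```

For example, java_version_banner(os.environ["JAVA_HOME"]) should report the version you installed. A FileNotFoundError suggests JAVA_HOME points to the wrong directory.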
Happy Learning