How to Find PySpark Version?

In this article, I will quickly cover different ways to find the installed PySpark (Spark with Python) version from the command line and at runtime. You can use these options to check the PySpark version in Hadoop (CDH), AWS Glue, Anaconda, Jupyter Notebook, etc., on Mac, Linux, Windows, and CentOS.

1. Find PySpark Version from Command Line

Like any other tool or language, you can use the --version option with the spark-submit, spark-shell, pyspark, and spark-sql commands to find the PySpark version.

pyspark --version
spark-submit --version
spark-shell --version
spark-sql --version

All of the above commands (spark-submit, spark-shell, pyspark, and spark-sql) return the output below, where you can check the installed PySpark version.

[Screenshot: output of pyspark --version]

As you can see, it displays the Spark version along with the Scala version (2.12.10) and the Java version. For Java, I am using OpenJDK, hence it shows the version as OpenJDK 64-Bit Server VM, 11.0.13.
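If you want to capture this version from a script rather than read it off the screen, here is a minimal sketch. It assumes the command is on your PATH and that the banner prints the word "version" followed by the Spark version number before the Scala version line. The helper names parse_spark_version and spark_cli_version, and the sample banner text with version 3.2.1, are my own illustrations and not part of Spark.

```python
import re
import shutil
import subprocess


def parse_spark_version(banner):
    """Return the first 'version X.Y[.Z]' number found in the banner, or None."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


def spark_cli_version(command="spark-submit"):
    """Run '<command> --version' and parse the banner; None if the command is missing."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Read both streams, since the banner may not be written to stdout.
    return parse_spark_version(result.stdout + result.stderr)


# Hypothetical banner text, used only to illustrate the parsing step.
sample_banner = (
    "Welcome to Spark version 3.2.1\n"
    "Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.13\n"
)
sample_version = parse_spark_version(sample_banner)
```

Calling spark_cli_version() on a machine with Spark installed should give you the same version string that the banner shows, and None when the command cannot be found.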

2. Check Version From Shell

Additionally, if you are in the pyspark shell and want to check the PySpark version without exiting it, you can use sc.version. sc is a SparkContext variable that exists by default in the pyspark shell. Use the steps below to find the Spark version.

  1. cd to $SPARK_HOME/bin
  2. Run the pyspark command to launch the shell
  3. Enter sc.version or spark.version

Both sc.version and spark.version return the version as a string.
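Because the version comes back as a plain string, comparing it directly can mislead you (for example, "3.10.0" sorts before "3.2.0" as text). Here is a small sketch that turns the string into a tuple of integers for comparison. The helper name version_tuple is my own, and I am assuming the string starts with major.minor, optionally followed by a patch number and a vendor suffix.

```python
import re


def version_tuple(version):
    """Convert a version string such as '3.2.1' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("Unrecognized version string: " + repr(version))
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


# Example strings are illustrative; in the shell you would pass sc.version.
plain = version_tuple("3.2.1")
vendor = version_tuple("2.4.0-cdh6.3.2")
is_spark3_or_later = plain >= (3, 0, 0)
```

In the pyspark shell, you could then write something like version_tuple(sc.version) >= (3, 0, 0) to branch on the major version.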

[Screenshot: checking the version in the pyspark shell]

3. Find PySpark Version from Runtime

Imagine you are writing a PySpark application and want to find the PySpark version at runtime. You can get it by accessing the version or sparkContext.version property of the SparkSession object.

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.master("local[1]") \
                    .appName('') \
                    .getOrCreate()

# Both properties return the Spark version as a string
print('PySpark Version :'+spark.version)
print('PySpark Version :'+spark.sparkContext.version)
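If starting a SparkSession just to read the version feels too heavy, you can also look at the installed Python package itself. The sketch below assumes PySpark was installed as a Python package named pyspark (for example, with pip or conda); the function name installed_pyspark_version is my own. Keep in mind this reports the version of the Python package, which may differ from the Spark version of a cluster you connect to.

```python
from importlib import metadata


def installed_pyspark_version():
    """Return the installed pyspark package version, or None if it is not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


package_version = installed_pyspark_version()
print("PySpark package version:", package_version)
```

When the package is importable, pyspark.__version__ gives you the same kind of string without creating a session.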

In this simple article, you have learned how to check the PySpark version from the command line, the pyspark shell, and at runtime. You can use these approaches on Hadoop (CDH), AWS Glue, Anaconda, Jupyter Notebook, etc.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
