In this article, I will quickly cover different ways to find the installed PySpark (Spark with Python) version from the command line and at runtime. You can use these options to check the PySpark version in Hadoop (CDH), AWS Glue, Anaconda, Jupyter Notebook, etc. on Mac, Linux, Windows, and CentOS.
1. Find PySpark Version from Command Line
Like any other tool or language, you can use the --version option with the pyspark, spark-submit, spark-shell, and spark-sql commands to find the PySpark version.
pyspark --version
spark-submit --version
spark-shell --version
spark-sql --version
As you can see, it displays the Spark version along with the Scala version (2.12.10) and the Java version. For Java, I am using OpenJDK, hence it shows the version as OpenJDK 64-Bit Server VM, 11.0.13.
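As an alternative not shown in the commands above, if pyspark was installed as a Python package (for example with pip), you can also print its version from the command line without launching any Spark shell. This relies on the pyspark.__version__ attribute of the installed package:

python -c "import pyspark; print(pyspark.__version__)"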
2. Check Version From Shell
Additionally, if you are in the pyspark shell and want to check the PySpark version without exiting it, you can do so using sc.version or spark.version. sc is a SparkContext variable that exists by default in the pyspark shell. Use the steps below to find the Spark version.
- cd to the directory where Spark is installed (for example, $SPARK_HOME/bin) and launch the pyspark shell.
- Enter sc.version or spark.version.
spark.version returns the version as a string.
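For example, inside the pyspark shell the session entry point and context are already created for you, so a minimal check (assuming the default shell variables spark and sc) looks like this:

# Run inside the pyspark shell; spark and sc are created automatically
print(spark.version)   # PySpark/Spark version as a string, e.g. '3.1.2'
print(sc.version)      # same value, read from the SparkContext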
3. Find PySpark Version from Runtime
Imagine you are writing a PySpark application and want to find the PySpark version at runtime; you can get it by accessing the version or sparkContext.version property of the SparkSession object.
# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.master("local") \
    .appName('SparkByExamples.com') \
    .getOrCreate()

print('PySpark Version :' + spark.version)
print('PySpark Version :' + spark.sparkContext.version)
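If you need to act on the version programmatically, here is a minimal sketch that assumes the usual 'major.minor.patch' format of the version string and guards a version-specific code path. The pyspark package also exposes a __version__ attribute you can read without creating a SparkSession:

# The installed pyspark package reports its own version
import pyspark
print(pyspark.__version__)

# A minimal sketch: parse the 'major.minor.patch' string returned by
# spark.version (from the SparkSession created above) and branch on it
major = int(spark.version.split(".")[0])
if major >= 3:
    print("Running on Spark 3.x or later")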
In this short article, you have learned how to check the PySpark version from the command line, from the pyspark shell, and at runtime. You can use these options in Hadoop (CDH), AWS Glue, Anaconda, Jupyter Notebook, etc.
Happy Learning !!