Spark Context 'sc' Not Defined?

Problem: When I try to use the SparkContext object 'sc' in a PySpark program, I get the error Spark Context 'sc' Not Defined, but sc works fine in the Spark/PySpark shell.

Solution: Spark Context ‘sc’ Not Defined?

In Spark/PySpark, 'sc' is a SparkContext object that is created up front by default in the spark-shell/pyspark shell; this object is also available in Databricks notebooks. However, when you write a standalone PySpark program, you need to create a SparkSession, which internally creates a SparkContext.

If you are getting Spark Context 'sc' Not Defined in the Spark/PySpark shell, use the export below.


export PYSPARK_SUBMIT_ARGS="--master local[1] pyspark-shell"

Open ~/.bashrc with vi, add the above line, reload it using source ~/.bashrc, and then launch the spark-shell/pyspark shell.
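Once the shell launches, sc should be predefined. A quick check inside the pyspark shell (illustrative only; the exact output depends on your Spark version and master setting) looks like this:


>>> sc
<SparkContext master=local[1] appName=PySparkShell>
>>> sc.master
'local[1]'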

Below is a way to get the SparkContext object in a PySpark program.


# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
                    .master("local[1]") \
                    .appName("SparkByExamples.com") \
                    .getOrCreate()

# Get SparkContext from the SparkSession
sc = spark.sparkContext
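As a quick sanity check (a minimal sketch, not part of the original example), you can use the sc object obtained above to create a small RDD and read a couple of its properties:


# Use the SparkContext obtained from the SparkSession
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.count())   # 5
print(sc.appName)    # SparkByExamples.com
print(sc.master)     # local[1]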

In case you get a No module named pyspark error, follow the steps mentioned in How to import PySpark in Python Script to resolve it. In simple words, just use findspark.


# Install findspark
pip install findspark

# Import findspark and initialize it to locate the PySpark installation
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession, which internally creates SparkContext
spark = SparkSession.builder \
                    .master("local[1]") \
                    .appName("SparkByExamples.com") \
                    .getOrCreate()
sc = spark.sparkContext

Alternatively, you can also get a SparkContext object by using SparkContext.getOrCreate().

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
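If you also want to control the master and application name with this approach, here is a minimal sketch passing an explicit SparkConf to getOrCreate(); the master and appName values below are just placeholders:


from pyspark import SparkConf, SparkContext

# Placeholder configuration values
conf = SparkConf().setMaster("local[1]").setAppName("SparkByExamples.com")

# Returns the active SparkContext if one exists, otherwise creates a new one
sc = SparkContext.getOrCreate(conf)
print(sc.master)   # local[1]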

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium