Spark Context ‘sc’ Not Defined?

Problem: When I try to use the SparkContext object 'sc' in a PySpark program, I get a Spark Context 'sc' Not Defined error, but sc works fine in the Spark/PySpark shell.

Solution: Spark Context ‘sc’ Not Defined?

In Spark/PySpark, 'sc' is a SparkContext object that is created upfront by default in the spark-shell/pyspark shell; this object is also available in Databricks notebooks. However, when you write a standalone PySpark program, you need to create a SparkSession, which internally creates a SparkContext.

If you are getting the Spark Context 'sc' Not Defined error in the Spark/PySpark shell, use the export below:

export PYSPARK_SUBMIT_ARGS="--master local[1] pyspark-shell"

Open ~/.bashrc (for example with vi ~/.bashrc), add the above line, reload the file using source ~/.bashrc, and relaunch the spark-shell/pyspark shell.
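The steps above can be sketched as follows; this sets the variable for the current session and verifies it before relaunching the shell (adding the same export line to ~/.bashrc, as described above, makes it permanent):

```shell
# Set PYSPARK_SUBMIT_ARGS for the current session.
# To make it permanent, add this same line to ~/.bashrc and run: source ~/.bashrc
export PYSPARK_SUBMIT_ARGS="--master local[1] pyspark-shell"

# Verify the variable is set before launching the pyspark shell
echo "$PYSPARK_SUBMIT_ARGS"
```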

Below is a way to get the SparkContext object in a PySpark program.

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession (this also creates the SparkContext)
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()

# Access the SparkContext from the SparkSession
sc = spark.sparkContext

If you get a No module named 'pyspark' error, follow the steps mentioned in How to import PySpark in Python Script to resolve it. In simple words, just use findspark.

# Install findspark
pip install findspark

# Import and initialize findspark before importing pyspark
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession, which creates the SparkContext
spark = SparkSession.builder.master("local[1]") \
    .appName("SparkByExamples.com").getOrCreate()
sc = spark.sparkContext

Alternatively, you can also get the SparkContext object by using getOrCreate().

from pyspark import SparkContext
sc = SparkContext.getOrCreate()

Happy Learning !!

Naveen (NNK)

I am Naveen (NNK), working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love designing, optimizing, and managing Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passionate about sharing my knowledge of Apache Spark, Hive, PySpark, R, etc.

