Spark Context ‘sc’ Not Defined?

Problem: When I try to use the SparkContext object 'sc' in a PySpark program, I get a Spark Context 'sc' Not Defined error, but sc works fine in the Spark/PySpark shell.

Solution: Spark Context ‘sc’ Not Defined

In Spark/PySpark, 'sc' is a SparkContext object that is created up front by default in the spark-shell/pyspark shell; the object is also available in Databricks notebooks. However, when you write a standalone PySpark program, you need to create a SparkSession, which internally creates a SparkContext.

If you are getting Spark Context 'sc' Not Defined in the Spark/PySpark shell, use the export below.


export PYSPARK_SUBMIT_ARGS="--master local[1] pyspark-shell"

Open ~/.bashrc with vi ~/.bashrc, add the above line, reload the file with source ~/.bashrc, and then launch spark-shell or the pyspark shell.

Below is a way to get the SparkContext object in a PySpark program.


# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
                    .master("local[1]") \
                    .appName("SparkByExamples.com") \
                    .getOrCreate()
sc = spark.sparkContext

If you get a No module named pyspark error, follow the steps in How to import PySpark in Python Script to resolve it. In short, use findspark.


# Install findspark (run this from a terminal)
pip install findspark

# Import findspark and initialize it
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession, which in turn creates the SparkContext
spark = SparkSession.builder \
                    .master("local[1]") \
                    .appName("SparkByExamples.com") \
                    .getOrCreate()
sc = spark.sparkContext

Alternatively, you can also get the SparkContext object by using getOrCreate().

from pyspark import SparkContext
sc = SparkContext.getOrCreate()

Happy Learning !!

NNK

