Problem: When I try to use the SparkContext object 'sc' in a PySpark program, I get the error Spark Context 'sc' Not Defined, but sc works fine in the Spark/PySpark shell.
Solution: Spark Context ‘sc’ Not Defined?
In Spark/PySpark, 'sc' is a SparkContext object that is created upfront by default in the spark-shell/pyspark shell; this object is also available in Databricks. However, when you write a PySpark program, you need to create a SparkSession yourself, which internally creates a SparkContext.
If you are getting Spark Context 'sc' Not Defined in the Spark/PySpark shell itself, set the following environment variable:
export PYSPARK_SUBMIT_ARGS="--master local[1] pyspark-shell"
Open the file with vi ~/.bashrc, add the above line, reload it using source ~/.bashrc, and then launch the spark-shell/pyspark shell again.
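If you prefer not to edit ~/.bashrc, the same variable can also be set from Python before pyspark is imported. This is only a sketch of that alternative; the variable name and value are taken from the export line above.
# Set PYSPARK_SUBMIT_ARGS in code instead of ~/.bashrc
# (must run before pyspark is imported)
import os
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[1] pyspark-shell"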
Below is a way to get the SparkContext object in a PySpark program.
# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()

# Get SparkContext from the SparkSession
sc = spark.sparkContext
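Once sc is obtained this way, it can be used just like the pre-created sc in the shell. A minimal sketch (the sample data below is made up for illustration):
# Use the SparkContext to create and count an RDD
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.count())   # 5
print(sc.appName)    # SparkByExamples.com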
In case you get a 'No module named pyspark' error, follow the steps mentioned in How to import PySpark in Python Script to resolve it. In simple words, just use findspark.
# Install findspark (run from the command line)
pip install findspark

# Import findspark and initialize it before importing pyspark
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession, which creates SparkContext
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()
sc = spark.sparkContext
Alternatively, you can also get the SparkContext object by using SparkContext.getOrCreate().
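A minimal sketch of that approach; getOrCreate() returns the existing SparkContext if one is already running, otherwise it creates a new one (the master and app name below are just example values):
# Get (or create) a SparkContext directly
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[1]").setAppName("SparkByExamples.com")
sc = SparkContext.getOrCreate(conf)
print(sc.appName)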
Happy Learning !!