NameError: name 'spark' is not defined

Problem: When I call spark.createDataFrame() in a Python script, I get NameError: name 'spark' is not defined, but the same code works without issue in the Spark or PySpark shell.
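A minimal reproduction in plain Python (no PySpark required to see the error): in a .py file nothing has assigned the name 'spark', so the first reference to it fails before any Spark code runs. Python reports the name exactly as it was typed, in lowercase.

```python
# Nothing in this file has assigned the name 'spark', so looking it up
# raises NameError before PySpark is ever involved.
try:
    df = spark.createDataFrame([(1, "a")], ["id", "letter"])
except NameError as err:
    message = str(err)
    print(message)  # name 'spark' is not defined
```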

Solution: NameError: name 'spark' is not defined in PySpark

Since Spark 2.0, 'spark' is a SparkSession object that is created up front and available by default in the Spark shell, the PySpark shell, and Databricks. However, if you are writing a Spark/PySpark program in a .py file, nothing creates it for you: you need to create the SparkSession object explicitly using the builder to resolve NameError: name 'spark' is not defined.

Similarly, 'sc' is a SparkContext object that is available by default in the Spark/PySpark shell and Databricks. In a script, you can get it from the session with spark.sparkContext.

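# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
# SparkSession.builder alone returns a Builder, not a session;
# getOrCreate() builds the session (or returns an existing one).
spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()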

If you get a No module named pyspark error, follow the steps in How to import PySpark in Python Script to resolve it. In short, try using findspark, which locates your Spark installation (via SPARK_HOME) and adds PySpark to sys.path.

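# Install findspark (run this in a shell, not in Python)
# pip install findspark

# Import findspark and initialize it before importing PySpark;
# init() reads SPARK_HOME to find your Spark installation
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession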

Hopefully this resolves NameError: name 'spark' is not defined and you are able to run your PySpark program with spark-submit or from an editor.

Happy Learning !!

NNK is a Big Data and Spark examples community page; all examples are simple, easy to understand, and well tested in our development environment.
