Problem: When I use spark.createDataFrame() in a Python script, I get NameError: name 'spark' is not defined, but the same code works without issue in the Spark or PySpark shell.
Solution: NameError: name 'spark' is not defined in PySpark
Since Spark 2.0, 'spark' is a SparkSession object that is created upfront and available by default in the Spark shell, PySpark shell, and Databricks notebooks. However, if you are writing a Spark/PySpark program in a .py file, you need to explicitly create the SparkSession object using the builder to resolve NameError: name 'spark' is not defined.
Similarly, 'sc' is a SparkContext object that is available by default in the Spark/PySpark shell and Databricks; in a script you can obtain it from the SparkSession as spark.sparkContext.
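To see why the error occurs, here is a minimal plain-Python illustration (no Spark required): referencing a name that was never created raises NameError, which is exactly what happens to 'spark' in a standalone script.

```python
# Plain-Python illustration: in a standalone .py file, 'spark' is
# never predefined, so referencing it raises NameError.
try:
    spark.range(5)  # 'spark' was never created in this script
except NameError as err:
    message = str(err)
    print(message)  # prints: name 'spark' is not defined
```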
# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .master("local") \
    .appName("SparkByExamples.com") \
    .getOrCreate()
In case you get a 'No module named pyspark' error, follow the steps mentioned in How to import PySpark in Python Script to resolve it. In short, use findspark.
# Install findspark
pip install findspark

# Import findspark and initialize it
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession
Hope this resolves NameError: name 'spark' is not defined and you are now able to execute your PySpark program using spark-submit or from your editor.
Happy Learning !!