Problem: When I use spark.createDataFrame() in a Python script, I get NameError: name 'spark' is not defined, but the same code works without issue in the Spark or PySpark shell.
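In a plain .py script the name spark simply does not exist until you create it, so the call fails before Spark is ever involved. A minimal reproduction (pure Python, no Spark installation needed):

```python
# Minimal reproduction: 'spark' is never defined in this script,
# so referencing it raises NameError, exactly as described above.
try:
    spark.createDataFrame([(1, "Alice")], ["id", "name"])
except NameError as e:
    print(e)  # name 'spark' is not defined
```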
Solution: NameError: name 'spark' is not defined in PySpark
Since Spark 2.0, 'spark' is a SparkSession object that is created upfront and available by default in the Spark shell, the PySpark shell, and Databricks. However, if you are writing a Spark/PySpark program in a .py file, you need to create the SparkSession object explicitly using the builder to resolve NameError: name 'spark' is not defined.
Similarly, 'sc' is a SparkContext object that is available by default in the Spark/PySpark shell and Databricks.
# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()
If you get a 'No module named pyspark' error, follow the steps in How to import PySpark in Python Script to resolve it. In short, use findspark.
# Install findspark
pip install findspark

# Import findspark and initialize it before importing pyspark
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession
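findspark.init() works by locating your Spark installation and putting its bundled Python libraries on sys.path before pyspark is imported. A rough sketch of the idea (illustrative only, not findspark's real source; the default path and the py4j zip name vary by installation and version):

```python
import glob
import os
import sys

# Sketch of what findspark.init() does: find SPARK_HOME, then prepend
# Spark's bundled Python packages to sys.path so 'import pyspark' works.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")  # assumed default location
sys.path.insert(0, os.path.join(spark_home, "python"))

# py4j ships as a zip inside the Spark distribution; the exact name varies
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.insert(0, py4j_zip)
```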
Hope this resolves NameError: name 'spark' is not defined and you are able to run your PySpark program with spark-submit or from your editor.
Happy Learning !!
Hello,
how can I resolve this error?
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Python version 3.8.5 (default, Sep 3 2020 21:29:08)
SparkSession available as 'spark'.
>>> 21/12/25 20:46:26 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped.
And if I import Spark in a Jupyter notebook, I get this error:
Exception: Java gateway process exited before sending its port number