NameError: Name 'Spark' is not Defined


Post author: Naveen Nelamali
Post category: PySpark
Post last modified: March 27, 2024

Problem: When I call spark.createDataFrame() in a PySpark program, I get NameError: name 'spark' is not defined. The same call works without issue in the Spark or PySpark shell.

Solution: NameError: Name ‘Spark’ is not Defined in PySpark

Since Spark 2.0, spark is a SparkSession object that is created up front and available by default in the Spark shell, the PySpark shell, and Databricks. If you are writing a Spark/PySpark program in a .py file, however, no such variable exists, so you need to create the SparkSession object explicitly by using the builder. Doing so resolves NameError: name 'spark' is not defined.

Similarly, sc is a SparkContext object that is available by default in the Spark/PySpark shell and Databricks.


# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
# The surrounding parentheses let the builder chain span several lines
spark = (SparkSession.builder
         .master("local[1]")
         .appName("SparkByExamples.com")
         .getOrCreate())
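
The difference between the shell and a standalone script can be illustrated in plain Python, without Spark installed. The sketch below runs the same line twice: once in an empty namespace (like a .py file) and once in a namespace where a spark variable already exists (like the PySpark shell). FakeSession and run_snippet are hypothetical names used only for this illustration; they are not part of PySpark.

```python
# Plain-Python sketch: no Spark needed. FakeSession is a hypothetical stand-in
# for the SparkSession object that the shell creates for you.
class FakeSession:
    def createDataFrame(self, rows):
        return list(rows)


SNIPPET = "df = spark.createDataFrame([(1, 'a'), (2, 'b')])"


def run_snippet(source, predefined=None):
    """Execute source in a fresh namespace, optionally pre-populated like a shell."""
    namespace = dict(predefined or {})
    try:
        exec(source, namespace)
        return "ok"
    except NameError as err:
        return f"NameError: {err}"


# Like a standalone .py file: nothing has defined 'spark' yet
script_result = run_snippet(SNIPPET)

# Like the PySpark shell: 'spark' was created before your code runs
shell_result = run_snippet(SNIPPET, {"spark": FakeSession()})

print(script_result)  # NameError: name 'spark' is not defined
print(shell_result)   # ok
```

The error is therefore not specific to Spark: Python simply cannot find a variable called spark until something assigns it.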

If you get a ‘No module named pyspark‘ error, follow the steps mentioned in How to import PySpark in Python Script to resolve it. In simple words, try using findspark.


# Install findspark (run this in a terminal, not inside Python)
pip install findspark

# Import findspark and initialize it before importing pyspark
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession
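
Before reaching for findspark, it can help to check whether Python can already locate pyspark. The following is a minimal sketch using only the standard library for the check; module_available and ensure_pyspark_importable are hypothetical helper names, not part of findspark or PySpark, and the fallback assumes a Spark installation that findspark.init() is able to locate.

```python
import importlib.util


def module_available(name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


def ensure_pyspark_importable():
    """Fall back to findspark only when pyspark is not directly importable."""
    if module_available("pyspark"):
        return "pyspark already importable"
    if not module_available("findspark"):
        return "findspark missing: run 'pip install findspark' first"
    import findspark
    findspark.init()  # looks up the Spark installation and extends sys.path
    return "findspark.init() called"
```

Calling ensure_pyspark_importable() at the top of a script, before the pyspark imports, keeps the findspark dependency optional on machines where pyspark was installed with pip.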

Hope this resolves NameError: name 'spark' is not defined and you are able to execute your PySpark program by using spark-submit or from an editor.

Happy Learning !!

This Post Has One Comment

Sosthène December 25, 2021

Hello,
how can I resolve this error?

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Python version 3.8.5 (default, Sep 3 2020 21:29:08)
SparkSession available as ‘spark’.
>>> 21/12/25 20:46:26 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped.
And if I import Spark in a Jupyter notebook, I get the error below:

Exception: Java gateway process exited before sending its port number
