  • Post category: PySpark
  • Post last modified: March 27, 2024

Problem: When I use spark.createDataFrame() I get NameError: name 'spark' is not defined, yet the same code works without issue in the Spark or PySpark shell.

Solution: NameError: Name 'Spark' is not Defined in PySpark

Since Spark 2.0, 'spark' is a SparkSession object that is created upfront and available by default in the Spark shell, the PySpark shell, and Databricks notebooks. However, if you are writing a Spark/PySpark program in a .py file, you need to create the SparkSession object explicitly using the builder in order to resolve NameError: name 'spark' is not defined.

Similarly, 'sc' is a SparkContext object that is available by default in the Spark/PySpark shells and in Databricks, but not in a standalone .py file.

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("MyApp") \
    .getOrCreate()

If you get a "No module named pyspark" error, follow the steps mentioned in How to import PySpark in Python Script to resolve it. In short, use findspark.

# Install findspark
pip install findspark

# Import findspark and initialize it before importing PySpark
import findspark
findspark.init()

# Import PySpark
import pyspark
from pyspark.sql import SparkSession

Hope this resolves NameError: name 'spark' is not defined, and that you are now able to execute your PySpark program using spark-submit or from an editor.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium

This Post Has One Comment

  1. Sosthène

    how can I resolve this error?

    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 3.0.3
          /_/

    Using Python version 3.8.5 (default, Sep 3 2020 21:29:08)
    SparkSession available as 'spark'.
    >>> 21/12/25 20:46:26 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped.
    And if I import Spark in a Jupyter notebook, I get the error below:

    Exception: Java gateway process exited before sending its port number
