
In Spark or PySpark, a SparkSession object is created programmatically using SparkSession.builder(). If you are using the Spark shell, a SparkSession object named “spark” is created for you by default, and the SparkContext is retrieved from the session object using sparkSession.sparkContext. In this article, you will learn how to create a SparkSession and how to use the SparkContext, with Scala and PySpark examples.

Spark – Create SparkSession

Since Spark 2.0, SparkSession has been the entry point to underlying Spark functionality. All of the functionality available with SparkContext is also available in SparkSession. In addition, SparkSession provides APIs for working with DataFrames and Datasets.

SparkSession also includes all the APIs available in the different contexts (see the sketch after this list) –

  • Spark Context,
  • SQL Context,
  • Streaming Context,
  • Hive Context.
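
As a rough Scala sketch (the app name below is just a placeholder), these older contexts can be reached directly from a SparkSession, so you normally do not need to construct them yourself:


// Accessing the underlying contexts from a SparkSession (sketch)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ContextAccessExample")  // placeholder app name for illustration
      .enableHiveSupport()              // adds Hive support (replaces the old HiveContext)
      .getOrCreate()

val sc  = spark.sparkContext            // SparkContext
val sql = spark.sqlContext              // SQLContext, kept for backward compatibility
// A StreamingContext can still be built from sc when the DStream API is needed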

Below is an example of creating a SparkSession using Scala.


import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate();

master() – If you are running the application on a cluster, pass your cluster’s master URL as the argument to master(); usually it is either yarn or mesos, depending on your cluster setup.

appName() – Used to set your application name.

getOrCreate() – Returns an existing SparkSession if one already exists, and creates a new one if it does not.
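
For example, the same builder can point at a cluster manager and carry extra configuration. This is only a sketch; the spark.executor.memory value here is illustrative rather than a required setting:


// SparkSession builder for a cluster run (sketch)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
      .master("yarn")                          // cluster manager instead of local[1]
      .appName("SparkByExample")
      .config("spark.executor.memory", "2g")   // illustrative Spark property
      .getOrCreate()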

Note: The SparkSession object “spark” is available by default in the Spark shell.

1. PySpark – Create SparkSession

Below is a PySpark example of creating a SparkSession.


# PySpark - create SparkSession
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder \
                    .master('local[1]') \
                    .appName('SparkByExamples.com') \
                    .getOrCreate()

When running it on a cluster, pass your master name as an argument to master(); usually it is either yarn or mesos, depending on your cluster setup.

1.1 Create SparkContext

A Spark “driver” is the application process that creates a SparkContext for executing one or more jobs in the Spark cluster. It allows your Spark/PySpark application to access the Spark cluster with the help of a resource manager.

When you create a SparkSession object, a SparkContext is also created and can be retrieved using spark.sparkContext. A SparkContext is created only once per application; even if you try to create another one, the existing SparkContext is returned.
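
A rough Scala sketch of using the retrieved SparkContext (the app name and RDD contents are purely illustrative):


// Retrieving and using SparkContext from a SparkSession (sketch)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkContextExample")      // placeholder app name for illustration
      .getOrCreate()

val sc = spark.sparkContext                // one SparkContext per application
sc.setLogLevel("ERROR")                    // reduce console logging

val rdd = sc.parallelize(Seq(1, 2, 3, 4))  // RDDs are created through the SparkContext
println(rdd.count())                       // 4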

2. Scala Example


// Scala Example
import org.apache.spark.sql.SparkSession

object SparkSessionTest {
  def main(args: Array[String]): Unit = {
    // First SparkSession - this also creates the application's SparkContext
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    println("First SparkContext:")
    println("APP Name :" + spark.sparkContext.appName)
    println("Master :" + spark.sparkContext.master)

    // The second getOrCreate() returns the existing session, so the
    // underlying SparkContext (and its appName) stays the same
    val sparkSession2 = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample-test")
      .getOrCreate()

    println("Second SparkContext:")
    println("APP Name :" + sparkSession2.sparkContext.appName)
    println("Master :" + sparkSession2.sparkContext.master)
  }
}

3. PySpark Example


# PySpark Example
import pyspark
from pyspark.sql import SparkSession

# First SparkSession - this also creates the application's SparkContext
spark = SparkSession.builder.master("local[1]") \
                    .appName("SparkByExample") \
                    .getOrCreate()

print("First SparkContext:")
print("APP Name :" + spark.sparkContext.appName)
print("Master :" + spark.sparkContext.master)

# The second getOrCreate() returns the existing session, so the
# underlying SparkContext (and its appName) stays the same
sparkSession2 = SparkSession.builder \
      .master("local[1]") \
      .appName("SparkByExample-test") \
      .getOrCreate()

print("Second SparkContext:")
print("APP Name :" + sparkSession2.sparkContext.appName)
print("Master :" + sparkSession2.sparkContext.master)

The Scala and PySpark examples above both return the same output, shown below.


First SparkContext:
APP Name :SparkByExample
Master :local[1]

Second SparkContext:
APP Name :SparkByExample
Master :local[1]

4. Conclusion

In this Spark article, you have learned that a SparkSession can be created using the builder() method, and that a SparkContext is created by default when the session object is created; it can be accessed using spark.sparkContext (where spark is a SparkSession object).

Resources: https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html

Happy Learning !!

