Spark – Create a SparkSession and SparkContext

In Spark and PySpark, a SparkSession object is created programmatically using SparkSession.builder(). If you are using the Spark shell, a SparkSession object named “spark” is already created for you by default, and the SparkContext can be retrieved from the session object using sparkSession.sparkContext. In this article, you will learn how to create a SparkSession and how to use the SparkContext, with Scala and PySpark examples.

Spark – Create SparkSession

Since Spark 2.0, SparkSession has been the entry point to underlying Spark functionality. All functionality available with SparkContext is also available through SparkSession, and it additionally provides APIs to work with DataFrames and Datasets.

SparkSession also includes all the APIs available in the different contexts –

  • Spark Context,
  • SQL Context,
  • Streaming Context,
  • Hive Context.

Below is an example of creating a SparkSession in Scala.


import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate();

master() – If you are running on a cluster, you need to pass your master name as an argument to master(); usually this is either yarn or mesos, depending on your cluster setup.

appName() – Used to set your application name.

getOrCreate() – Returns the existing SparkSession if one already exists; otherwise it creates a new one.

Note: The SparkSession object “spark” is available by default in the Spark shell.

PySpark – create SparkSession

Below is a PySpark example to create SparkSession.


from pyspark.sql import SparkSession
spark = SparkSession.builder \
                    .master('local[1]') \
                    .appName('SparkByExamples.com') \
                    .getOrCreate()

When running on a cluster, you need to pass your master name as an argument to master(); usually this is either yarn or mesos, depending on your cluster setup.

Create SparkContext

A Spark “driver” is the application that creates a SparkContext for executing one or more jobs in a Spark cluster. The SparkContext allows your Spark/PySpark application to access the cluster through a resource manager.

When you create a SparkSession object, a SparkContext is created along with it and can be retrieved using spark.sparkContext. Only one SparkContext is created per application; even if you try to create another one, getOrCreate() still returns the existing SparkContext.

Scala Example


import org.apache.spark.sql.SparkSession

object SparkSessionTest {
  def main(args:Array[String]): Unit ={
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate();

    println("First SparkContext:");
    println("APP Name :"+spark.sparkContext.appName);
    println("Master :"+spark.sparkContext.master);

    val sparkSession2 = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample-test")
      .getOrCreate();

    println("Second SparkContext:")
    println("APP Name :"+sparkSession2.sparkContext.appName);
    println("Master :"+sparkSession2.sparkContext.master);
  }
}

PySpark Example


from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]") \
                    .appName('SparkByExamples.com') \
                    .getOrCreate()

print("First SparkContext:")
print("APP Name :" + spark.sparkContext.appName)
print("Master :" + spark.sparkContext.master)

sparkSession2 = SparkSession.builder \
      .master("local[1]") \
      .appName("SparkByExample-test") \
      .getOrCreate()

print("Second SparkContext:")
print("APP Name :" + sparkSession2.sparkContext.appName)
print("Master :" + sparkSession2.sparkContext.master)

The Scala and PySpark examples above both return the same output, shown below. Notice that the second SparkContext still reports the first application name: getOrCreate() returned the existing session rather than creating a new one.


First SparkContext:
APP Name :SparkByExample
Master :local[1]

Second SparkContext:
APP Name :SparkByExample
Master :local[1]

Conclusion

In this Spark article, you have learned that a SparkSession can be created using the builder() method, that a SparkContext is created by default when the session object is created, and that it can be accessed using spark.sparkContext (where spark is a SparkSession object).

Resources: https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html

Happy Learning !!
