SparkSession vs SQLContext

In Spark, SparkSession is the entry point to a Spark application, while SQLContext is used to process structured data organized into rows and columns. In this article, I will focus on the difference between SparkSession and SQLContext by defining each, describing how to create an instance of each, and showing how to use them from spark-shell.

What is SparkSession?

SparkSession is the entry point to underlying Spark functionality for programmatically creating Spark RDDs, DataFrames, and Datasets. Its object spark is available by default in spark-shell, and it can be created programmatically using the SparkSession builder pattern.
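As a quick illustration, here is a minimal sketch of creating each of these abstractions from the spark object that spark-shell provides (the data is made up for this example):


scala> val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3)) // RDD via the underlying SparkContext
scala> val ds = spark.range(3)                                // Dataset[Long]
scala> val df = ds.toDF("id")                                 // DataFrame with a single column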

What is SQLContext?

Spark SQLContext has been defined in the org.apache.spark.sql package since Spark 1.0; it was deprecated in 2.0 and replaced with SparkSession. SQLContext contains several useful functions of Spark SQL for working with structured data (columns & rows), and it was the entry point to Spark SQL.

SparkSession

Spark 2.0 introduced a new class, org.apache.spark.sql.SparkSession, which combines all of the different contexts we used to have prior to the 2.0 release; hence, SparkSession is used in place of SQLContext and HiveContext.

SparkSession in spark-shell

By default, the Spark shell provides a spark object, which is an instance of the SparkSession class. We can use this object directly wherever required, for example to retrieve the underlying SQLContext:


scala> val sqlcontext = spark.sqlContext
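Both the retrieved SQLContext and the SparkSession itself can run SQL queries; for example, this illustrative query returns the same result either way:


scala> sqlcontext.sql("SELECT 1 AS id").show()
scala> spark.sql("SELECT 1 AS id").show()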

Creating SparkSession from Scala program


    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExamples.com")
      .getOrCreate()
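
Once created, the session can be used right away. A minimal sketch with illustrative sample data:


    import spark.implicits._

    // toDF comes from spark.implicits; the data is made up for this example
    val df = Seq(("Scala", 3000), ("Java", 20000)).toDF("language", "users")
    df.show()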

SQLContext

Spark org.apache.spark.sql.SQLContext is a deprecated class that contains several useful functions for working with Spark SQL, and it was the entry point of Spark SQL. However, as mentioned, it has been deprecated since Spark 2.0, and using SparkSession is recommended instead.

SQLContext in spark-shell

You can create an SQLContext in Spark shell by passing a default SparkContext object (sc) as a parameter to the SQLContext constructor.


scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
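On Spark 2.x this constructor still works but is flagged as deprecated. The resulting instance supports the usual Spark SQL operations; for example, with an illustrative query:


scala> sqlcontext.sql("SELECT 'SQLContext' AS entry_point").show()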

Creating SQLContext from Scala program

Before Spark 2.0, you would need to pass a SparkContext object to the constructor in order to create a SQLContext instance. In Scala, you do this as shown in the example below.


  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf().setAppName("sparkbyexamples.com").setMaster("local[1]")
  val sparkContext = new SparkContext(conf)
  val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
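
With that instance you can work with DataFrames in the pre-2.0 style; a minimal sketch, again with made-up data (registerTempTable was the pre-2.0 API and is itself deprecated in 2.0):


  import sqlContext.implicits._

  // illustrative data; toDF comes from sqlContext.implicits
  val df = sparkContext.parallelize(Seq(("Scala", 3000), ("Java", 20000)))
    .toDF("language", "users")
  df.registerTempTable("tech")
  sqlContext.sql("SELECT * FROM tech").show()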

However, in Spark 2.0 the SQLContext() constructor has been deprecated, and using the sqlContext method of SparkSession is recommended instead. As you learned in the sections above, you can do this as follows:


val sqlContext = spark.sqlContext

Conclusion

In this Spark article, you have learned the differences between SparkSession and SQLContext and how to create each from the Spark shell and from a Scala program.

Happy Learning !!
