In Spark, SparkSession is the entry point to a Spark application, and SQLContext is used to process structured data organized into rows and columns. In this article, I will focus on the difference between SparkSession and SQLContext by defining each, showing how to create an instance of each, and using them from spark-shell.
SparkSession is an entry point to the underlying Spark functionality that lets you programmatically create Spark RDDs, DataFrames, and Datasets. Its object spark is available by default in spark-shell, and it can be created programmatically using the SparkSession builder pattern.
Spark SQLContext has been defined in the org.apache.spark.sql package since Spark 1.0; it was deprecated in 2.0 and replaced with SparkSession. SQLContext contains several useful Spark SQL functions for working with structured data (columns & rows), and it was the entry point to Spark SQL before 2.0.
SparkSession
With Spark 2.0, a new class org.apache.spark.sql.SparkSession was introduced that combines all the different contexts we had prior to the 2.0 release; hence, SparkSession is used in place of SQLContext and HiveContext.
SparkSession in spark-shell
By default, the Spark shell provides a spark object, which is an instance of the SparkSession class. We can directly use this object wherever required.
scala> val sqlcontext = spark.sqlContext
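Besides exposing sqlContext, the spark object can be used directly. Since spark-shell also imports spark.implicits._ by default, you can create a small DataFrame straight away (the sample data below is made up for illustration):

scala> val df = Seq((1, "Scala"), (2, "Python")).toDF("id", "language")
scala> df.show()

Here toDF() comes from spark.implicits._, which the shell imports automatically.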
Creating SparkSession from Scala program
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()
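Note that getOrCreate() returns an existing SparkSession if one is already running; otherwise it creates a new one. Once created, the session can be used to build a DataFrame, for example (a minimal sketch; the sample data is made up):

import spark.implicits._
// Create a DataFrame from a local Seq and display it
val df = Seq(("James", 30), ("Anna", 25)).toDF("name", "age")
df.show()
// Stop the session when the application is done
spark.stop()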
SQLContext
Spark org.apache.spark.sql.SQLContext is a deprecated class that contains several useful functions for working with Spark SQL, and it was the entry point of Spark SQL. However, as mentioned, it has been deprecated since Spark 2.0, and using SparkSession is recommended instead.
SQLContext in spark-shell
You can create an SQLContext in the Spark shell by passing the default SparkContext object (sc) as a parameter to the SQLContext constructor.
scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
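The resulting SQLContext exposes the familiar Spark SQL methods; for example, range() produces a single-column DataFrame of ids:

scala> sqlcontext.range(5).show()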
Creating SQLContext from Scala program
Before Spark 2.0, you needed to pass a SparkContext object to the constructor in order to create a SQLContext instance. In Scala, you do this as shown in the example below.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("sparkbyexamples.com").setMaster("local[1]")
val sparkContext = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
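With this SQLContext you would then use the pre-2.0 API, for example registerTempTable() and sql() (a minimal sketch with made-up sample data):

import sqlContext.implicits._
// Build a DataFrame and register it as a temporary table (pre-2.0 style)
val df = Seq((1, "spark"), (2, "sql")).toDF("id", "name")
df.registerTempTable("tech")
// Query the temporary table through the SQLContext
sqlContext.sql("SELECT * FROM tech WHERE id = 1").show()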
However, in Spark 2.0 the SQLContext() constructor has been deprecated, and it is recommended to use the sqlContext available from SparkSession instead. As you have learned in the sections above, you can do this with:
val sqlContext = spark.sqlContext
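Since this SQLContext wraps the same underlying session, anything registered through one entry point is visible from the other; a quick sketch with made-up data:

import spark.implicits._
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
// A temp view registered on the session is visible to the SQLContext too
df.createOrReplaceTempView("letters")
sqlContext.sql("SELECT * FROM letters").show()
spark.sql("SELECT * FROM letters").show()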
Conclusion
In this Spark article, you have learned the difference between SparkSession and SQLContext and how to create each from the Spark shell and from a Scala program.
Related Articles
- Spark SqlContext explained with Examples
- Spark – What is SparkSession Explained
- SparkSession vs SparkContext
- Spark – Create a SparkSession and SparkContext
- Spark DataFrame Cache and Persist Explained
- Find Maximum Row per Group in Spark DataFrame
- Difference in DENSE_RANK and ROW_NUMBER in Spark
- How to Check Spark Version
Happy Learning !!