
In Spark 1.x, SQLContext (org.apache.spark.sql.SQLContext) is the entry point to Spark SQL for working with structured data (rows and columns). Since Spark 2.0, however, SQLContext has been replaced by SparkSession.

What is Spark SQLContext

Spark org.apache.spark.sql.SQLContext is a class that provides several useful functions for working with Spark SQL and serves as an entry point to Spark SQL. However, it has been deprecated since Spark 2.0, and using SparkSession is recommended instead.

SQLContext in spark-shell

You can create an SQLContext in spark-shell by passing the default SparkContext object (sc) to the SQLContext constructor.


scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

Creating SQLContext from Scala program

In Spark 1.x, you create an SQLContext instance by passing a SparkContext object to its constructor. In Scala, you do this as shown in the example below.


val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate();
val sqlContext = new org.apache.spark.sql.SQLContext(spark.sparkContext)

However, since 2.0 the SQLContext() constructor has been deprecated, and it is recommended to use the sqlContext method on SparkSession instead, for example spark.sqlContext.


val sqlContext = spark.sqlContext

Create a DataFrame by reading a file


  val sqlContext:SQLContext = spark.sqlContext

  //read csv with options
  val df = sqlContext.read.options(Map("inferSchema"->"true","delimiter"->",","header"->"true"))
    .csv("src/main/resources/zipcodes.csv")
  df.show()
  df.printSchema()
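Since SQLContext is deprecated, the same read can be written against SparkSession directly. The sketch below is self-contained: it writes a tiny stand-in CSV to a temp file (the two zipcode rows are made up for illustration), because the article's src/main/resources/zipcodes.csv file is not bundled here.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

// Stand-in for src/main/resources/zipcodes.csv so the sketch runs anywhere
val csvPath = Files.createTempFile("zipcodes", ".csv")
Files.write(csvPath, "Zipcode,City\n704,PARC PARQUE\n709,BDA SAN LUIS\n".getBytes)

// Same options as the SQLContext version, but read through SparkSession directly
val df = spark.read
  .options(Map("inferSchema" -> "true", "delimiter" -> ",", "header" -> "true"))
  .csv(csvPath.toString)

df.show()
df.printSchema()
```

Because header is true, the first line becomes column names, and inferSchema makes Zipcode an integer column rather than a string.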

Execute SQL query from SQLContext object


  df.createOrReplaceTempView("TAB")
  sqlContext.sql("select * from TAB")
    .show(false)

Note: Since 2.0, SQLContext has been replaced by SparkSession, which contains all the methods present in SQLContext.
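To illustrate the note above, here is a minimal sketch showing that the SQLContext obtained from a SparkSession shares the session's catalog: a temp view registered on a DataFrame can be queried through either handle. The in-memory rows and the view name TAB are assumptions made up for the example.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

// Hypothetical in-memory data, just for the demonstration
import spark.implicits._
val df = Seq((704, "PARC PARQUE"), (709, "BDA SAN LUIS"))
  .toDF("Zipcode", "City")

df.createOrReplaceTempView("TAB")

// Both handles hit the same catalog and return the same rows
val viaSqlContext = spark.sqlContext.sql("select * from TAB")
val viaSession    = spark.sql("select * from TAB")

viaSqlContext.show(false)
viaSession.show(false)
```

This is why migrating from SQLContext to SparkSession is usually a mechanical change: the queries and views carry over unchanged.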

Complete SQLContext Example


package com.sparkbyexamples.spark

import org.apache.spark.sql.{SQLContext, SparkSession}

object SQLContextExample extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate();

  spark.sparkContext.setLogLevel("ERROR")


  val sqlContext:SQLContext = spark.sqlContext

  //read csv with options
  val df = sqlContext.read.options(Map("inferSchema"->"true","delimiter"->",","header"->"true"))
    .csv("src/main/resources/zipcodes.csv")
  df.show()
  df.printSchema()

  df.createOrReplaceTempView("TAB")
  sqlContext.sql("select * from TAB")
    .show(false)

}

This example is also available at the GitHub project for reference.

Conclusion

In this article, you have learned how to create an SQLContext object from spark-shell and programmatically with a Scala example, and how to read a file into a DataFrame. Finally, you learned that SQLContext has been deprecated and that SparkSession should be used instead.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium