In Spark and PySpark, a SparkSession object is created programmatically using SparkSession.builder(). If you are using the Spark shell, a SparkSession object named "spark" is created for you by default, and the SparkContext can be retrieved from the session object using sparkSession.sparkContext. In this article, you will learn how to create a SparkSession and how to use a SparkContext, with Scala and PySpark examples.
Spark – Create SparkSession
Since Spark 2.0, SparkSession is the entry point to the underlying Spark functionality. All functionality available through SparkContext is also available through SparkSession, and it additionally provides APIs for working with DataFrames and Datasets.
SparkSession also includes all the APIs available in the different contexts (see the short sketch after this list):
- SparkContext
- SQLContext
- StreamingContext
- HiveContext
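As a brief illustration (a minimal sketch, not taken from the original article; the app name ContextsViaSession is a placeholder), a single session object exposes both the SparkContext and the SQL functionality that previously required a separate SQLContext or HiveContext:
import org.apache.spark.sql.SparkSession

// A session created the same way as in the examples that follow
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("ContextsViaSession")
  .getOrCreate()

// SparkContext is available directly from the session
val sc = spark.sparkContext

// SQL queries, previously run through SQLContext/HiveContext, go through the session
spark.sql("SELECT 1 AS id").show()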
Below is an example of creating a SparkSession in Scala.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
master() – If you are running on a cluster, pass your cluster manager's master URL as the argument to master(); typically this is yarn or a mesos URL, depending on your cluster setup. For a local run, local[1] runs Spark with a single core.
appName() – Sets the application name.
getOrCreate() – Returns the existing SparkSession if one already exists, otherwise creates a new one.
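For example, a builder configured for a YARN cluster might look like the following (a hedged sketch: the yarn master and the config values are placeholders that depend on your cluster, not settings from the original article):
import org.apache.spark.sql.SparkSession

// Placeholder cluster settings for illustration only
val spark = SparkSession.builder()
  .master("yarn")
  .appName("SparkByExample")
  .config("spark.executor.memory", "4g")    // example resource setting
  .config("spark.executor.instances", "2")  // example resource setting
  .getOrCreate()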
Note: The SparkSession object "spark" is available by default in the Spark shell.
PySpark – Create SparkSession
Below is a PySpark example of creating a SparkSession.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()
When running on a cluster, pass your cluster manager's master URL as the argument to master(); typically this is yarn or a mesos URL, depending on your cluster setup.
Create SparkContext
A Spark "driver" is the application that creates a SparkContext for executing one or more jobs on the Spark cluster. The SparkContext allows your Spark/PySpark application to access the cluster with the help of a resource manager.
When you create a SparkSession object, a SparkContext is also created and can be retrieved using spark.sparkContext. A SparkContext is created only once per application; even if you try to create another one, the existing SparkContext is returned.
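As a quick illustration (a minimal sketch, separate from the full examples below; the app name SparkContextSketch is a placeholder), the SparkContext obtained from the session can be used to create an RDD:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkContextSketch")
  .getOrCreate()

// Retrieve the SparkContext from the session and use it to build an RDD
val sc = spark.sparkContext
val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))
println(rdd.count())  // prints 5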
Scala Example
import org.apache.spark.sql.SparkSession

object SparkSessionTest {

  def main(args: Array[String]): Unit = {

    // First session: this creates the SparkContext
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    println("First SparkContext:")
    println("APP Name :" + spark.sparkContext.appName)
    println("Master :" + spark.sparkContext.master)

    // Second getOrCreate(): the existing session and SparkContext are reused
    val sparkSession2 = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample-test")
      .getOrCreate()

    println("Second SparkContext:")
    println("APP Name :" + sparkSession2.sparkContext.appName)
    println("Master :" + sparkSession2.sparkContext.master)
  }
}
PySpark Example
from pyspark.sql import SparkSession

# First session: this creates the SparkContext
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExample") \
    .getOrCreate()

print("First SparkContext:")
print("APP Name :" + spark.sparkContext.appName)
print("Master :" + spark.sparkContext.master)

# Second getOrCreate(): the existing session and SparkContext are reused
sparkSession2 = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExample-test") \
    .getOrCreate()

print("Second SparkContext:")
print("APP Name :" + sparkSession2.sparkContext.appName)
print("Master :" + sparkSession2.sparkContext.master)
Both the Scala and PySpark examples above produce the same output, shown below. Notice that the second block still reports the app name SparkByExample rather than SparkByExample-test, because getOrCreate() returned the already-existing session and its SparkContext instead of creating new ones.
First SparkContext:
APP Name :SparkByExample
Master :local[1]
Second SparkContext:
APP Name :SparkByExample
Master :local[1]
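You can verify this behaviour with a small check (a sketch added here; the variable names first and second are placeholders): both sessions returned by getOrCreate() share a single SparkContext.
import org.apache.spark.sql.SparkSession

val first = SparkSession.builder().master("local[1]").appName("First").getOrCreate()
val second = SparkSession.builder().master("local[1]").appName("Second").getOrCreate()

// Only one SparkContext exists per application, so both sessions share it
println(first.sparkContext eq second.sparkContext)  // should print true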
Conclusion
In this Spark article, you have learned that a SparkSession can be created using the builder() method, that a SparkContext is created by default when the session object is created, and that it can be accessed using spark.sparkContext (where spark is a SparkSession object).
Resources: https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
Happy Learning !!
When I try to run this, it gives the below error. Please tell me what to do.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
Hi Bhargavi, are you using Maven to build your project? If so, you need to add the dependencies as specified here: https://github.com/spark-examples/spark-hello-world-example/blob/master/pom.xml