SparkContext has been available since Spark 1.x (JavaSparkContext for Java) and was the entry point to Spark and PySpark before SparkSession was introduced in 2.0. Creating a SparkContext is the first step to using RDDs and connecting to a Spark cluster. In this article, you will learn how to create it with examples.
Since Spark 1.x, SparkContext has been an entry point to Spark and is defined in the org.apache.spark package. It is used to programmatically create Spark RDDs, accumulators, and broadcast variables on the cluster. Its object sc is the default variable available in spark-shell, and it can be created programmatically using the SparkContext class.
Note that you can create only one active SparkContext per JVM. You should stop() the active SparkContext before creating a new one.

The Spark driver program creates and uses SparkContext to connect to the cluster manager, submit Spark jobs, and know which resource manager (YARN, Mesos, or Standalone) to communicate with. It is the heart of the Spark application.
For more internal details on SparkContext, refer to What does SparkContext do?
Related: How to get current SparkContext & its configurations in Spark
1. SparkContext in spark-shell
By default, the Spark shell provides the sc object, which is an instance of the SparkContext class. We can use this object directly wherever required.
// 'sc' is a SparkContext variable in spark-shell
scala> sc.appName
This yields the output below.
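In spark-shell the default application name is "Spark shell", so the result looks roughly like this (the exact REPL output can vary slightly by version):
// Example spark-shell output
res0: String = Spark shell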

Similar to the Spark shell, most tools, notebooks, and Azure Databricks create a default SparkContext object for you, so you don't have to worry about creating one yourself.
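In such environments you can typically use the pre-created variables directly, as in the minimal sketch below (the exact variable names are provided by the environment, commonly sc and spark):
// 'sc' and 'spark' are typically pre-created by the notebook environment
println(sc.appName)
println(spark.sparkContext.appName)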
2. Spark 2.X – Create SparkContext using Scala Program
Since Spark 2.0, we mostly use SparkSession, as most of the methods available in SparkContext are also present in SparkSession. SparkSession internally creates the Spark Context and exposes it through the sparkContext variable.
At any given time, only one SparkContext instance should be active per JVM. If you want to create another, you should stop the existing SparkContext (using stop()) before creating a new one.
// Imports
import org.apache.spark.sql.SparkSession

object SparkSessionTest extends App {

  // Create SparkSession object
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  // Access spark context
  println(spark.sparkContext)
  println("Spark App Name : " + spark.sparkContext.appName)
}
// Output:
//org.apache.spark.SparkContext@2fdf17dc
//Spark App Name : SparkByExamples.com
As I explained in the SparkSession article, you can create any number of SparkSession objects; however, all of them share a single underlying SparkContext.
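As a quick illustration (a minimal sketch reusing the spark session created above), a second session obtained with newSession() still points to the same SparkContext:
// Both sessions share the same underlying SparkContext
val spark2 = spark.newSession()
println(spark.sparkContext eq spark2.sparkContext)  // true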
3. Create RDD
Once you have a SparkContext object, you can use it to create Spark RDDs as shown below.
// Create RDD
val rdd = spark.sparkContext.range(1, 5)
rdd.collect().foreach(print)
// Create RDD from Text file
val rdd2 = spark.sparkContext.textFile("src/main/resources/text/alice.txt")
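You can also create an RDD from a local Scala collection with parallelize(), as in this short sketch reusing the spark session from above:
// Create RDD from a local collection
val rdd3 = spark.sparkContext.parallelize(Seq("spark", "scala", "rdd"))
println(rdd3.count())  // 3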
4. Stop SparkContext
You can stop the SparkContext by calling the stop() method. As explained above, you can have only one SparkContext per JVM; if you want to create another, you need to shut the existing one down first using the stop() method and then create a new SparkContext.
// SparkContext stop() method
spark.sparkContext.stop()
When Spark executes this statement, it logs the message INFO SparkContext: Successfully stopped SparkContext to the console or to a log file.
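Note that calling stop() on the SparkSession has the same effect, since it stops the underlying SparkContext:
// Stopping the SparkSession also stops the underlying SparkContext
spark.stop()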
5. Spark 1.X – Creating SparkContext using Scala Program
In Spark 1.x, you first need to create a SparkConf instance by setting the application name and the master using the SparkConf methods setAppName() and setMaster() respectively, and then pass the SparkConf object as an argument to the SparkContext constructor to create the Spark Context.
// Create SparkContext
import org.apache.spark.{SparkConf, SparkContext}
// Create SparkConf object
val sparkConf = new SparkConf().setAppName("sparkbyexamples.com").setMaster("local[1]")
// Create Spark context (not recommended since 2.0)
val sparkContext = new SparkContext(sparkConf)
Constructing a SparkContext directly is discouraged since Spark 2.0; the recommendation is to use the static method getOrCreate(), which creates the SparkContext if one does not already exist and registers it as a singleton object.
// Create Spark Context
val sc = SparkContext.getOrCreate(sparkConf)
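Calling getOrCreate() again returns the already-registered instance rather than creating a new one, as in this small sketch reusing sparkConf from above:
// getOrCreate() returns the registered singleton SparkContext
val sc2 = SparkContext.getOrCreate(sparkConf)
println(sc eq sc2)  // true – same instance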
6. SparkContext Commonly Used Methods
The following are the most commonly used methods of SparkContext. For the complete list, refer to the Spark documentation; a short sketch exercising a few of them follows the list.
- longAccumulator() – Creates an accumulator variable of the long data type. Only the driver can read accumulator values.
- doubleAccumulator() – Creates an accumulator variable of the double data type. Only the driver can read accumulator values.
- applicationId – Returns a unique ID of the Spark application.
- appName – Returns the app name that was given when creating the SparkContext.
- broadcast – Creates a read-only variable that is broadcast to the entire cluster. You can broadcast a variable to a Spark cluster only once.
- emptyRDD – Creates an empty RDD.
- getPersistentRDDs – Returns all persisted RDDs.
- getOrCreate() – Creates or returns a SparkContext.
- hadoopFile – Returns an RDD of a Hadoop file.
- master() – Returns the master URL that was set when creating the SparkContext.
- newAPIHadoopFile – Creates an RDD for a Hadoop file with the new API InputFormat.
- sequenceFile – Gets an RDD for a Hadoop SequenceFile with the given key and value types.
- setLogLevel – Changes the log level (e.g. DEBUG, INFO, WARN, ERROR, FATAL).
- textFile – Reads a text file from HDFS, local, or any Hadoop-supported file system and returns an RDD.
- union – Builds the union of two or more RDDs.
- wholeTextFiles – Reads the text files in a folder from HDFS, local, or any Hadoop-supported file system and returns an RDD of Tuple2. The first element of the tuple is the file name and the second element is the content of the file.
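Below is a short sketch exercising a few of these methods, assuming an active spark session as created earlier:
// Sketch: a few commonly used SparkContext methods
val ctx = spark.sparkContext
val acc = ctx.longAccumulator("counter")        // longAccumulator()
val names = ctx.broadcast(Seq("alice", "bob"))  // broadcast
val empty = ctx.emptyRDD[String]                // emptyRDD
ctx.parallelize(Seq(1L, 2L, 3L, 4L, 5L)).foreach(v => acc.add(v))
println(acc.value)                              // 15 – accumulator values are read on the driver
println("App Id : " + ctx.applicationId)        // applicationId
println("Master : " + ctx.master)               // master()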
7. SparkContext Example
// Complete example of SparkContext
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample extends App {

  val conf = new SparkConf().setAppName("sparkbyexamples.com").setMaster("local[1]")
  val sparkContext = new SparkContext(conf)

  val rdd = sparkContext.textFile("src/main/resources/text/alice.txt")
  sparkContext.setLogLevel("ERROR")

  println("First SparkContext:")
  println("APP Name :" + sparkContext.appName)
  println("Deploy Mode :" + sparkContext.deployMode)
  println("Master :" + sparkContext.master)

  // Stop the first context before creating another one;
  // only one SparkContext can be active per JVM.
  sparkContext.stop()

  val conf2 = new SparkConf().setAppName("sparkbyexamples.com-2").setMaster("local[1]")
  val sparkContext2 = new SparkContext(conf2)

  println("Second SparkContext:")
  println("APP Name :" + sparkContext2.appName)
  println("Deploy Mode :" + sparkContext2.deployMode)
  println("Master :" + sparkContext2.master)
}
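With the stop() call in place, the program prints output roughly like the following (in local mode the deploy mode is client):
// Output (abridged):
// First SparkContext:
// APP Name :sparkbyexamples.com
// Deploy Mode :client
// Master :local[1]
// Second SparkContext:
// APP Name :sparkbyexamples.com-2
// Deploy Mode :client
// Master :local[1]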
FAQs on SparkContext
What is SparkContext?
SparkContext has been the entry point to a Spark application since Spark 1.x. It is the central entry point and controller for Spark applications: it manages resources, coordinates tasks, and provides the necessary infrastructure for distributed data processing in Spark. It plays a vital role in ensuring the efficient and fault-tolerant execution of Spark jobs.
How do you create a SparkContext?
A SparkContext is created using the SparkContext class. By default, the Spark "driver" is the application that creates the SparkContext in order to execute the job or jobs on a cluster. You can access the Spark context from the Spark session object as spark.sparkContext. If you want to create a Spark context yourself, use the snippet below.
// Create SparkContext
import org.apache.spark.{SparkConf, SparkContext}
val sparkConf = new SparkConf().setAppName("sparkbyexamples.com").setMaster("local[1]")
val sparkContext = new SparkContext(sparkConf)
How do you stop a SparkContext?
Once you have finished using Spark, you can stop the SparkContext using the stop() method. This releases all resources associated with the SparkContext and shuts down the Spark application gracefully.
Can I have multiple SparkContexts in a single application?
There can be only one active SparkContext per JVM. Having multiple SparkContext instances in a single application can cause issues such as resource conflicts, configuration conflicts, and unexpected behavior.
8. Conclusion
In this Spark Context article, you have learned what SparkContext is, how to create it in Spark 1.x and Spark 2.0, and how to use it with a few basic examples. In summary,
- SparkContext is the entry point to any Spark functionality. It represents the connection to a Spark cluster and is responsible for coordinating and distributing the operations on that cluster.
- It was the primary entry point for Spark applications before Spark 2.0.
- SparkContext is used for low-level RDD (Resilient Distributed Dataset) operations, which were the core data abstraction in Spark before DataFrames and Datasets were introduced.
- It is not thread-safe, so in a multi-threaded or multi-user environment, you need to be careful when using a single SparkContext instance.
Happy Learning !!