Spark Get the Current SparkContext Settings

In Spark/PySpark you can get the currently active SparkContext and its configuration settings by calling spark.sparkContext.getConf.getAll(). Here, spark is an object of SparkSession, and getAll() returns Array[(String, String)]. Let's look at examples using both Spark with Scala and PySpark (Spark with Python).

Spark Get SparkContext Configurations

In the Spark example below, I add an additional configuration to Spark using SparkConf, and then retrieve all config values from SparkContext, including the one I added.


import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Add a custom configuration on top of the defaults
val config = new SparkConf()
config.set("spark.sql.shuffle.partitions", "300")
val spark = SparkSession.builder().config(config).master("local[3]")
    .appName("SparkByExamples.com")
    .getOrCreate()

// getAll returns Array[(String, String)] with every configuration that is set
val arrayConfig = spark.sparkContext.getConf.getAll
for (conf <- arrayConfig)
    println(conf._1 + ", " + conf._2)

This yields the below output.


spark.app.name, SparkByExamples.com
spark.app.id, local-1618196887324
spark.driver.host, DELL-ESUHAO2KAJ
spark.master, local[3]
spark.executor.id, driver
spark.driver.port, 52984

Use the get() method of SparkConf to get the value of a specific configuration.


print("spark.sql.shuffle.partitions ==> "+spark.sparkContext.getConf.get("spark.sql.shuffle.partitions"))
// Displays the below value
// spark.sql.shuffle.partitions ==> 300

PySpark Get SparkContext Configurations

Similarly, let's see how to get the current SparkContext configuration settings in PySpark.


from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

configurations = spark.sparkContext.getConf().getAll()
for item in configurations: print(item)

This prints the below configuration. Alternatively, you can also get the PySpark configurations using spark.sparkContext._conf.getAll(); keep in mind that _conf is an internal attribute, so getConf() is the safer choice.


('spark.app.name', 'SparkByExamples.com')
('spark.rdd.compress', 'True')
('spark.driver.host', 'DELL-ESUHAO2KAJ')
('spark.serializer.objectStreamReset', '100')
('spark.submit.pyFiles', '')
('spark.executor.id', 'driver')
('spark.submit.deployMode', 'client')
('spark.app.id', 'local-1617974806929')
('spark.ui.showConsoleProgress', 'true')
('spark.master', 'local[1]')
('spark.driver.port', '65211')
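
Since getAll() returns a list of key-value tuples, you can also convert it into a Python dict for convenient lookups. Below is a minimal sketch; the keys used are standard Spark properties from the output above.


# Convert the list of (key, value) tuples from getAll() into a dict
confDict = dict(spark.sparkContext.getConf().getAll())
print(confDict.get("spark.master"))    # local[1]
print(confDict.get("spark.app.name"))  # SparkByExamples.com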

To get a specific configuration value, use get().


print(spark.sparkContext.getConf().get("spark.driver.host"))
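
Note that get() raises an error if the requested key was never set. In PySpark, SparkConf.get() also accepts an optional default value to fall back on. Below is a small sketch, where spark.some.custom.key is a hypothetical key used only for illustration.


# Pass a default value to avoid an error for a key that was never set
# "spark.some.custom.key" is a hypothetical key used for illustration
value = spark.sparkContext.getConf().get("spark.some.custom.key", "not-set")
print(value)  # prints "not-set" when the key is absent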

Conclusion

By using the getAll() method of SparkConf, you can get all of the currently active Spark/PySpark SparkContext settings, and you can use the get() method to get the value of a specific setting.
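
As a related tip, Spark SQL settings such as spark.sql.shuffle.partitions can also be read and set at runtime through the SparkSession's spark.conf. Below is a short PySpark sketch.


# Read and set Spark SQL settings through the SparkSession runtime config
spark.conf.set("spark.sql.shuffle.partitions", "300")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # 300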

Happy Learning !!
