In Spark/PySpark you can get the current active SparkContext and its configuration settings by accessing spark.sparkContext.getConf.getAll(), where spark is an object of SparkSession. In Scala, getAll() returns Array[(String, String)]; in PySpark it returns a list of (key, value) tuples. Let's see examples using Spark with Scala and PySpark (Spark with Python).
Spark Get SparkContext Configurations
In the below Spark example, I add an additional configuration to Spark using SparkConf and then retrieve all configuration values from SparkContext, including the defaults along with the one I added.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Add a custom configuration on top of the defaults
val config = new SparkConf()
config.set("spark.sql.shuffle.partitions", "300")

val spark = SparkSession.builder().config(config).master("local[3]")
    .appName("SparkByExamples.com")
    .getOrCreate()

// getAll returns Array[(String, String)] of key/value pairs
val arrayConfig = spark.sparkContext.getConf.getAll
for (conf <- arrayConfig)
  println(conf._1 + ", " + conf._2)
Yields below output.
spark.app.name, SparkByExamples.com
spark.app.id, local-1618196887324
spark.driver.host, DELL-ESUHAO2KAJ
spark.master, local[3]
spark.executor.id, driver
spark.driver.port, 52984
Use the get() method of SparkConf to get the value of a specific configuration.
print("spark.sql.shuffle.partitions ==> "+spark.sparkContext.getConf.get("spark.sql.shuffle.partitions"))
// Display below value
// spark.sql.shuffle.partitions ==> 300
PySpark Get SparkContext Configurations
Similarly, let's see how to get the current PySpark SparkContext configuration settings.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# getAll() returns a list of (key, value) tuples
configurations = spark.sparkContext.getConf().getAll()
for item in configurations:
    print(item)
This prints the below configuration. Alternatively, you can also get the PySpark configurations using spark.sparkContext._conf.getAll(), as shown in the sketch after the output below.
('spark.app.name', 'SparkByExamples.com')
('spark.rdd.compress', 'True')
('spark.driver.host', 'DELL-ESUHAO2KAJ')
('spark.serializer.objectStreamReset', '100')
('spark.submit.pyFiles', '')
('spark.executor.id', 'driver')
('spark.submit.deployMode', 'client')
('spark.app.id', 'local-1617974806929')
('spark.ui.showConsoleProgress', 'true')
('spark.master', 'local[1]')
('spark.driver.port', '65211')
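The _conf attribute mentioned above is PySpark's internal reference to the underlying SparkConf object, so it yields the same key/value pairs; a minimal sketch:

# _conf is the internal SparkConf behind the SparkContext;
# it returns the same list of (key, value) tuples as getConf().getAll()
for item in spark.sparkContext._conf.getAll():
    print(item)

Since _conf is an internal attribute (note the leading underscore), prefer getConf() outside of quick exploratory code.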
If you want to get a specific configuration:
print(spark.sparkContext.getConf().get("spark.driver.host"))
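Note that a key that was never set explicitly (such as spark.sql.shuffle.partitions in the session above) may be missing from SparkConf; in PySpark, get() accepts an optional default value as the second argument for that case. A small sketch:

# Returns the default ("200" here) because the key was not set on this session's SparkConf
partitions = spark.sparkContext.getConf().get("spark.sql.shuffle.partitions", "200")
print("spark.sql.shuffle.partitions ==> " + partitions)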
Conclusion
By using the getAll() method of SparkConf, you can get all current active Spark/PySpark SparkContext settings; you can also use the get() method to get the value of a specific setting.
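As a quick recap, here is a minimal PySpark sketch (assuming a fresh session, so the builder option lands in the SparkContext's SparkConf) that sets a custom value and reads it back, mirroring the Scala example above:

from pyspark.sql import SparkSession

# Set a custom value at build time, then read it back from the SparkContext
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.sql.shuffle.partitions", "300") \
    .getOrCreate()

print(spark.sparkContext.getConf().get("spark.sql.shuffle.partitions"))  # prints 300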
Happy Learning !!
Related Articles
- Spark Get Current Number of Partitions of DataFrame
- Spark – How to get current date & timestamp
- What is Spark Streaming Checkpoint?
- Spark createOrReplaceTempView() Explained
- Spark – Get Size/Length of Array & Map Column
- Find Maximum Row per Group in Spark DataFrame