By using spark.executorEnv.[EnvironmentVariableName]
you can set one or more environment variables on Spark/PySpark executors or workers.
Besides this, there are other ways as well; let's look at each of them with examples.
1. Spark Set Environment Variable to Executor
Use the spark-submit config spark.executorEnv.[EnvironmentVariableName]
to set an environment variable on executors or worker nodes. The Spark documentation says the following about this config:
Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
— spark.apache.org
# For Spark with Scala and Python
spark-submit --conf spark.executorEnv.SOME_ENVIRONMENT=SOME_VALUE
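Because the variable name is part of the property name, you can repeat the flag to set multiple variables in one submit. Below is a sketch; the variable names, values, and application file are placeholders.
spark-submit \
  --conf spark.executorEnv.SOME_ENVIRONMENT=SOME_VALUE \
  --conf spark.executorEnv.ANOTHER_ENVIRONMENT=ANOTHER_VALUE \
  your_application.py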
If you are running on a YARN cluster, you can also set an environment variable on the application master (the driver, in cluster mode) with spark.yarn.appMasterEnv.[EnvironmentVariableName].
spark-submit --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.SOME_ENVIRONMENT=SOME_VALUE
You can also add these properties to the conf/spark-defaults.conf file.
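For example, the equivalent entry in conf/spark-defaults.conf looks like this (property name and value separated by whitespace; the value is a placeholder):
spark.executorEnv.SOME_ENVIRONMENT  SOME_VALUE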
2. Add Environment Variable by Creating SparkSession
You can also add an environment variable for the executors while creating the SparkSession. Below is a PySpark example.
# Imports
from pyspark.sql import SparkSession

# Create SparkSession with an executor environment variable
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.executorEnv.SOME_ENVIRONMENT", "SOME_VALUE") \
    .getOrCreate()
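As a quick sanity check, the property you just set is visible on the driver through the runtime config (assuming the session created above):
# Verify the property on the driver
print(spark.conf.get("spark.executorEnv.SOME_ENVIRONMENT"))  # SOME_VALUE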
Regardless of which approach you use, you can read the environment variable on the executors with the following.
import os
some_environment_value = os.environ.get('SOME_ENVIRONMENT')
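Note that os.environ reflects the executor's environment only when the lookup runs inside a task; on the driver it reads the driver's environment. A minimal sketch (assuming the session created above) that forces the lookup onto an executor:
# Read the variable inside a task so it runs on an executor
import os

def read_env(_):
    return os.environ.get('SOME_ENVIRONMENT')

print(spark.sparkContext.parallelize([0], 1).map(read_env).collect())
# e.g. ['SOME_VALUE']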
3. Using Spark Config
If you just want to set a value and use it across executors, you can set it as a Spark configuration property.
# Imports
from pyspark.sql import SparkSession

# Create SparkSession with a custom config property
spark = SparkSession.builder \
    .config("SOME_ENVIRONMENT", "SOME_VALUE") \
    .appName('SparkByExamples.com') \
    .getOrCreate()
You can also do the same with spark-submit; note that spark-submit ignores --conf keys that don't start with spark. (it prints a warning), so in practice use a spark.-prefixed name:
spark-submit --conf spark.SOME_ENVIRONMENT=SOME_VALUE
You can read the config property back (using whichever key you set) with spark.conf.get().
SOME_ENVIRONMENT = spark.conf.get("SOME_ENVIRONMENT")
print(SOME_ENVIRONMENT)
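Keep in mind that spark.conf.get() runs on the driver. A minimal sketch (assuming the session created above) of sharing the value with executors by capturing it in a task closure:
# Read the property on the driver, then ship it with the task closure
some_value = spark.conf.get("SOME_ENVIRONMENT")
rdd = spark.sparkContext.parallelize([1, 2, 3])
print(rdd.map(lambda x: (x, some_value)).collect())
# e.g. [(1, 'SOME_VALUE'), (2, 'SOME_VALUE'), (3, 'SOME_VALUE')]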
Conclusion
In this article, you learned how to set environment variables for Spark executors or workers using spark.executorEnv.[EnvironmentVariableName], and you saw other ways to do it with examples.