By using spark.executorEnv.[EnvironmentVariableName]
you can set one or more environment variables on Spark/PySpark executors or workers.
There are also a few other approaches; let's look at each of them with examples.
1. Spark Set Environment Variable to Executor
Use the spark-submit config spark.executorEnv.[EnvironmentVariableName]
to set or add an environment variable on executors or worker nodes. The Spark documentation says the following about this config:
Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
spark.apache.org
# For Spark with Scala and Python
spark-submit --conf spark.executorEnv.ENV_KEY=ENV_VALUE
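As the documentation notes, you can repeat the --conf flag to set more than one variable. Here is a minimal sketch (app.py and the variable names are placeholders, not from the original example):
# Set multiple executor environment variables (app.py is a hypothetical application file)
spark-submit \
--conf spark.executorEnv.ENV_KEY=ENV_VALUE \
--conf spark.executorEnv.ANOTHER_KEY=ANOTHER_VALUE \
app.py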
If you are running on YARN in cluster deploy mode, you can also set an environment variable on the application master (the driver side) with spark.yarn.appMasterEnv.[EnvironmentVariableName].
# spark-submit on YARN in cluster mode
spark-submit --deploy-mode cluster \
--conf spark.yarn.appMasterEnv.ENV_KEY=ENV_VALUE
Also, you can add these properties to the conf/spark-defaults.conf file.
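For reference, the equivalent entries in conf/spark-defaults.conf use the usual key/value format separated by whitespace (keys and values below are placeholders):
# conf/spark-defaults.conf
spark.executorEnv.ENV_KEY        ENV_VALUE
spark.yarn.appMasterEnv.ENV_KEY  ENV_VALUE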
2. Add Environment Variable by Creating SparkSession
You can also add an environment variable to the executors in Spark or PySpark while creating the SparkSession. Below is an example in PySpark.
# Imports
from pyspark.sql import SparkSession
# Create SparkSession
spark = SparkSession.builder \
.appName('SparkByExamples.com') \
.config("spark.executorEnv.ENV_KEY", "ENV_VALUE") \
.getOrCreate()
Regardless of which approach you use, you can read the environment variable value on the executors as follows.
# Access environment variables
import os
some_environment_value = os.environ.get('ENV_KEY')
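Keep in mind that the snippet above returns the value from whichever process runs it; to confirm the variable actually reached the executors, you can read it inside a UDF, which executes on the executors' Python workers. A minimal sketch (the UDF and column names are made up for illustration):
# Read the environment variable inside a UDF (runs on executors)
import os
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def read_env_key(_):
    # Executed in the executor's Python worker process
    return os.environ.get('ENV_KEY')

df = spark.range(1)
df.select(read_env_key(df.id).alias("env_key_on_executor")).show()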
3. Using -D
You can use -D in spark.executor.extraJavaOptions to pass extra JVM options to the executors. Note that -D sets a JVM system property (read with System.getProperty on the JVM side), not an OS environment variable.
# Using -D with spark-submit
spark-submit --conf spark.executor.extraJavaOptions="-DENV_KEY=ENV_VALUE"
4. Using Spark Config
If you just want to set a value and reuse it across your application (including in executor-side code, as shown at the end of this section), you can set it as a Spark configuration property with .config() while creating the SparkSession.
In Apache Spark, you can also set custom configuration properties for your application through the SparkConf object or the SparkSession builder. Properties set this way are specific to your Spark application; note that they are Spark configuration entries, not OS environment variables.
# Imports
from pyspark.sql import SparkSession
# Create SparkSession
spark = SparkSession.builder \
.config("ENV_KEY", "ENV_VALUE") \
.appName('SparkByExamples.com') \
.getOrCreate()
You can also pass the property with spark-submit. Note that spark-submit ignores --conf keys that do not start with spark., so prefix custom keys accordingly.
# Using spark-submit --conf
spark-submit --conf spark.ENV_KEY=ENV_VALUE
You can get the value of the config property by using spark.conf.get() with the same key you set (for example spark.ENV_KEY if you passed it via spark-submit).
# Get the property
SOME_ENVIRONMENT = spark.conf.get("ENV_KEY")
print(SOME_ENVIRONMENT)
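Because spark.conf is only available on the driver, a common pattern for using such a value in executor-side code is to read it on the driver and let Spark ship it to the tasks through the closure. A minimal sketch:
# Read the property on the driver and capture it in a task closure
env_value = spark.conf.get("ENV_KEY")

def tag_row(row):
    # env_value is serialized with the closure and available on executors
    return (row.id, env_value)

print(spark.range(3).rdd.map(tag_row).collect())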
Conclusion
In this article, you have learned how to set environment variables on executors or workers by using spark.executorEnv.[EnvironmentVariableName], along with a few alternative approaches, with examples.
Related Articles
- Spark Internal Execution plan
- Calculate Size of Spark DataFrame & RDD
- SOLVED Can’t assign requested address: Service ‘sparkDriver’
- Spark Set JVM Options to Driver & Executors
- Difference Between Spark Driver vs Executor
- Difference Between Spark Worker vs Executor
- How to Set Apache Spark Executor Memory
- Usage of Spark Executor extrajavaoptions
- Tune Spark Executor Number, Cores, and Memory