Spark Set Environment Variable to Executors

By using spark.executorEnv.[EnvironmentVariableName], you can set one or more environment variables on Spark/PySpark executors or worker nodes.

Besides this, there are a few other ways as well; let's look at each with examples.

1. Spark Set Environment Variable to Executor

Use the spark-submit config spark.executorEnv.[EnvironmentVariableName] to set or add an environment variable on executors or worker nodes. The Spark documentation says the following about this config:

Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.

spark.apache.org

# For Spark with Scala and Python
spark-submit --conf spark.executorEnv.ENV_KEY=ENV_VALUE
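
As the documentation notes, you can repeat the config to set multiple environment variables. A minimal sketch (ENV_KEY1 and ENV_KEY2 are placeholder names):

# Setting multiple environment variables on executors
spark-submit --conf spark.executorEnv.ENV_KEY1=ENV_VALUE1 \
             --conf spark.executorEnv.ENV_KEY2=ENV_VALUE2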

If you are running on a YARN cluster in cluster deploy mode, you can also use spark.yarn.appMasterEnv.[EnvironmentVariableName], which adds the environment variable to the YARN Application Master process (which hosts the driver in cluster mode).


# spark-submit
spark-submit --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.ENV_KEY=ENV_VALUE
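
Note that spark.yarn.appMasterEnv only affects the Application Master; to set the same variable on the executors as well, combine it with spark.executorEnv. A sketch:

# spark-submit on YARN: set the variable on both the Application Master and the executors
spark-submit --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.ENV_KEY=ENV_VALUE \
      --conf spark.executorEnv.ENV_KEY=ENV_VALUE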

Also, you can add these configs to the conf/spark-defaults.conf file, as shown below.
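
A minimal sketch of a conf/spark-defaults.conf entry (property name and value are whitespace-separated; ENV_KEY and ENV_VALUE are placeholders):

# conf/spark-defaults.conf
spark.executorEnv.ENV_KEY    ENV_VALUE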

2. Add Environment Variable by Creating SparkSession

You can also add an environment variable to the executors in Spark or PySpark while creating the SparkSession. Below is a PySpark example.


# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
           .appName('SparkByExamples.com') \
           .config("spark.executorEnv.ENV_KEY", "ENV_VALUE") \
           .getOrCreate()
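
You can verify that the property was registered by reading it back from the runtime configuration on the driver:

# Verify the executor environment config from the driver
print(spark.conf.get("spark.executorEnv.ENV_KEY"))  # ENV_VALUE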

Regardless of which approach you use, you can read the environment variable value with os.environ. To see the executor's value, the lookup has to run inside code that executes on the executors (for example, a UDF or an RDD function); run on the driver, it reads the driver's environment instead.


# Access environment variables
import os
some_environment_value = os.environ.get('ENV_KEY')
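
For example, here is a minimal sketch (assuming the SparkSession created above and the placeholder variable ENV_KEY) that performs the lookup inside a UDF so it runs on the executors:

# Read ENV_KEY on the executors through a UDF
import os
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def read_env_key(_):
    # Executed on the executor, so it sees spark.executorEnv.ENV_KEY
    return os.environ.get('ENV_KEY', 'not set')

spark.range(1).withColumn("env_value", read_env_key("id")).show()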

3. Using -D

You can use -D in spark.executor.extraJavaOptions to pass extra JVM options to the executors. Note that this sets a JVM system property rather than an operating-system environment variable.

# Using -D with spark-submit
spark-submit --conf spark.executor.extraJavaOptions="-DENV_KEY=ENV_VALUE"

4. Using Spark Config

If you just want to set a value and make it available across your application and executors, you can use .config() while creating the SparkSession.

In Apache Spark, you can also set configuration properties for your application by using the SparkConf object (or, as below, the SparkSession builder). Properties set this way are specific to your Spark application and are used to configure various aspects of Spark's behavior; they are config properties, not operating-system environment variables.


# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
           .config("ENV_KEY", "ENV_VALUE") \
           .appName('SparkByExamples.com') \
           .getOrCreate()

You can also do the same with spark-submit. Depending on your Spark version, spark-submit may warn about and ignore --conf properties whose names do not start with spark., in which case use a spark.-prefixed key (for example, spark.ENV_KEY).


# Using spark-submit --conf
spark-submit --conf ENV_KEY=ENV_VALUE

You can get the value of the config property by using spark.conf.get().


# Get the property
SOME_ENVIRONMENT = spark.conf.get("ENV_KEY")
print(SOME_ENVIRONMENT)
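
Since spark.conf.get() runs on the driver, a common pattern for using the value in executor-side code is to capture it in a closure. A minimal sketch, assuming the SparkSession created above:

# Capture the config value on the driver and reference it inside executor code
env_value = spark.conf.get("ENV_KEY")
rdd = spark.sparkContext.parallelize([1, 2, 3])
print(rdd.map(lambda x: f"{x}-{env_value}").collect())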

Conclusion

In this article, you have learned how to set environment variables on executors or workers by using spark.executorEnv.[EnvironmentVariableName], along with other approaches, with examples.
