Spark Set Environment Variable to Executors


By using spark.executorEnv.[EnvironmentVariableName], you can set one or more environment variables on Spark/PySpark executors (worker processes).

Besides this, there are a few other approaches; let's look at each of them with examples.

1. Spark Set Environment Variable to Executor

Use the spark-submit config spark.executorEnv.[EnvironmentVariableName] to set or add an environment variable to executor or worker node processes. The Spark documentation says the following about this config:

Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.

— Spark documentation

# For Spark with Scala and PySpark (your-application is a placeholder
# for your application jar or .py file)
spark-submit --conf spark.executorEnv.SOME_ENVIRONMENT=SOME_VALUE \
      your-application

If you are running on YARN, you can also set an environment variable on the application master with spark.yarn.appMasterEnv.[EnvironmentVariableName] (in cluster mode, this is the process that runs the driver):


spark-submit --deploy-mode cluster \
      --conf spark.yarn.appMasterEnv.SOME_ENVIRONMENT=SOME_VALUE \
      your-application

Also, you can add these properties to the conf/spark-defaults.conf file, as shown below.
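
For reference, a minimal sketch of the equivalent spark-defaults.conf entries (same property names as above; key and value are separated by whitespace):


# conf/spark-defaults.conf
spark.executorEnv.SOME_ENVIRONMENT        SOME_VALUE
spark.yarn.appMasterEnv.SOME_ENVIRONMENT  SOME_VALUE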

2. Add Environment Variable by Creating SparkSession

You can also add an environment variable to the executor in Spark or PySpark while creating the SparkSession. Below is a PySpark example.


# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
           .appName('SparkByExamples.com') \
           .config("spark.executorEnv.SOME_ENVIRONMENT", "SOME_VALUE") \
           .getOrCreate()

Regardless of which approach you use, you can read the environment variable on the executors with the following.


import os
some_environment_value = os.environ.get('SOME_ENVIRONMENT')
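
To verify the variable actually reaches the executors, here is a minimal sketch (the read_env UDF and the env_value column are illustrative names, not part of Spark's API) that performs the lookup inside a UDF, so it runs in the executor process rather than on the driver:


# Imports
import os
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# read_env executes on the executors, so it sees the executor's
# environment, not the driver's.
@udf(returnType=StringType())
def read_env():
    return os.environ.get('SOME_ENVIRONMENT', 'not set')

spark.range(1).withColumn("env_value", read_env()).show()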

3. Using Spark Config

If you just want to set a value and read it back across your application, you can set a custom property on the Spark configuration. Note that the key should be prefixed with spark. (spark.SOME_ENVIRONMENT below), because spark-submit ignores --conf properties whose names do not start with spark.


# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
           .config("SOME_ENVIRONMENT", "SOME_VALUE") \
           .appName('SparkByExamples.com') \
           .getOrCreate()

You can also do the same with spark-submit:


spark-submit --conf spark.SOME_ENVIRONMENT=SOME_VALUE your-application

You can read the config property back on the driver with spark.conf.get():


SOME_ENVIRONMENT = spark.conf.get("spark.SOME_ENVIRONMENT")
print(SOME_ENVIRONMENT)
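
Keep in mind that spark.conf.get() runs on the driver. To use the value inside executors, a common pattern (sketched below; the tag function is just an illustration) is to capture it in a local variable so Spark ships it to the executors with the function's closure:


# Read the value once on the driver; some_value is then serialized
# with the closure of tag() and sent to the executors.
some_value = spark.conf.get("spark.SOME_ENVIRONMENT")

def tag(x):
    return (x, some_value)

print(spark.sparkContext.parallelize([1, 2, 3]).map(tag).collect())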

Conclusion

In this article, you learned how to set an environment variable on executors or workers using spark.executorEnv.[EnvironmentVariableName], along with a few other approaches, with examples.
