How to Set Apache Spark/PySpark Executor Memory? A Spark or PySpark executor is a worker process that runs tasks on a cluster node. Each executor has its own memory, allocated by the Spark driver, which is used to store cached data, intermediate results, and task output.

In this article, we shall discuss the role of Spark Executor Memory and how to set Spark/PySpark executor memory in multiple ways.

1. Spark Executor Memory

The amount of memory allocated to each executor is determined by the spark.executor.memory configuration parameter. This parameter can be set in the Spark configuration file or through the SparkConf object in the application code.

The value of spark.executor.memory can be set in several ways, such as:

  • Fixed value: You can set the value to a fixed amount of memory, such as 4GB or 8GB, depending on the size of the data and the resources available in the cluster.
  • Dynamic allocation: Spark also supports dynamic allocation of executors, which allows the driver to add or remove executors based on the workload; the memory per executor itself stays fixed. This is controlled by the spark.dynamicAllocation.enabled configuration parameter, and the off-heap overhead per executor is set with spark.executor.memoryOverhead.
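To make the fixed-value approach concrete, the short sketch below (plain Python; the node size and executor count are made-up example numbers, not recommendations) estimates a per-executor memory setting from a node's RAM, accounting for Spark's default memory overhead of max(384MB, 10% of spark.executor.memory):

```python
# Rough sizing sketch (example numbers are assumptions, not official Spark guidance).
# By default Spark reserves an off-heap overhead per executor of
# max(384 MB, 10% of spark.executor.memory).

def executor_memory_gb(node_ram_gb, executors_per_node, overhead_fraction=0.10):
    """Memory to request per executor so that heap + overhead fits on the node."""
    per_executor_budget = node_ram_gb / executors_per_node
    # Solve budget = memory + overhead_fraction * memory; for executors larger
    # than a few GB the fractional overhead dominates the 384 MB floor.
    return per_executor_budget / (1 + overhead_fraction)

# Example: a 64 GB node running 4 executors
mem = executor_memory_gb(64, 4)
print(f"spark.executor.memory={mem:.1f}g")  # about 14.5g, leaving ~1.5g overhead each
```

The point of the division by (1 + overhead_fraction) is that requesting the full per-node share as heap would make the container (heap + overhead) exceed what the node can actually host.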

2. Setting Spark Executor Memory

You can set executor memory with the spark.executor.memory configuration property. There are several ways to set it: in the Spark defaults file, through SparkConf, or with the --executor-memory option when submitting the Spark application. When the same property is set in more than one place, values set directly on SparkConf take the highest precedence, then flags passed to spark-submit, then entries in spark-defaults.conf.

2.1 Using the Spark configuration file

You can set the executor memory using Spark configuration, this can be done by adding the following line to your Spark configuration file (e.g., spark-defaults.conf):


# Syntax
spark.executor.memory memory_value

# Example of setting executor memory
spark.executor.memory 4g

Here, memory_value is the amount of memory you want to allocate to each executor. In the example, the value 4g allocates 4GB to each executor; change it to the value you need.

2.2 Using the SparkConf object

You can also set it programmatically, using the spark.executor.memory parameter on the SparkConf object:


// Imports
import org.apache.spark.SparkConf

// Create SparkConf
val conf = new SparkConf()
          .setAppName("My Spark App")
          .setMaster("local[*]")
          .set("spark.executor.memory", "4g")

This sets the executor memory to 4GB.

2.3 Using command-line options

Using the --executor-memory command-line option when launching the Spark application:


# Using spark-submit
./bin/spark-submit --class com.example.MyApp \
          --master yarn \
          --executor-memory 4g \
          myapp.jar

You can set the executor memory by passing the --executor-memory option to spark-submit. This sets the executor memory to 4GB when submitting the Spark application.

2.4 Dynamic executor memory allocation

Dynamic allocation is a Spark feature that adds or removes executors at runtime to match the workload. In cluster deployments it also requires an external shuffle service or shuffle tracking to be enabled.


// Enable dynamic allocation and set the executor memory overhead
val conf = new SparkConf()
          .setAppName("My Spark App")
          .setMaster("local[*]")
          .set("spark.dynamicAllocation.enabled", "true")
          .set("spark.executor.memoryOverhead", "1g")

This enables dynamic allocation of executors and sets the per-executor memory overhead to 1GB.

2.5 Setting executor memory on a per-job basis

You can also set executor memory when creating the SparkSession, so the value applies only to that application:


// Set executor memory while creating spark session
val spark = SparkSession.builder()
            .appName("My Spark App")
            .config("spark.executor.memory", "4g")
            .getOrCreate()

This sets the executor memory to 4GB for the Spark session.
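Since the article covers PySpark as well, the same per-job setting in Python looks like this (a sketch assuming a working pyspark installation; the app name is illustrative):

```python
# PySpark equivalent: set executor memory when building the SparkSession
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("My Spark App")
         .config("spark.executor.memory", "4g")
         .getOrCreate())
```

As in the Scala version, the value takes effect when the session's underlying SparkContext is created, so it must be set before getOrCreate() returns an existing session.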

2.6 Using an environment variable

You can set the executor memory using the SPARK_EXECUTOR_MEMORY environment variable. This can be done by setting the environment variable before running your Spark application, as follows:


# Set the environment variable, then submit the application
export SPARK_EXECUTOR_MEMORY=<memory>
spark-submit my_spark_application.py

Where <memory> is the amount of memory you want to allocate to each executor.

It is important to carefully tune the executor memory based on the requirements of the Spark application and the available cluster resources.

3. Conclusion

It is important to set sufficient memory for each executor to avoid out-of-memory errors and maximize the performance of the Spark application. However, allocating too much memory can lead to unnecessary resource wastage, as well as longer garbage collection times. Therefore, it is recommended to carefully tune the executor memory based on the specific requirements of the application and the available cluster resources.

rimmalapudi

Data Engineer. I write about BigData Architecture, tools and techniques that are used to build Bigdata pipelines and other generic blogs.