How to Set Apache Spark/PySpark Executor Memory?
A Spark or PySpark executor is a process launched on a worker node that runs tasks for the application. Each executor has its own memory, requested by the Spark driver when the executor is launched. This memory is used to store cached data, intermediate results, and task output.
In this article, we shall discuss the role of Spark Executor Memory and how to set Spark/PySpark executor memory in multiple ways.
1. Spark Executor Memory
The amount of memory allocated to an executor is determined by the spark.executor.memory configuration parameter, which specifies the amount of memory to allocate per executor. This parameter is set in the Spark configuration file or through the SparkConf object in the application code.
The value of spark.executor.memory can be set in several ways, such as:
- Fixed value: You can set the value to a fixed amount of memory, such as 4GB or 8GB, depending on the size of the data and the resources available in the cluster.
- Dynamic allocation: Spark also supports dynamic resource allocation, which lets the driver request additional executors or release idle ones based on the workload; it does not change the memory of an already running executor. This is enabled with the spark.dynamicAllocation.enabled configuration parameter, while the memory reserved outside the executor heap is controlled by spark.executor.memoryOverhead, as sketched right after this list.
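To get a feel for how the executor heap and the memory overhead add up, here is a minimal sketch (an illustration only, assuming a 4g executor and the default overhead of 10% of executor memory with a 384 MiB floor, as used by YARN and Kubernetes) that estimates the total memory requested per executor container:
// Illustration only: rough per-executor container request, assuming
// spark.executor.memory = 4g and the default memory overhead
// (the larger of 10% of executor memory and 384 MiB)
val executorMemoryMiB = 4 * 1024
val overheadMiB = math.max(384, (executorMemoryMiB * 0.10).toInt)
val containerMiB = executorMemoryMiB + overheadMiB

println(s"Executor heap     : $executorMemoryMiB MiB")
println(s"Memory overhead   : $overheadMiB MiB")
println(s"Container request : $containerMiB MiB")  // roughly 4505 MiB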
2. Setting Spark Executor Memory
You can use the spark.executor.memory configuration property to set executor memory. There are several ways to set this property: in the Spark defaults file, through the SparkConf object, or with the --executor-memory option when submitting the Spark application.
2.1 Using the Spark configuration file
You can set the executor memory through the Spark configuration by adding the following line to your Spark configuration file (e.g., spark-defaults.conf):
# Syntax
spark.executor.memory <memory_value>

# Example of setting executor memory
spark.executor.memory 4g
Where <memory_value> is the amount of memory you want to allocate to each executor. In the example, "4g" is the amount of memory allocated to each executor; change it to the desired value.
2.2 Using the SparkConf object
You can also set it programmatically using the spark.executor.memory configuration parameter on a SparkConf object:
// Imports
import org.apache.spark.SparkConf
// Create SparkConf with executor memory set to 4g
val conf = new SparkConf()
  .setAppName("My Spark App")
  .setMaster("local[*]")
  .set("spark.executor.memory", "4g")
This sets the executor memory to 4GB.
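On its own, the SparkConf does not take effect until it is passed to a context or session. Below is a minimal sketch of wiring it in (assuming the conf from the snippet above; note that with a local[*] master the executors run inside the driver JVM, so spark.executor.memory only takes effect on a real cluster):
// Sketch: pass the SparkConf to a SparkContext so the setting is applied
import org.apache.spark.SparkContext

val sc = new SparkContext(conf)
println(sc.getConf.get("spark.executor.memory"))  // 4g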
2.3 Using command-line options
Using the --executor-memory command-line option when launching the Spark application:
# Using spark-submit
./bin/spark-submit --class com.example.MyApp \
    --master yarn \
    --executor-memory 4g \
    myapp.jar
You can set the executor memory by passing the --executor-memory option to the spark-submit command. This sets the executor memory to 4GB when submitting the Spark application.
2.4 Dynamic Executor Memory Allocation
Dynamic allocation is a Spark feature that allows dynamically adding or removing Spark executors to match the workload.
// Dynamic executor memory allocation
val conf = new SparkConf()
  .setAppName("My Spark App")
  .setMaster("local[*]")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.executor.memoryOverhead", "1g")
This enables dynamic allocation of executors and sets the executor memory overhead to 1GB.
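Keep in mind that dynamic allocation also needs a way to preserve shuffle data when executors are removed. A minimal sketch of a fuller setup (assuming Spark 3.x with shuffle tracking; the executor bounds are illustrative placeholders):
// Sketch: dynamic allocation with shuffle tracking and executor bounds (illustrative values)
import org.apache.spark.SparkConf

val dynConf = new SparkConf()
  .setAppName("My Spark App")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.shuffleTracking.enabled", "true") // or use the external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
  .set("spark.executor.memory", "4g")
  .set("spark.executor.memoryOverhead", "1g")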
2.5 Setting executor memory on a per-job basis
// Set executor memory while creating the Spark session
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.executor.memory", "4g")
  .getOrCreate()
This sets the executor memory to 4GB for the Spark session.
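To confirm what was actually applied, you can read the value back from the session's runtime configuration. A small sketch (assuming the spark session created above):
// Sketch: read the executor memory back from the active session
println(spark.conf.get("spark.executor.memory"))                   // 4g
println(spark.sparkContext.getConf.get("spark.executor.memory"))   // same value via the SparkContext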
2.6 Using environment variable
You can set the executor memory using the SPARK_EXECUTOR_MEMORY
environment variable. This can be done by setting the environment variable before running your Spark application, as follows:
# Set environment variable
export SPARK_EXECUTOR_MEMORY=<memory>
spark-submit my_spark_application.py
Where <memory> is the amount of memory you want to allocate to each executor.
It is important to carefully tune the executor memory based on the requirements of the Spark application and the available cluster resources.
3. Conclusion
It is important to set sufficient memory for each executor to avoid out-of-memory errors and maximize the performance of the Spark application. However, allocating too much memory can lead to unnecessary resource wastage, as well as longer garbage collection times. Therefore, it is recommended to carefully tune the executor memory based on the specific requirements of the application and the available cluster resources.
Related Articles
- Spark Set JVM Options to Driver & Executors
- Spark Web UI – Understanding Spark Execution
- What is DAG in Spark or PySpark
- Spark SQL Performance Tuning by Configurations
- What is Apache Spark Driver?
- Cannot call methods on a stopped SparkContext in Spark
- Tune Spark Executor Number, Cores, and Memory
- Spark Set Environment Variable to Executors
- Difference Between Spark Driver vs Executor
- Difference Between Spark Worker vs Executor
- Usage of Spark Executor extrajavaoptions