We often need to debug a Spark application or job to inspect values at runtime and fix issues. For applications written in Scala or Java, we typically use IntelliJ IDEA or Eclipse, whether the application runs locally or on a remote server.
In this article, I will explain how to debug a Spark application running locally and remotely using IntelliJ IDEA.
Before you proceed, install and set up Spark locally and on the remote server, and set up IntelliJ IDEA to run Spark applications.
1. Debug a Spark application running locally
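To debug a Scala or Java application, run it with the following JVM option.
-agentlib:jdwp loads the Java Debug Wire Protocol (JDWP) agent and takes a comma-separated list of sub-options: transport=dt_socket communicates over a TCP socket, server=y makes the application JVM listen for a debugger to attach, suspend=y pauses the JVM until the debugger attaches, and address=5005 sets the port to listen on.
// Debug Spark application running locally
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005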
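To make the role of each sub-option concrete, here is a minimal Scala sketch that assembles the same option string from its parts. The object JdwpOptions and the method jdwpOption are hypothetical names used only for illustration; they are not part of Spark or the JDK.

```scala
// Hypothetical helper (not a Spark or JDK API): builds the -agentlib:jdwp option string.
object JdwpOptions {
  /** @param port    TCP port the debuggee JVM listens on (5005 in this article)
    * @param suspend if true, the JVM pauses at startup until a debugger attaches
    */
  def jdwpOption(port: Int, suspend: Boolean = true): String = {
    require(port > 0 && port <= 65535, s"invalid port: $port")
    val suspendFlag = if (suspend) "y" else "n"
    val subOptions = Seq(
      "transport=dt_socket",     // communicate over a TCP socket
      "server=y",                // this JVM listens; the IDE attaches to it
      s"suspend=$suspendFlag",   // whether to wait for the debugger before running main
      s"address=$port"           // port to listen on
    )
    "-agentlib:jdwp=" + subOptions.mkString(",")
  }

  def main(args: Array[String]): Unit =
    println(jdwpOption(5005))
}
```

Calling jdwpOption(5005) reproduces the option above. Passing suspend = false yields suspend=n, which lets the JVM start immediately so you can attach later, at the cost of possibly missing breakpoints in startup code.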
When you launch the application with spark-submit, pass this option to the driver JVM with --conf spark.driver.extraJavaOptions, as shown below.
spark-submit \
  --name SparkByExamples.com \
  --class org.sparkbyexamples.SparkWordCountExample \
  --conf spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 \
  spark-by-examples.jar
When you run the command above, the driver prints the following message and pauses (because of suspend=y) until a debugger attaches.
Listening for transport dt_socket at address: 5005
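If this message does not appear, the option may not have reached the driver JVM. As a minimal sketch, a Scala driver could log whether it was started with the JDWP agent by reading its own JVM input arguments through the standard java.lang.management.RuntimeMXBean API. The object DebugAgentCheck and its methods are hypothetical names for illustration.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper: reports whether this JVM was started with a JDWP agent option.
object DebugAgentCheck {
  /** Finds the JDWP agent option (modern or legacy form) among the given JVM arguments. */
  def jdwpArgument(jvmArgs: Seq[String]): Option[String] =
    jvmArgs.find(arg => arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp"))

  /** JVM input arguments of the current process, e.g. those from spark.driver.extraJavaOptions. */
  def currentJvmArgs: Seq[String] = {
    val args = ManagementFactory.getRuntimeMXBean.getInputArguments // java.util.List[String]
    // Copy element by element to avoid Scala-version-specific collection converters.
    List.tabulate(args.size)(i => args.get(i))
  }

  def main(args: Array[String]): Unit =
    jdwpArgument(currentJvmArgs) match {
      case Some(opt) => println(s"Debug agent enabled: $opt")
      case None      => println("Debug agent not enabled for this JVM")
    }
}
```

Calling DebugAgentCheck.jdwpArgument(DebugAgentCheck.currentJvmArgs) at the start of the driver's main method is one way to confirm the option was applied before you set up the IDE.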
Now, open IntelliJ IDEA and do the following:
- Open the Spark project you want to debug.
- Add breakpoints to the Scala classes you want to inspect.
Then follow these steps to create a Remote debug configuration and start debugging:
- Select Run -> Edit Configurations to open the Run/Debug Configurations window.
- Click the + sign in the top-left corner and select Remote (named Remote JVM Debug in newer IntelliJ IDEA versions).
- Enter a name in the Name field, for example, SparkLocalDebug.
- For Debugger mode, select Attach to remote JVM; this also works for a JVM running on localhost.
- For Transport, select Socket (selected by default).
- For Host, enter localhost, since the application runs locally, and for Port, enter the port from the address sub-option (5005 in this example).
- Finally, click OK. This only creates the debug configuration; it does not start debugging.
To start debugging, select Run -> Debug 'SparkLocalDebug'. IntelliJ IDEA attaches to the JVM listening on port 5005.
Your spark-submit application now resumes, and when it hits a breakpoint, control passes to IntelliJ IDEA.
Use the debugger controls (for example, Step Over, Step Into, and Resume Program) to step through the application. If you are not sure how to step through code, follow this IntelliJ step-through article.
If no Spark application is listening on port 5005 on localhost, IntelliJ IDEA returns the following error:
Error running 'SparkLocalDebug': Unable to open debugger port (localhost:5005): java.net.ConnectException "Connection refused: connect"
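A "Connection refused" error simply means nothing accepted a TCP connection on that host and port. As a rough diagnostic sketch, the hypothetical DebugPortProbe object below checks whether anything is listening before you start the IntelliJ configuration. It opens and immediately closes a plain TCP connection without performing the JDWP handshake, so a waiting debuggee may log a failed-handshake message; treat it as a quick check, not part of the normal workflow.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical diagnostic helper: checks whether host:port accepts TCP connections.
object DebugPortProbe {
  /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
  def isListening(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // ConnectException ("Connection refused"), SocketTimeoutException and
      // UnknownHostException are all IOExceptions.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val host = if (args.length > 0) args(0) else "localhost"
    val port = if (args.length > 1) args(1).toInt else 5005
    if (isListening(host, port)) println(s"$host:$port is accepting connections")
    else println(s"Nothing is listening on $host:$port; start spark-submit with the JDWP option first")
  }
}
```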
2. Debug a Spark application running on a remote server
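If your Spark application runs on a remote node and you want to debug it from IntelliJ IDEA, set the SPARK_SUBMIT_OPTS environment variable on that node to the debug options. spark-submit passes this variable to the JVM it launches, which is also where the driver runs in client deploy mode; code in transformations that executes on executors on other nodes is not covered by these options.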
# Debug Spark application running on Remote server
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5050
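Now run spark-submit; it waits for the debugger to attach. Note that on JDK 9 and later, address=5050 listens on localhost only, so use address=*:5050 to accept a debugger connection from another machine.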
Finally, open IntelliJ IDEA and create a Remote debug configuration as described above, but for Host, enter the remote host where your Spark application is running, and for Port, enter 5050.
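The address format a remote debugger can reach depends on the Java version running on the remote node: on Java 8 and earlier, a bare port listens on all interfaces, while on Java 9 and later it binds to localhost only and needs the *: prefix. A minimal sketch, using the hypothetical object RemoteJdwpOptions, that picks the format from the java.specification.version system property and prints a matching export line:

```scala
// Hypothetical helper: builds a SPARK_SUBMIT_OPTS value that a debugger on another host can reach.
object RemoteJdwpOptions {
  /** Java 9+ reports "9", "11", "17", ...; Java 8 and earlier report "1.8", "1.7", ... */
  def isJava9OrLater(javaSpecVersion: String): Boolean =
    !javaSpecVersion.startsWith("1.")

  def remoteJdwpOption(port: Int, javaSpecVersion: String): String = {
    // Java 9+: a bare port binds to localhost only, so "*:" is needed for remote attach.
    // Java 8 and earlier: a bare port already listens on all interfaces.
    val address = if (isJava9OrLater(javaSpecVersion)) s"*:$port" else port.toString
    s"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=$address"
  }

  def main(args: Array[String]): Unit = {
    // Run this on the remote node so it reads that node's Java version.
    val spec = System.getProperty("java.specification.version")
    // Single quotes stop the shell from treating * as a glob pattern.
    println(s"export SPARK_SUBMIT_OPTS='${remoteJdwpOption(5050, spec)}'")
  }
}
```

Because JDWP has no authentication, a port listening on all interfaces should only be opened on a trusted network.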
In this article, you have learned how to debug a Spark application or job running locally or on a remote server using IntelliJ IDEA. You can follow similar steps to debug from Eclipse.
Happy Learning !!