Debug Spark Application Locally or Remote

We often need to debug a Spark application or job to inspect values at runtime in order to fix issues. We typically use the IntelliJ IDEA or Eclipse IDE to debug locally or remotely running applications written in Scala or Java.


In this article, I will explain how to debug a Spark application running locally and remotely using the IntelliJ IDEA IDE.

Before you proceed with this article, install and set up Spark to run both locally and on a remote server, and have your IntelliJ IDEA IDE set up to run Spark applications.

1. Debug Spark application running Locally

To debug a Scala or Java application, you need to run it with the JVM option -agentlib:jdwp, which loads the Java Debug Wire Protocol (JDWP) agent, followed by a comma-separated list of sub-options (transport, server, suspend, and address).


// Debug Spark application running locally
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
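
For a plain Scala or Java application (outside spark-submit), you pass this option directly to the java command. A minimal sketch; the class and jar names are illustrative and match the spark-submit example below:


// Attach the JDWP agent to a plain JVM application (illustrative names)
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 \
  -cp spark-by-examples.jar org.sparkbyexamples.SparkWordCountExample

Because suspend=y, the JVM pauses at startup and waits for a debugger to attach on port 5005; with suspend=n it would start immediately and accept a debugger at any later point.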

But to run with spark-submit, you need to pass -agentlib:jdwp through --conf spark.driver.extraJavaOptions, as shown below.


spark-submit \
  --name SparkByExamples.com \
  --class org.sparkbyexamples.SparkWordCountExample \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  spark-by-examples.jar

When you run the above command, it prints the message below and your application pauses, waiting for a debugger to attach.


Listening for transport dt_socket at address: 5005

Now, open the IntelliJ IDEA editor and do the following.

  • Open the Spark project you want to debug.
  • Add some debugging breakpoints to the Scala classes, for example in a word-count job like the sketch below.
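
Here is a minimal sketch of what such a class might look like (an assumed shape for org.sparkbyexamples.SparkWordCountExample with a hypothetical input path, not the exact source):


// Minimal Spark word count in Scala; the marked lines are good breakpoint spots
package org.sparkbyexamples

import org.apache.spark.sql.SparkSession

object SparkWordCountExample {
  def main(args: Array[String]): Unit = {
    // master is supplied by spark-submit
    val spark = SparkSession.builder()
      .appName("SparkByExamples.com")
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("data/input.txt")      // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))          // breakpoint here fires in local mode
      .reduceByKey(_ + _)

    counts.collect().foreach(println)  // driver-side breakpoint candidate
    spark.stop()
  }
}

Keep in mind that breakpoints inside transformations such as map run on the executors; in local mode the executors share the driver JVM, so they are hit, but on a cluster only driver-side breakpoints stop with this setup.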

Then, follow the steps below to create a Remote debug configuration and start debugging.

  • Open the Spark application you want to debug in the IntelliJ IDEA IDE.
  • Go to Run -> Edit Configurations; this brings up the Run/Debug Configurations window.
  • Select the + sign in the top-left corner and select the Remote option.
  • Enter a name for the debugger in the Name field, for example, SparkLocalDebug.
  • For the Debugger mode option, select Attach to local JVM.
  • For Transport, select Socket (selected by default).
  • For Host, enter localhost since we are debugging locally, and enter the port number for Port. For our example, we are using 5005.
  • Finally, select OK. This only creates the debug configuration; it does not start it.
[Image: Spark debug locally with IntelliJ]

To start debugging, select Run -> Debug SparkLocalDebug. This attempts to attach to the application on port 5005.

Now you should see your spark-submit application resume, and when it encounters a debug breakpoint, control transfers to IntelliJ.

Use the debug control keys or options to step through the application. If you are not sure how to step through, follow this IntelliJ step-through article.

If your Spark application is not listening on port 5005 on localhost, attaching fails with the error message below.


Error running 'SparkLocalDebug': Unable to open debugger port (localhost:5005): java.net.ConnectException "Connection refused: connect"
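
If this happens, you can confirm whether anything is actually listening on the debug port by checking from a shell on the same machine (Linux/macOS):


// Check whether a process is listening on the debug port
lsof -i :5005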

2. Debug Spark application running on Remote server

If you are running the Spark application on a remote node and want to debug it via IntelliJ, you need to set the environment variable SPARK_SUBMIT_OPTS with the debug information.


// Debug Spark application running on Remote server
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5050"

Now run your spark-submit on the remote node, which will wait for the debugger to attach.
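
For example, reusing the illustrative class and jar names from earlier (spark-submit picks up SPARK_SUBMIT_OPTS automatically):


// Run spark-submit on the remote node; it inherits SPARK_SUBMIT_OPTS
spark-submit \
  --class org.sparkbyexamples.SparkWordCountExample \
  spark-by-examples.jar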

Finally, open IntelliJ and follow the same steps as above; for Host, enter the remote host where your Spark application is running, and for Port, enter 5050 to match the address in SPARK_SUBMIT_OPTS.

3. Conclusion

In this article, you learned how to debug a Spark application or job running on a local or remote server using the IntelliJ IDEA IDE. You can follow similar steps to debug from Eclipse as well.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium.
