Debug Spark Application Locally or Remotely

We often need to debug a Spark application or job to inspect values at runtime and fix issues. To do this, we typically use IntelliJ IDEA or Eclipse to debug Scala or Java applications running locally or remotely.

In this article, I will explain how to debug a Spark application running locally and remotely using IntelliJ IDEA.

Before you proceed, install and set up Spark to run locally and on a remote server, and set up IntelliJ IDEA to run Spark applications.

1. Debug Spark application running Locally

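To debug a Scala or Java application, run it with the JVM option -agentlib:jdwp, which loads the Java Debug Wire Protocol (JDWP) agent, followed by a comma-separated list of sub-options. In the examples below, transport=dt_socket uses a socket connection, server=y makes the JVM listen for a debugger, suspend=y pauses the JVM until a debugger attaches, and address=5005 sets the port to listen on.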

// Debug Spark application running locally
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
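The option is just a string of comma-separated key=value pairs. As a sketch of how the sub-options combine, the hypothetical shell helper below (jdwp_opts is not part of Spark or the JDK) builds the string for a given port and suspend setting. With suspend=n the application starts immediately and you can attach later, instead of having it wait for the debugger.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark or the JDK): assemble the JDWP
# agent option from its comma-separated sub-options.
#   $1 = port the JVM listens on (default 5005)
#   $2 = suspend flag: y waits for a debugger before main() runs, n does not
jdwp_opts() {
  port="${1:-5005}"
  suspend="${2:-y}"
  # server=y: the JVM listens and the debugger attaches to it
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}
```

For a plain (non-Spark) JVM application you could pass the result straight to java, for example java "$(jdwp_opts 5005 y)" -cp your-app.jar com.example.Main, where the jar and class names are placeholders.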

To run the application with spark-submit instead, pass the -agentlib:jdwp option to the driver JVM through --conf spark.driver.extraJavaOptions, as shown below.

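# /path/to/your-application.jar is a placeholder for your application jar
spark-submit \
  --name SparkWordCountExample \
  --class org.sparkbyexamples.SparkWordCountExample \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  /path/to/your-application.jar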

When you run the above command, the driver prints the message below and pauses (because of suspend=y) until a debugger attaches.

Listening for transport dt_socket at address: 5005
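spark-submit also has a dedicated --driver-java-options flag for passing extra Java options to the driver. The sketch below wraps it in a hypothetical debug_submit function so the debug option does not have to be retyped on every run; the function name and the DEBUG_PORT and SPARK_SUBMIT variables are assumptions for illustration, not Spark settings.

```shell
#!/bin/sh
# Hypothetical wrapper: start spark-submit with the driver JVM suspended and
# listening for a debugger, using spark-submit's --driver-java-options flag.
#   DEBUG_PORT   port for the debugger (default 5005)
#   SPARK_SUBMIT command to run (default spark-submit); override for a dry run
# All remaining arguments are passed through to spark-submit unchanged.
debug_submit() {
  port="${DEBUG_PORT:-5005}"
  "${SPARK_SUBMIT:-spark-submit}" \
    --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${port}" \
    "$@"
}
```

For example, debug_submit --class org.sparkbyexamples.SparkWordCountExample /path/to/your-application.jar, with the jar path as a placeholder. Setting SPARK_SUBMIT to a stub command lets you check the generated arguments without starting Spark.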

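Now, open IntelliJ and create a remote debug run configuration: select Run -> Edit Configurations, click +, and choose Remote JVM Debug (called Remote in older IntelliJ versions). Name it SparkLocalDebug, keep the debugger mode as Attach to remote JVM, and set Host to localhost and Port to 5005.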

Spark debug locally with IntelliJ

To start debugging, select Run -> Debug SparkLocalDebug. IntelliJ attaches to the JVM listening on port 5005.

Your spark-submit application now resumes, and when it hits a breakpoint, control passes to IntelliJ.

Use the debugger controls or their keyboard shortcuts to step through the application. If you are not sure how to step through code, refer to the IntelliJ step-through article.

If no Spark application is listening on port 5005 on localhost, IntelliJ shows the following error message:

Error running 'SparkLocalDebug': Unable to open debugger port (localhost:5005): "Connection refused: connect"

2. Debug Spark application running on Remote server

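If your Spark application runs on a remote node and you want to debug it from IntelliJ, set the environment variable SPARK_SUBMIT_OPTS to the JDWP option before running spark-submit. SPARK_SUBMIT_OPTS adds JVM options to the JVM that spark-submit starts, which runs the driver in client mode; code running in executors on other nodes is not debugged this way.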

// Debug Spark application running on Remote server
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5050

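Now run your spark-submit command; it waits for the debugger to attach. On Java 9 and later, address=5050 listens only on localhost, so to attach from IntelliJ on another machine, use address=*:5050 instead.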
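If the same script runs on nodes with different Java versions, a helper can pick the address form from the Java major version: on Java 8 a bare port already listens on all interfaces, while on Java 9 and later it listens on localhost only and *:PORT is needed. This is a hypothetical sketch; remote_debug_opts is not a Spark or JDK command, and you supply the Java version yourself.

```shell
#!/bin/sh
# Hypothetical helper: build the SPARK_SUBMIT_OPTS value for remote debugging.
#   $1 = Java major version of the JVM that runs spark-submit (you supply it)
#   $2 = debug port (default 5050, matching the export shown earlier)
remote_debug_opts() {
  java_major="$1"
  port="${2:-5050}"
  if [ "$java_major" -ge 9 ]; then
    # Java 9+: a bare port binds to localhost only, so listen on all interfaces.
    address="*:${port}"
  else
    # Java 8: a bare port already listens on all interfaces.
    address="${port}"
  fi
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${address}"
}

# Example: the remote node runs Java 11 (an assumption for this sketch).
export SPARK_SUBMIT_OPTS="$(remote_debug_opts 11 5050)"
```

Keep in mind that anyone who can reach an open JDWP port can run code in that JVM. Where that is a concern, an SSH tunnel such as ssh -L 5050:localhost:5050 user@remote-host (user and host are placeholders) lets you keep the localhost-only form and point IntelliJ at localhost:5050.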

Finally, open IntelliJ and create a remote debug configuration as described in the previous section, but set Host to the remote node where your Spark application is running and Port to 5050.

3. Conclusion

In this article, you have learned how to debug a Spark application or job running locally or on a remote server using IntelliJ IDEA. You can follow similar steps to debug from Eclipse as well.

Happy Learning !!

