How to Run Spark Examples from IntelliJ

This article explains how to run the Apache Spark application examples from this blog on Windows, using Scala and Maven from IntelliJ IDEA. Because the articles referenced in this tutorial use Apache Maven as the build system, we will use Maven to build the project.

Make sure you have the following before you proceed.

1. Clone Spark Examples GitHub Project into IntelliJ

Let’s clone the Spark By Examples GitHub project into IntelliJ using the Version Control option.

  • Open IntelliJ IDEA
  • Create a new project by selecting File > New > Project from Version Control.

Using this option, we are going to import the project directly from the GitHub repository.

  • In the Get from Version Control window, select Git as the version control, enter the GitHub URL of the project in the URL field, and enter the directory where you want to clone it.
  • If you don’t have Git installed, select the “Download and Install” option in the same window.
  • After Git is installed, select the Clone option, which clones the project into the folder you specified.
  • This creates a new project in IntelliJ and starts cloning.
  • Now, wait a few minutes for the clone to complete and for the project to be imported into the workspace.

Once cloning completes, the project workspace structure appears in IntelliJ.


2. Run Maven build

Now run the Maven build. Open the Maven tool window on the right side of the IDE, navigate to Lifecycle > install, right-click it, and select Run Maven Build.


This downloads all dependencies listed in the pom.xml file and compiles all the examples in this tutorial. It also takes a few minutes to complete, and a successful build ends with a BUILD SUCCESS message.


3. Run Spark Program From IntelliJ

After a successful Maven build, run the src/main/scala/com.sparkbyexamples.spark.SparkSessionTest example from IntelliJ.

If you still get errors while running the Spark application, restart the IntelliJ IDE and run the application again. You should then see the application's output in the console.
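To give a rough idea of what such a program looks like, here is a minimal sketch of a Spark application that can be run from IntelliJ. It is not the SparkSessionTest file from the repository; the object name `SparkSessionSketch`, the helper `buildSession`, and the `local[1]` master setting are assumptions made for illustration, and the Spark SQL dependency is assumed to be on the classpath through the Maven build.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not the SparkSessionTest class from the repository.
object SparkSessionSketch {

  // Builds (or reuses) a local SparkSession.
  // "local[1]" runs Spark inside this JVM with a single worker thread,
  // so no cluster is needed when running from the IDE.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .master("local[1]")
      .appName("SparkSessionSketch")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()

    // Print a few properties to confirm that the session started.
    println("App name: " + spark.sparkContext.appName)
    println("Master: " + spark.sparkContext.master)
    println("Spark version: " + spark.version)

    // Release the resources held by the session.
    spark.stop()
  }
}
```

Right-clicking an object like this in IntelliJ and choosing Run should start a local session and print these properties among Spark's log lines.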


Once you have run the sample Spark example in IntelliJ, you should read about what Spark Session is, what Spark Context is, Spark RDD, Spark RDD Actions, and Spark RDD Transformations.
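As a small preview of the RDD transformations and actions mentioned above, the sketch below filters and maps a collection (transformations, which are evaluated lazily) and then folds it to a single value (an action, which triggers the computation). The object name `RddSketch` and the helper `sumOfEvenSquares` are hypothetical names chosen for this illustration, and a local master is assumed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object, used only to illustrate transformations vs. actions.
object RddSketch {

  // Sums the squares of the even numbers in the input.
  def sumOfEvenSquares(spark: SparkSession, numbers: Seq[Int]): Int = {
    // Distribute the local collection as an RDD.
    val rdd = spark.sparkContext.parallelize(numbers)

    // Transformations: these only describe the computation; nothing runs yet.
    val evenSquares = rdd.filter(_ % 2 == 0).map(n => n * n)

    // Action: fold triggers the job and returns a result to the driver.
    // fold (unlike reduce) also works on an empty RDD.
    evenSquares.fold(0)(_ + _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("RddSketch")
      .getOrCreate()

    // Even numbers in 1..5 are 2 and 4; their squares sum to 4 + 16 = 20.
    println(sumOfEvenSquares(spark, 1 to 5)) // prints 20

    spark.stop()
  }
}
```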

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.


This Post Has One Comment

  1. ZipTx

    The pom.xml from Git needs to be updated with the new location of the Maven plugins in order to compile.


    4.4.0 …