Spark Setup with Scala and Run in IntelliJ

In this article, I will guide you step-by-step through setting up Apache Spark with Scala and running it in IntelliJ IDEA. IntelliJ IDEA is the most widely used IDE for Spark applications written in Scala, largely because of its good Scala code completion. I will explain how to set up and run a Spark application written in Scala and built with Apache Maven in IntelliJ IDEA.

1. Install JDK

You might be aware that Spark was created in Scala, and Scala is a JVM language that needs a JVM to run. Hence, to compile and execute Spark applications you need Java installed on your system.

Download and install Java 8 or above from Oracle.com or OpenJDK. You can verify the installation by running java -version from a command prompt.

2. Setup IntelliJ IDEA for Spark

Most Spark engineers use IntelliJ IDEA to run Spark applications written in Scala because of its good Scala support, so it is worth setting up your development environment in IntelliJ.

IntelliJ IDEA comes in a Community and an Ultimate edition. The Community edition is more than enough to run Spark applications written in Scala, so download the IntelliJ IDEA Community edition.

  1. You can download either the Windows installer (.exe) or a compressed zip (.zip) file, whichever is more convenient. I’ve downloaded the .zip file.

2. Now, let’s unzip it using WinZip, 7-Zip, or any other zip extractor you have. I’ve used 7-Zip to extract the contents to a folder.


3. Move the extracted folder from Downloads to your working folder. In my case, I am moving it to c:\apps\.

4. Start the IntelliJ IDE by running idea64.exe; in my case it is located at C:\apps\ideaIC-2020.2.1.win\bin\idea64.exe

3. Create a Scala Project in IntelliJ

After starting the IntelliJ IDEA IDE, you will get a Welcome screen with different options.

  1. Select New Project to open the new project window.

2. Select Maven from the left panel

3. Check the Create from archetype option

4. Select org.scala-tools.archetypes:scala-archetype-simple.

  • An archetype is a kind of template that creates the right directory structure and downloads the required default dependencies. Since we have selected a Scala archetype, it downloads all the Scala dependencies and enables you to write Scala code in IntelliJ.

5. In the next window, enter the project name. I am naming my project spark-hello-world-example.

6. On the next screen, review the options for artifact-id and group-id.

7. Select Finish.

This creates a project in IntelliJ and shows the following project structure in the left Project panel (you need to expand the project tree until you see the App.scala file).

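Based on what the Scala archetype generates (including the App file and the src/test folder we delete later), the relevant part of the structure looks roughly like this:

spark-hello-world-example
├── pom.xml
└── src
    ├── main
    │   └── scala
    │       └── org.example
    │           └── App
    └── test
        └── scala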

4. Install Scala Plugin

Now install the Scala plugin:

  1. Open File > Settings (or use the shortcut keys Ctrl + Alt + S). On macOS, use IntelliJ IDEA -> Preferences.
  2. Select the Plugins option from the left panel. This brings you to the plugins panel.
  3. Search for Scala and click Install to install the Scala plugin.

4. After plugin installation, restart the IntelliJ IDE.

5. Setup Scala SDK

1. After the restart, IntelliJ will prompt you to set up the Scala SDK. If this prompt doesn’t show up, you can add the Scala SDK as a global library for your project in `Project Settings [F4] -> Platform Settings / Global Libraries -> Add -> Scala SDK -> Download`


2. Select Setup Scala SDK; it prompts you with a window to choose the SDK.

3. Select the Create option.

4. From the next window, select the Download option.

5. Choose Scala version 2.13.0 (the latest at the time of writing this article).

6. Make Changes to the pom.xml File

Now, we need to make some changes to the pom.xml file. You can either follow the instructions below or download the pom.xml file from the GitHub project and replace your pom.xml file with it.

1. First, change the Scala version to the latest version. I am using 2.13.0.


 <properties>
    <scala.version>2.13.0</scala.version>
 </properties>

2. Remove the following plugin. It refers to the outdated org.scala-tools maven-scala-plugin and targets jvm-1.5, which newer Scala versions no longer support.


      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>

3. Delete the following line.


<testSourceDirectory>src/test/scala</testSourceDirectory>

7. Delete Unnecessary Files

Now delete the following from the project workspace.

  1. Delete src/test
  2. Delete src/main/scala/org.example.App

8. Add Spark Dependencies to Maven pom.xml File

Add the following Spark dependencies to the pom.xml file. Note that the _2.13 suffix in the artifact IDs is the Scala binary version and must match the scala.version set in the properties section.


    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>3.5.0</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>3.5.0</version>
      <scope>compile</scope>
    </dependency>
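Once the project compiles, you can optionally verify that the versions Maven resolved match what you expect. Below is a minimal sketch of my own (the object name VersionCheck is just for illustration and is not part of the project):


package org.example
import org.apache.spark.sql.SparkSession

// Minimal sanity check: print the Spark and Scala versions resolved at runtime.
object VersionCheck extends App {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("VersionCheck")
    .getOrCreate()

  println("Spark version: " + spark.version)                       // e.g. 3.5.0
  println("Scala version: " + scala.util.Properties.versionString) // should report 2.13.x
  spark.stop()
}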

9. Create a Spark Hello World Application in IntelliJ

1. Now create the Scala object SparkSessionTest (right-click src/main/scala/org.example, select New -> Scala Class, and choose Object); I will treat this as a Hello World program for Spark. Our hello world example doesn’t display “Hello World” text; instead it creates a SparkSession and displays the Spark app name, master, and deployment mode on the console. It then calls the builder a second time to show that getOrCreate() returns the existing session instead of creating a new one.


package org.example

import org.apache.spark.sql.SparkSession

object SparkSessionTest extends App {

  // Create a SparkSession running locally with one thread.
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  println("First SparkContext:")
  println("APP Name :" + spark.sparkContext.appName)
  println("Deploy Mode :" + spark.sparkContext.deployMode)
  println("Master :" + spark.sparkContext.master)

  // getOrCreate() returns the existing session here, so no second
  // SparkContext is created and the original app name is retained.
  val sparkSession2 = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample-test")
    .getOrCreate()

  println("Second SparkContext:")
  println("APP Name :" + sparkSession2.sparkContext.appName)
  println("Deploy Mode :" + sparkSession2.sparkContext.deployMode)
  println("Master :" + sparkSession2.sparkContext.master)
}
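If you want to confirm that the spark-sql dependency works end to end, you can optionally extend the object with a tiny DataFrame example. This is a sketch of my own, not part of the original program:


// Optional extension inside SparkSessionTest: build a small DataFrame and display it.
import spark.implicits._

val df = Seq(("Java", 20000), ("Scala", 3000)).toDF("language", "users")
df.show()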

2. Sometimes the dependencies in pom.xml are not loaded automatically; if that happens, re-import the dependencies from the Maven panel or restart IntelliJ.

3. Run the Maven build.

Open the Maven panel at the top-right corner, click the M (Execute Maven Goal) icon, and type or select mvn clean install. Alternatively, run mvn clean install from a terminal in the project root.

This installs all Maven dependencies and builds your project. Finally, you should see BUILD SUCCESS in the build output.


4. Now run the Spark application by right-clicking the SparkSessionTest object and selecting Run.

5. This should print the app name, deploy mode, and master for both sessions on the console. If you still get errors while running the Spark application, restart the IntelliJ IDE and run the application again.

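Based on the println statements in the program, the output should look roughly like this (both blocks print the same values because getOrCreate() returned the first session, and the deploy mode is client for a local master):

First SparkContext:
APP Name :SparkByExample
Deploy Mode :client
Master :local[1]
Second SparkContext:
APP Name :SparkByExample
Deploy Mode :client
Master :local[1]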

Please comment if you have any questions or run into errors while setting up Spark on IntelliJ.

Once you complete the Spark setup, you should learn what a SparkSession and a SparkContext are, and read about Spark RDDs, Spark RDD Actions, and Spark RDD Transformations.
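As a first taste of those topics, here is a minimal RDD sketch of my own, reusing the spark session from the example above:


// A transformation (map) is lazy; an action (collect) triggers the computation.
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
val doubled = rdd.map(_ * 2)
println(doubled.collect().mkString(", ")) // prints: 2, 4, 6, 8, 10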

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium


This Post Has 11 Comments

  1. Anonymous

    Thanks for the right stuff

  2. Sachin Sthalekar

    Hats off to you. Much appreciated for this great resource.

  3. HASAN

    Thank you so much, it would be appreciated if you explain how to install Spark-Scala on Intellij with Gradle build.
    Regards

  4. Anonymous

    oh yeah let me just ‘run the maven build’ on this brand new program you helped me install. I have no bloody idea how to do that some actual guidance on this guide would be nice

  5. Ahmed Quraishi

    How do I contribute as well ?

    1. NNK

      Hi Ahmed, Thanks for sharing your interest in contributing to SparkByExamples.com. I will send you an e-mail on the details.

  6. ZipTX

    In regards to 5.1 … I actually had to do these steps to get the “setup Scala SDK” on a fresh installation: For newer versions of IntelliJ, enable the use plugin registry option within the Maven settings as follows:

    Click File 🡒 Settings.
    Expand Build, Execution, Deployment 🡒 Build Tools 🡒 Maven.
    Check Use plugin registry.
    Click OK or Apply.

    Regardless of version, also invalidate the caches:
    Click File 🡒 Invalidate Caches / Restart.
    Click Invalidate and Restart.

  7. ZipTX

    FYI … Step 5.1 does not show up as captured. You can add Scala SDK as global library for your project in `Project Settings [F4] -> Platform Settings / Global Libraries -> Add -> Scala SDK`

    Also some wordage that you need to create a “Scala Class” in step 9.1 would be helpful

    Thanks for the incredible write-up!

    1. NNK

      Thank you ZipTx

  8. Arun

    Excellent effort .. thank you so much for sharing this across..