Spark Setup with Scala and Run in IntelliJ

Among the many IDEs available, IntelliJ IDEA is one of the most widely used for running Spark applications written in Scala, thanks to its good Scala code completion. In this article, I will explain how to set up and run an Apache Spark application written in Scala using Apache Maven with IntelliJ IDEA.

1. Install JDK

You might be aware that Spark is written in Scala, and Scala is a JVM language that needs the JVM to run. Hence, to compile and execute a Spark application, you need to have Java installed on your system.

Download and install Java 8 or above from Oracle.com. You can verify the installation by running java -version from a command prompt.

2. Setup IntelliJ IDEA for Spark

Most Spark engineers use IntelliJ IDEA to run Spark applications written in Scala because of its good Scala compatibility, so it is worth setting up a development environment with IntelliJ.

IntelliJ IDEA comes in Community and Ultimate editions. To run a Spark application written in Scala, the Community edition is sufficient, so download the IntelliJ IDEA Community edition.

  1. You can download either the Windows installer (.exe) or a compressed zip (.zip) file, whichever is more convenient. I’ve downloaded the .zip file.

2. Now, unzip it using WinZip, 7-Zip, or any other zip extraction tool you have. I’ve used 7-Zip to extract the contents.


3. Move the extracted folder from Downloads to your working folder. In my case, I am moving it to c:\apps\.

4. Start the IntelliJ IDE by running idea64.exe, located at C:\apps\ideaIC-2020.2.1.win\bin\idea64.exe

3. Create a Scala Project in IntelliJ

After starting IntelliJ IDEA, you will see a Welcome screen with different options.

  1. Select New Project to open the New Project window.

2. Select Maven from the left panel

3. Check the Create from archetype option.

4. Select org.scala-tools.archetypes:scala-archetypes-simple.

  • An archetype is a kind of project template that creates the right directory structure and downloads the required default dependencies. Since we have selected a Scala archetype, it downloads all the Scala dependencies and enables IntelliJ to write Scala code.

5. In the next window, enter the project name. I am naming my project spark-hello-world-example.

6. On the next screen, review the options for artifact-id and group-id.

7. Select Finish.


You will see the project created in IntelliJ, with the project structure shown in the left Project panel.

4. Install Scala Plugin

Now navigate to the plugin settings:

  1. Open File > Settings (or use the shortcut keys Ctrl + Alt + S).
  2. Select the Plugins option from the left panel. This brings up the plugins panel.
  3. Click on Install to install the Scala plugin.

4. After the plugin is installed, restart the IntelliJ IDE.

5. Setup Scala SDK

  1. IntelliJ will prompt you to set up the Scala SDK.

2. Select Setup Scala SDK, and a window opens to configure it.

3. Select the Create option.


4. From the next window, select the Download option.

5. Choose Scala version 2.12.12 (the latest at the time of writing this article).

6. Make changes to pom.xml file

Now, we need to make some changes to the pom.xml file. You can either follow the instructions below or download the pom.xml file from the GitHub project and replace the contents of your pom.xml file.

  1. First, change the Scala version to the latest version; I am using 2.12.12.
 
 <properties>
    <scala.version>2.12.12</scala.version>
 </properties>

2. Remove the following plugin. It is the old maven-scala-plugin generated by the archetype, and its -target:jvm-1.5 setting causes problems with recent Scala versions; IntelliJ compiles the Scala code for us anyway.

 
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
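
When you run the application from IntelliJ, no Maven Scala plugin is strictly required because the IDE's Scala plugin compiles the code. If you also want Maven itself to compile the Scala sources (for example, to build a jar from the command line), one option is to declare the newer scala-maven-plugin in its place. The coordinates and version below are a suggested replacement, not something generated by the archetype, so pick a current release:

      <plugin>
        <!-- scala-maven-plugin is the maintained successor of the old maven-scala-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.4.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>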

7. Delete Unnecessary Files

Now delete the following from the project workspace.

  1. Delete src/test
  2. Delete src/main/scala/org.example.App

8. Add Spark Dependencies to Maven pom.xml File

Add the following Spark dependencies to the pom.xml file.


    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.0.0</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.0.0</version>
      <scope>compile</scope>
    </dependency>
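
These two entries go inside the existing <dependencies> element that the archetype generated, alongside the scala-library dependency. The _2.12 suffix in the artifact IDs is the Scala binary version and should match the Scala version set in <properties> (2.12.12 here). As a rough sketch, assuming the archetype-generated pom (your file may differ slightly), the section ends up shaped like this:

    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>
      <!-- the two Spark dependencies shown above go here -->
    </dependencies>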

9. Create a Spark Hello World Application in IntelliJ

1. Now create the Spark Hello World program. Our hello world example doesn’t display the text “Hello World”; instead, it creates a SparkSession and prints the Spark application name, master, and deployment mode to the console.


package org.example

import org.apache.spark.sql.SparkSession

object SparkSessionTest extends App {

    // Create a SparkSession that runs locally with a single thread
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    println("First SparkContext:")
    println("APP Name :" + spark.sparkContext.appName)
    println("Deploy Mode :" + spark.sparkContext.deployMode)
    println("Master :" + spark.sparkContext.master)

    // getOrCreate() returns the already running session here, so this
    // reuses the same SparkContext and keeps the original app name
    val sparkSession2 = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample-test")
      .getOrCreate()

    println("Second SparkContext:")
    println("APP Name :" + sparkSession2.sparkContext.appName)
    println("Deploy Mode :" + sparkSession2.sparkContext.deployMode)
    println("Master :" + sparkSession2.sparkContext.master)
}

2. Sometimes the dependencies in pom.xml are not loaded automatically; if that happens, re-import the Maven project or restart IntelliJ.

3. Run the Maven build.


4. Finally, run the Spark application SparkSessionTest.

5. This should display the output below on the console. If you still get errors while running the Spark application, restart the IntelliJ IDE and run the application again.

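Ignoring Spark’s own log messages, the println output should look roughly like the following. Both blocks report the same app name because getOrCreate() on the second builder returned the already running SparkSession rather than creating a new one.

First SparkContext:
APP Name :SparkByExample
Deploy Mode :client
Master :local[1]
Second SparkContext:
APP Name :SparkByExample
Deploy Mode :client
Master :local[1]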

If you have any questions or run into errors while setting up Spark in IntelliJ, please comment or ask me a question on the Ask Me page.

What to read next?

Once you complete the Spark setup, learn what Spark Session and Spark Context are, and read about Spark RDD, Spark RDD Actions, and Spark RDD Transformations.

Happy Learning !!

