
In this article, I will explain how to run an Apache Spark Hello World example in IntelliJ on Windows using Scala and Maven. I have created a basic Spark example in the Apache Spark GitHub Examples project, so to keep things simple I will clone that project and use it here.

A simple “Hello, World!” example in Apache Spark typically involves setting up a Spark session, creating an RDD or DataFrame, and performing a basic transformation on a dataset.

Before you proceed, make sure you have completed the IntelliJ IDE setup and can run a Spark application with Scala on Windows.

1. Spark Maven Dependency

To run the Apache Spark Hello World example in IntelliJ, you need the following Scala and Spark Maven dependencies in your pom.xml.


    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
      <scope>compile</scope>
    </dependency>
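These dependencies reference the ${scala.version} and ${spark.version} Maven properties. If your pom.xml does not define them yet, you can add a properties block like the one below; the exact versions are only examples that match the _2.11 artifacts above:

    <properties>
      <scala.version>2.11.12</scala.version>
      <spark.version>2.4.8</spark.version>
    </properties>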

2. Apache Spark Hello World Example

In other languages, we would demonstrate Hello World by simply printing a “Hello World” statement to the console. Since Apache Spark is a framework for processing data in memory, I will instead show how to create a SparkSession object and print some details from it.


import org.apache.spark.sql.SparkSession

object SparkSessionTest {

  def main(args: Array[String]): Unit = {

    // Create a SparkSession running locally with one thread
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    println("First SparkContext:")
    println("APP Name :" + spark.sparkContext.appName)
    println("Deploy Mode :" + spark.sparkContext.deployMode)
    println("Master :" + spark.sparkContext.master)

    // A second builder call does not create a new session;
    // getOrCreate() returns the session created above
    val sparkSession2 = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample-test")
      .getOrCreate()

    println("Second SparkContext:")
    println("APP Name :" + sparkSession2.sparkContext.appName)
    println("Deploy Mode :" + sparkSession2.sparkContext.deployMode)
    println("Master :" + sparkSession2.sparkContext.master)
  }
}
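
Note that the second builder call does not create a new session: getOrCreate() returns the session that already exists, so the “Second SparkContext” block prints the same application name even though a different appName was requested.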

3. Spark GitHub Clone – Hello World Example Project

To make things simple, I have created a Spark Hello World project on GitHub, and I will use it to run the example. First, let’s clone the project, then build and run it (a command-line alternative is shown after the steps below).

  • Open IntelliJ IDEA
  • Create a new project by selecting File > New > Project from Version Control.

Using this option, we are going to import the project directly from the GitHub repository.

  • In the Get from Version Control window, select Git as the version control system, enter the GitHub URL below in the URL field, and choose the directory where you want to clone the project.

https://github.com/spark-examples/spark-hello-world-example
  • If you don’t have Git installed, select the “Download and Install” option in the same window.
  • After Git is installed, select Clone. This creates a new IntelliJ project and starts cloning the repository into the folder you specified.
  • Wait a few minutes for the clone to complete and the project to be imported into the workspace.
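
Alternatively, if you already have Git installed, you can clone the repository from a terminal and then open the resulting folder in IntelliJ via File > Open:

    git clone https://github.com/spark-examples/spark-hello-world-example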

Once the cloning completes, you will see the project workspace structure on IntelliJ.

4. Run Maven build

Now run the Maven build. In the Maven tool window on the right side of IntelliJ, expand Lifecycle, right-click install, and select Run Maven Build.


This downloads all the dependencies declared in the pom.xml file and compiles all the examples in this tutorial. It can take a few minutes, and you should see Maven’s BUILD SUCCESS message when it finishes.
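
If you prefer the terminal, you can run the same build from the project root, assuming Maven is installed and on your PATH:

    mvn clean install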


5. Run Apache Spark Hello World Program

After a successful Maven build, run the src/main/scala/com.sparkbyexamples.spark.SparkSessionTest example.

If you still get errors while running the Spark application, restart the IntelliJ IDE and run the application again.

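Spark’s own log lines vary by environment, but the program’s println output in the console should look roughly like this:

    First SparkContext:
    APP Name :SparkByExample
    Deploy Mode :client
    Master :local[1]
    Second SparkContext:
    APP Name :SparkByExample
    Deploy Mode :client
    Master :local[1]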

6. Create DataFrame

To complete this Apache Spark Hello World program, let’s create an RDD and then a DataFrame from it. Run the program below and explore the output.


// Import
import org.apache.spark.sql.SparkSession

// Create SparkSession and prepare data
val spark: SparkSession = SparkSession.builder()
   .master("local[1]").appName("SparkByExamples.com")
   .getOrCreate()

import spark.implicits._
val columns = Seq("language", "users_count")
val data = Seq(("Java", "20000"), ("Python", "100000"), ("Scala", "3000"))

// Create an RDD from the local collection
val rdd = spark.sparkContext.parallelize(data)

// Create a DataFrame from the RDD, using the column names defined above
val dfFromRDD1 = rdd.toDF(columns: _*)
dfFromRDD1.printSchema()
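
Since both columns hold string values here, printSchema() should print a schema like the following; you can also call dfFromRDD1.show() to display the rows themselves.

    root
     |-- language: string (nullable = true)
     |-- users_count: string (nullable = true)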

Where to go next?

Once you are able to run the Spark Hello World example, you should read Spark RDD, Create Spark DataFrame, and How to read CSV file into Spark.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium


This Post Has 2 Comments

  1. Micky Williamson

    Multiple versions of scala libraries detected!

  2. NNK

    Hi, if by chance you installed multiple Scala versions, please select either the 2.11 or 2.12 version.