How to create a Spark Java Project in IntelliJ and run a Maven build? Running Apache Spark in Java is a viable option, and it can be a good choice depending on your project’s requirements and your team’s familiarity with Java. Apache Spark supports multiple programming languages, including Scala, Python, and Java. In this article, I explain, step by step, how to create and run a Spark Java project using IntelliJ IDEA and Maven.
To create a Spark Java project in IntelliJ IDEA and build it with Maven, follow these steps:
Step 1: Install IntelliJ IDEA: If you haven’t already, download and install IntelliJ IDEA from the official website. You can use the free Community edition or the Ultimate edition for more advanced features.
Step 2: Install Java: Make sure you have Java Development Kit (JDK) installed on your system. You can download it from the Oracle website or use OpenJDK.
Step 3: Create a New Project: Open IntelliJ IDEA and create a new Java project:
- Click on “File” -> “New” -> “Project.”
- On the New Project window, fill in the Name, Location, Language, Build system, and JDK version (choose JDK 11).
- Make sure you select Java for the Language and Maven for the Build system.
- Under Advanced Settings, fill in the Group ID and Artifact ID.
Step 4: Add Spark Dependencies: In your pom.xml (the Maven project file), add the Apache Spark dependencies.
<!-- Add the following to your pom.xml file -->
<dependencies>
    <!-- Spark dependencies -->
    <!-- Use the appropriate version for your setup -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.5.0</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.5.0</version>
        <scope>compile</scope>
    </dependency>
</dependencies>
IntelliJ IDEA should automatically detect the changes and offer to import them. If not, right-click the pom.xml file and select Maven -> Reload Project.
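Spark 3.5.x runs on Java 8, 11, or 17, so it also helps to make sure Maven compiles against the same release as the JDK you selected in Step 3. One way to do that is to pin the release with the Maven compiler plugin; the fragment below is a minimal sketch (the plugin version shown is only illustrative):

```
<!-- Optional: pin the Java release Maven compiles against -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.11.0</version>
            <configuration>
                <release>11</release>
            </configuration>
        </plugin>
    </plugins>
</build>
```

Without this, Maven falls back to whatever default your local setup provides, which can cause "invalid target release" or language-level mismatches between IntelliJ and the command-line build.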
Step 5: Create a Spark Java Class: Create a new Java class that will serve as your Spark application. For example, you can create a class named SparkJavaExample.
Step 6: Write Your Spark Code: Write your Spark code in the SparkJavaExample class. Make sure to import the necessary Spark classes and set up your SparkContext and SparkSession as needed. Below is an example that shows how to create a Java RDD in Spark.
// Create Java RDD Example
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.*;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SparkJavaExample {
    public static void main(String[] args) {
        // Create SparkSession
        SparkSession spark = SparkSession.builder()
                .appName("sparkbyexamples.com")
                .master("local[*]")
                .getOrCreate();

        // Create a Java SparkContext from the session's SparkContext
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Create RDD
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> rdd = jsc.parallelize(data);

        // Print the rdd object (its name and id, not its contents)
        System.out.println(rdd);

        // Print RDD contents to the console
        rdd.collect().forEach(System.out::println);

        // Another RDD example: rows built from String arrays
        List<String[]> dataList = new ArrayList<>();
        dataList.add(new String[] { "California", "CA" });
        dataList.add(new String[] { "New York", "NY" });

        // Create an RDD of Rows; each String[] becomes the row's columns
        JavaRDD<Row> rdd2 = jsc.parallelize(dataList)
                .map((String[] row) -> RowFactory.create(row));

        // Print the rdd object
        System.out.println(rdd2);

        // Print RDD contents to the console
        rdd2.collect().forEach(System.out::println);

        // Stop the SparkSession (this also stops the underlying SparkContext)
        spark.stop();
    }
}
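To get a feel for what the parallelize/map/collect calls above compute, without needing a Spark cluster, the same transformations can be sketched with plain Java streams. This is only a local analogy (the class name LocalAnalogy and the row formatting are illustrative, not Spark API), but it shows the shape of the data at each step:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LocalAnalogy {
    public static void main(String[] args) {
        // Analogous to jsc.parallelize(data) followed by collect():
        // the elements come back unchanged
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        data.forEach(System.out::println);

        // Analogous to parallelize(dataList).map(RowFactory::create):
        // each String[] is turned into one "row" value
        List<String[]> dataList = Arrays.asList(
                new String[] { "California", "CA" },
                new String[] { "New York", "NY" });
        List<String> rows = dataList.stream()
                .map(r -> "[" + String.join(",", r) + "]")
                .collect(Collectors.toList());
        rows.forEach(System.out::println);
    }
}
```

The key difference is that Spark distributes these operations across executors and evaluates them lazily, while the stream version runs eagerly in a single JVM.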
Step 7: Configure Run/Debug Configuration: Configure the run/debug settings in IntelliJ IDEA:
- Click “Run” -> “Edit Configurations…”
- Click the “+” button to add a new configuration and select “Application.”
- Set the main class to your Spark application class (SparkJavaExample in this case).
Step 8: Run Your Spark Application: Click the green “Run” button to execute your Spark application. It will build the Maven project and run your Spark code.
Step 9: View Output: You can view the output of your Spark application in the IntelliJ IDEA console.
Conclusion
That’s it! You’ve created a Spark Java project in IntelliJ IDEA and successfully run a Maven build. Make sure to adjust the Spark version, Java version, and other dependencies in your pom.xml and Spark code as needed for your specific project requirements.