Spark By Examples | Learn Spark Tutorial with Examples
In this Apache Spark Tutorial, you will learn Spark with Scala examples; every example explained here is available in the Spark-Examples GitHub project for reference. All examples provided in this tutorial are basic and simple, easy to practice for beginners who are enthusiastic to learn Spark, and all were tested in our development environment.
Note: If you can’t find the Spark example you are looking for on this tutorial page, use the Search option in the menu bar to find your tutorial.
Apache Spark Core
In this section of the tutorial, you will learn different concepts of the Spark Core library with examples. Spark Core is the base library of Spark; it provides the abstractions for distributed task dispatching, scheduling, and basic I/O functionality.
Spark RDD Tutorial with Examples
RDD (Resilient Distributed Dataset) is the fundamental data structure and primary data abstraction of Apache Spark and Spark Core. RDDs are fault-tolerant, immutable distributed collections of objects: once you create an RDD, you cannot change it. Each dataset in an RDD is divided into logical partitions, which can be computed on different nodes of the cluster.
This Spark RDD tutorial will help you start understanding and using Apache Spark RDDs with Scala examples. All RDD examples provided in this tutorial were also tested in our development environment and are available in the spark-scala-examples GitHub project for quick reference.
- RDD Parallelize
- Read text file into RDD
- Read CSV file into RDD
- Ways to create an RDD
- Create empty RDD
- RDD Transformations
- RDD Actions
- RDD Pair Functions
- Generate DataFrame from RDD
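The RDD concepts listed above (parallelize, transformations, actions, pair functions) can be sketched in a few lines. This is a minimal illustration in spark-shell/script style, assuming Spark is on the classpath; the app name and data are made up for the example:

```scala
import org.apache.spark.sql.SparkSession

// Local SparkSession for experimentation; "RDDExample" is an illustrative app name
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("RDDExample")
  .getOrCreate()

// Create an RDD from a local collection (parallelize)
val rdd = spark.sparkContext.parallelize(Seq("spark", "rdd", "example", "spark"))

// Transformations are lazy: pair each word with 1, then reduce by key (a pair function)
val counts = rdd.map(w => (w, 1)).reduceByKey(_ + _)

// An action (collect) triggers the computation and returns results to the driver
val result = counts.collect().toMap
result.foreach(println)

spark.stop()
```

Note that nothing executes until the `collect()` action runs; `map` and `reduceByKey` only build the lineage of the RDD.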
Spark DataFrame Tutorial with Examples
In this Spark SQL DataFrame tutorial, I have explained the most commonly used operations and functions on DataFrames and Datasets with working Scala examples. This section is a work in progress; you will see more articles coming.
- Different ways to create a DataFrame
- How to create an empty DataFrame
- How to create an empty DataSet
- Spark DataFrame – Rename nested column
- Spark when otherwise usage
- How to Pivot and Unpivot a DataFrame
- Create a DataFrame using StructType & StructField schema
- How to create an array (ArrayType) column on DataFrame
- How to create a map (MapType) column on DataFrame
- How to select the first row of each group
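As a starting point for the DataFrame topics above, here is a minimal sketch of creating a DataFrame from a local collection, assuming Spark is on the classpath; the column names and rows are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("DataFrameExample") // illustrative app name
  .getOrCreate()
import spark.implicits._ // enables toDF on local Seq

// One of several ways to create a DataFrame: from a Seq of tuples
val df = Seq(("James", 34), ("Anna", 29)).toDF("name", "age")

df.printSchema() // root |-- name: string |-- age: integer
df.show()

// Capture simple results before stopping the session
val rowCount = df.count()
val columns  = df.columns.toSeq

spark.stop()
```

From here the linked articles cover schemas (`StructType`/`StructField`), nested columns, pivoting, and the other operations in the list.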
Spark SQL Functions
Spark SQL provides many built-in functions. When possible, leverage this standard library: built-in functions offer more compile-time safety, handle null values, and perform better than UDFs. If your application is performance-critical, avoid custom UDFs wherever possible, as their performance is not guaranteed.
In this section, we will see several Tutorials with Spark SQL functions using Scala examples.
- Spark Date and Time Functions
- Spark String Functions
- Spark Array Functions
- Spark Map Functions
- Spark Aggregate Functions
- Spark Window Functions
- Spark Sort Functions
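To illustrate the advice above about preferring built-in functions over UDFs, here is a small sketch combining a date function, a string function, and a length function from `org.apache.spark.sql.functions` (the sample data is made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, upper, length}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("BuiltInFunctions") // illustrative app name
  .getOrCreate()
import spark.implicits._

val df = Seq(("2024-01-15", "hello")).toDF("date_str", "text")

// Built-in functions instead of a UDF: null-safe and optimizable by Catalyst
val result = df.select(
  to_date(col("date_str"), "yyyy-MM-dd").as("date"), // date function
  upper(col("text")).as("upper_text"),               // string function
  length(col("text")).as("text_len")                 // returns an Int column
).first()

println(result)
spark.stop()
```

The same logic written as a UDF would be opaque to the optimizer and would need explicit null handling.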
Data Source Examples
Spark SQL supports operating on a variety of data sources through the DataFrame interface. This section of the tutorial describes reading and writing data using the Spark Data Sources API with Scala examples. Using the Data Source API, we can load data from, or save data to, RDBMS databases, Avro, Parquet, XML, etc.
- JSON Example (Read & Write)
- Parquet Example (Read and Write)
- Avro Example (Read and Write)
- Spark 2.3 – Apache Avro Example
- Processing Nested XML structured files
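The read/write pattern used by the examples above is the same for every built-in source: `df.write.<format>(path)` and `spark.read.<format>(path)`. A minimal round-trip sketch with Parquet and JSON, assuming Spark is on the classpath (the data and output directory are illustrative):

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("DataSourceExample") // illustrative app name
  .getOrCreate()
import spark.implicits._

val df = Seq(("James", 34), ("Anna", 29)).toDF("name", "age")

// Temporary directory for the example output
val outDir = Files.createTempDirectory("spark-ds").toString

// Write as Parquet, then read it back
df.write.mode(SaveMode.Overwrite).parquet(s"$outDir/people.parquet")
val parquetCount = spark.read.parquet(s"$outDir/people.parquet").count()

// Same round trip with JSON
df.write.mode(SaveMode.Overwrite).json(s"$outDir/people.json")
val jsonCount = spark.read.json(s"$outDir/people.json").count()

spark.stop()
```

Avro works the same way via `.format("avro")` once the `spark-avro` package is on the classpath; JDBC sources additionally take connection options.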
Spark Streaming | Kafka Examples
- Spark Streaming – OutputModes Append vs Complete vs Update
- Spark Streaming – Read JSON Files From Directory with Scala Example
- Spark Streaming – Read data From TCP Socket with Scala Example
- Spark Streaming – Consuming & Producing Kafka messages in JSON format
- Spark Streaming – Consuming & Producing Kafka messages in Avro format
- Spark Batch Processing using Kafka Data Source
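To give a feel for the streaming articles above, here is a minimal Structured Streaming sketch that reads lines from a TCP socket and maintains a running word count. It is a non-runnable fragment on its own: it assumes Spark is on the classpath and a socket server is listening (e.g. started with `nc -lk 9999`); host, port, and output mode are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SocketStreamExample") // illustrative app name
  .getOrCreate()
import spark.implicits._

// Read lines from a TCP socket as an unbounded streaming DataFrame
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Running word count over the stream
val counts = lines.as[String]
  .flatMap(_.split(" "))
  .groupBy("value")
  .count()

// "complete" mode re-emits the full aggregated table on each trigger;
// "append" and "update" behave differently, as the OutputModes article explains
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()

query.awaitTermination() // blocks until the stream is stopped
```

The Kafka examples in the list follow the same structure, swapping `format("socket")` for `format("kafka")` plus the broker and topic options.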
Spark – Accessing HBase Examples
- Spark HBase Connectors explained
- Writing Spark DataFrame to HBase table using shc-core Hortonworks library
- Creating Spark DataFrame from Hbase table using shc-core Hortonworks library
Learn Spark from these Books
- Learning Spark: Lightning-Fast Big Data Analysis
- Spark: The Definitive Guide: Big Data Processing Made Simple
- Spark Cookbook
- High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale