Spark DataFrame & Dataset Tutorial

This Spark DataFrame tutorial will help you start understanding and using the Spark DataFrame API with Scala examples. All DataFrame examples provided in this tutorial were tested in our development environment and are available in the Spark-Examples GitHub project for easy reference.

The examples I use in this tutorial to explain DataFrame concepts are deliberately simple, so that beginners who are keen to learn Spark DataFrame can practice them easily.

This tutorial is a work in progress. If you are looking for a specific topic that you can’t find here, please don’t be disappointed. I highly recommend using the search option at the top of the page, as I’ve already covered hundreds of Spark DataFrame concepts with real-time examples and you may well find it there.

If you still can’t find it, please send me the topic you are looking for in the comments or Q&A section, and I will try my best to cover it as soon as possible.

Table of Contents

  • DataFrame Introduction
    • What is Spark DataFrame
    • RDD vs DataFrame
    • DataFrame Advantages
  • Creating Spark DataFrame
    • Create DataFrame
    • Creating empty DataFrame
    • Convert RDD to DataFrame
  • Working with DataFrame columns
    • Add column
    • Rename column
    • Update column
    • Drop column
    • Case when and when otherwise
  • Filtering rows on DataFrame
    • Using filter & where methods
    • Using relational operators
    • Using conditional operators
  • Spark StructType and schema
    • Programmatically specifying schema
    • Loading schema from JSON
    • Converting case class to a schema
  • DataFrame Transformations
    • Map transformations
    • Pivot & Unpivot
    • Handling nulls
    • DataFrame group by
  • DataFrame Joins
    • Inner join
    • Outer join
    • Left outer join
    • Right outer join
    • Cross join
    • Self join
  • DataFrame Union
    • Union
    • Union all
  • Spark SQL Functions
    • String functions
    • Math functions
    • Date & time functions
    • Array & map functions
    • Sorting functions
    • Aggregate functions
    • Window functions
  • Spark Datasource API
    • Read & write CSV
    • Read & write JSON
    • Read & write Avro
    • Read & write Parquet
    • Read & write XML
    • Read & write HBase tables

Spark DataFrame Introduction

Creating Spark DataFrame

In this DataFrame chapter, you will learn how to create a DataFrame in different ways.

Creating DataFrame from a collection data set
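
As a minimal sketch of this approach, the example below builds a DataFrame from an in-memory Scala `Seq` of tuples using `toDF()`. The sample rows, the column names, and the local `SparkSession` settings are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession

object CreateFromCollection {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are illustrative.
    val spark = SparkSession.builder()
      .appName("CreateFromCollection")
      .master("local[*]")
      .getOrCreate()

    // Brings toDF() into scope for local collections of tuples or case classes.
    import spark.implicits._

    // Hypothetical sample data.
    val data = Seq(("Java", 20000), ("Python", 100000), ("Scala", 3000))

    // Column names are supplied explicitly; types are derived from the tuple.
    val df = data.toDF("language", "users_count")

    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Without the column names, `toDF()` falls back to the default names `_1` and `_2`, so naming the columns up front usually makes later code easier to read.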

Creating DataFrame from files
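
One way this can look is sketched below with a CSV file read through `spark.read`. To keep the example self-contained it first writes a tiny temporary file; the file contents and column names are hypothetical, and in practice you would point the reader at your own path.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object CreateFromFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CreateFromFile")
      .master("local[*]")
      .getOrCreate()

    // Write a small CSV to a temp file so the example runs on its own.
    val path = Files.createTempFile("people", ".csv")
    Files.write(path, "name,age\nAlice,30\nBob,25\n".getBytes(StandardCharsets.UTF_8))

    // header: treat the first line as column names.
    // inferSchema: let Spark guess column types instead of reading all as strings.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path.toString)

    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

The same `spark.read` entry point also exposes `json(...)` and `parquet(...)` for those formats.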

Creating Empty DataFrame
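
A minimal sketch, assuming you want an empty DataFrame that still carries a schema: pair an empty `RDD[Row]` with an explicit `StructType`. The `name` and `age` fields are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CreateEmptyDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CreateEmptyDataFrame")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical schema for illustration.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))

    // An empty RDD of rows combined with the schema gives a typed, empty DataFrame.
    val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    emptyDf.printSchema()
    println(s"Row count: ${emptyDf.count()}")

    // spark.emptyDataFrame is the alternative with no columns at all.
    val noColumns = spark.emptyDataFrame
    noColumns.printSchema()

    spark.stop()
  }
}
```

An empty DataFrame with a schema is handy as a fallback when an input file is missing, because downstream column references still resolve.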

Converting RDD to DataFrame
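
The sketch below shows two common ways to do this conversion, assuming an RDD of tuples: the implicit `toDF()` and the explicit `spark.createDataFrame(...)`. The sample data and column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object RddToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddToDataFrame")
      .master("local[*]")
      .getOrCreate()

    // Needed for the implicit toDF() conversion on RDDs.
    import spark.implicits._

    // Hypothetical RDD of (language, users_count) pairs.
    val rdd = spark.sparkContext.parallelize(
      Seq(("Java", 20000), ("Python", 100000), ("Scala", 3000))
    )

    // Option 1: implicit conversion with explicit column names.
    val df1 = rdd.toDF("language", "users_count")

    // Option 2: createDataFrame on the session, then rename the default _1/_2 columns.
    val df2 = spark.createDataFrame(rdd).toDF("language", "users_count")

    df1.printSchema()
    df2.show()

    spark.stop()
  }
}
```

Both options infer column types from the tuple element types; when you need full control over types and nullability, an `RDD[Row]` plus an explicit `StructType` is the usual alternative.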

Working with DataFrame columns