This Spark DataFrame tutorial will help you start understanding and using the Spark DataFrame API with Scala examples. All DataFrame examples provided in this tutorial were tested in our development environment and are available in the Spark-Examples GitHub project for easy reference.

The examples I use in this tutorial to explain DataFrame concepts are simple and easy to practice for beginners who are eager to learn Spark DataFrame.

If you are looking for a specific topic that you can’t find here, please don’t be disappointed. I highly recommend using the search option at the top of the page, as I’ve already covered hundreds of Spark tutorial concepts with real-time examples and you may well find it there.

Related: Spark SQL Tutorial

If you still can’t find it, please send me the topic you are looking for in the comments or Q&A section, and I will try my best to cover it as soon as possible.

Finally, subscribe by providing your e-mail to get more updates.

Table of Contents

  • DataFrame Introduction
    • What is Spark DataFrame
    • RDD vs DataFrame
    • DataFrame Advantages
  • Creating Spark DataFrame
    • Create DataFrame
    • Creating empty DataFrame
    • Convert RDD to DataFrame
  • Working with DataFrame columns
    • Add column
    • Rename column
    • Update column
    • Drop column
    • Case when and when otherwise
  • Filtering rows on DataFrame
    • Using filter & where methods
    • Using relational operators
    • Using conditional operators
  • Spark StructType and schema
    • Programmatically specifying schema
    • Loading schema from JSON
    • Converting case class to a schema
  • DataFrame Transformations
    • Map transformations
    • Pivot & Unpivot
    • Handling nulls
    • DataFrame group by
  • DataFrame Joins
    • Inner join
    • Outer join
    • Left outer join
    • Right outer join
    • Cross join
    • Self join
  • DataFrame Union
    • Union
    • Union all
  • Spark SQL Functions
    • String functions
    • Math functions
    • Date & time functions
    • Array & map functions
    • Sorting functions
    • Aggregate functions
    • Window functions
  • Spark Datasource API
    • Read & write CSV
    • Read & write JSON
    • Read & write Avro
    • Read & write Parquet
    • Read & write XML
    • Read & write HBase tables