Spark – Create a DataFrame with Array of Struct column

Problem: How do you create a Spark DataFrame with an array of struct column using Spark and Scala? Using the StructType and ArrayType classes, we can create a DataFrame with an array of struct column (ArrayType(StructType)). In the example below, the column "booksInterested" is an array of StructType that holds "name", "author" and the…
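A minimal Scala sketch of the approach the excerpt describes. The sample rows, the "reader" column, and the object name are hypothetical, and only the "name" and "author" fields named above are modeled; any further fields of the struct are left out because the excerpt does not name them.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object ArrayOfStructSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ArrayOfStructSketch")
      .getOrCreate()

    // Struct for a single book; only the two fields named in the text are modeled.
    val bookType = new StructType()
      .add("name", StringType)
      .add("author", StringType)

    // "reader" is an assumed column; "booksInterested" is the array of struct column.
    val schema = new StructType()
      .add("reader", StringType)
      .add("booksInterested", ArrayType(bookType))

    // Made-up sample rows: every nested Row must match bookType.
    val rows = Seq(
      Row("reader1", Seq(Row("book1", "author1"), Row("book2", "author2"))),
      Row("reader2", Seq(Row("book3", "author3")))
    )

    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.printSchema() // booksInterested is reported as array<struct<name,author>>
    df.show(false)

    spark.stop()
  }
}
```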

Spark – Define DataFrame with Nested Array

Problem: How do you define a Spark DataFrame with a nested array column (array of arrays)? Solution: Using StructType, we can define an array of arrays (nested array) column, ArrayType(ArrayType(StringType)), on a DataFrame in Scala. The example below creates a DataFrame with a nested array column. In this example, the column "subjects" is an…
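One way the nested array schema described above might be written in Scala is sketched here. The "name" column, the sample values, and the object name are assumptions made for illustration; only the "subjects" column and its ArrayType(ArrayType(StringType)) type come from the excerpt.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

object NestedArraySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("NestedArraySketch")
      .getOrCreate()

    // "subjects" is an array whose elements are themselves arrays of strings.
    val schema = new StructType()
      .add("name", StringType) // assumed column, not named in the excerpt
      .add("subjects", ArrayType(ArrayType(StringType)))

    // Made-up rows: the outer Seq is the array, each inner Seq is one nested array.
    val rows = Seq(
      Row("student1", Seq(Seq("Java", "Scala"), Seq("Spark", "Hive"))),
      Row("student2", Seq(Seq("Python"), Seq("SQL", "R")))
    )

    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.printSchema() // subjects is reported as array<array<string>>
    df.show(false)

    spark.stop()
  }
}
```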

PySpark – explode nested array into rows

Problem: How do you explode and flatten nested array (array of arrays) DataFrame columns into rows using PySpark? Solution: The PySpark explode function can be used to explode an array of arrays (nested array) column, ArrayType(ArrayType(StringType)), into rows on a PySpark DataFrame, as shown in a Python example. Before we start, let’s create a DataFrame with a nested…

PySpark explode array and map columns to rows

In this article, I will explain how to explode array (list) and map columns to rows using different PySpark DataFrame functions (explode, explode_outer, posexplode, posexplode_outer), with a Python example. Before we start, let’s create a DataFrame with array and map fields. The snippet below creates a DataFrame with columns “name” as…

Spark – explode Array of Array (nested array) to rows

Problem: How do you explode and flatten array of arrays (nested array) DataFrame columns into rows using Spark? Solution: The Spark explode function can be used to explode an array of arrays (nested array) column, ArrayType(ArrayType(StringType)), into rows on a Spark DataFrame, as shown in a Scala example. Before we start, let's create a DataFrame…
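A sketch of what exploding and flattening a nested array could look like in Scala. The column names, sample values, and object name are hypothetical, and the use of the built-in flatten function (available from Spark 2.4) is one possible reading of "flatten", not necessarily the method the full article uses.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, flatten}

object ExplodeNestedArraySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ExplodeNestedArraySketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: "subjects" is inferred as array<array<string>>.
    val df = Seq(
      ("student1", Seq(Seq("Java", "Scala"), Seq("Spark", "Hive"))),
      ("student2", Seq(Seq("Python"), Seq("SQL", "R")))
    ).toDF("name", "subjects")

    // explode on the outer array: one row per inner array (still array<string>).
    df.select($"name", explode($"subjects").as("subjectGroup")).show(false)

    // flatten first, then explode: one row per individual string.
    df.select($"name", explode(flatten($"subjects")).as("subject")).show(false)

    spark.stop()
  }
}
```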

Spark – explode Array of Map to rows

Problem: How do you explode array of map DataFrame columns into rows using Spark? Solution: The Spark explode function can be used to explode an array of maps column, ArrayType(MapType), into rows on a Spark DataFrame, as shown in a Scala example. Before we start, let's create a DataFrame with a map column in an array.…
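The idea can be sketched in Scala as follows, assuming a hypothetical "properties" column holding an array of string-to-string maps; the column names, values, and object name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeArrayOfMapSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ExplodeArrayOfMapSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: "properties" is inferred as array<map<string,string>>.
    val df = Seq(
      ("person1", Seq(Map("hair" -> "black"), Map("eye" -> "brown"))),
      ("person2", Seq(Map("hair" -> "red")))
    ).toDF("name", "properties")

    // Exploding the array yields one row per map.
    val perMap = df.select($"name", explode($"properties").as("property"))
    perMap.show(false)

    // Exploding a map column yields one row per entry, in columns "key" and "value".
    perMap.select($"name", explode($"property")).show(false)

    spark.stop()
  }
}
```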

Spark – explode Array of Struct to rows

Problem: How do you explode array of StructType DataFrame columns into rows using Spark? Solution: The Spark explode function can be used to explode an array of structs column, ArrayType(StructType), into rows on a Spark DataFrame, as shown in a Scala example. Before we start, let's create a DataFrame with a struct column in an array. From…
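A possible Scala sketch of exploding an array of structs. The Book case class, its two fields, the "reader" column, and the sample values are all assumptions; a case class is used here only as a convenient way to get a struct type, and the full article may build the DataFrame differently.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeArrayOfStructSketch {
  // Hypothetical struct shape; Spark maps a case class to a StructType.
  case class Book(name: String, author: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ExplodeArrayOfStructSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: "booksInterested" is inferred as array<struct<name,author>>.
    val df = Seq(
      ("reader1", Seq(Book("book1", "author1"), Book("book2", "author2"))),
      ("reader2", Seq(Book("book3", "author3")))
    ).toDF("reader", "booksInterested")

    // One row per struct element of the array.
    val exploded = df.select($"reader", explode($"booksInterested").as("book"))

    // Struct fields can then be pulled out with dot notation.
    exploded.select($"reader", $"book.name", $"book.author").show(false)

    spark.stop()
  }
}
```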

Spark explode array and map columns to rows

In this article, I will explain how to explode array (list) and map DataFrame columns to rows using different Spark explode functions (explode, explode_outer, posexplode, posexplode_outer), with a Scala example. While working with structured files like JSON, Parquet, Avro, and XML we often get data in collections like arrays, lists,…
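To make the difference between the four functions concrete, here is a small Scala sketch on an array column. The column names, sample values, and object name are hypothetical; one row deliberately has an empty array to show how the outer variants behave.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, explode_outer, posexplode, posexplode_outer}

object ExplodeVariantsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ExplodeVariantsSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: the second row has an empty array.
    val df = Seq(
      ("person1", Seq("Java", "Scala")),
      ("person2", Seq.empty[String])
    ).toDF("name", "languages")

    // explode: one row per element; the row with the empty array is dropped.
    df.select($"name", explode($"languages")).show()

    // explode_outer: like explode, but keeps the empty-array row with a null value.
    df.select($"name", explode_outer($"languages")).show()

    // posexplode: adds the element index, producing columns "pos" and "col".
    df.select($"name", posexplode($"languages")).show()

    // posexplode_outer: posexplode that also keeps the empty-array row (nulls).
    df.select($"name", posexplode_outer($"languages")).show()

    spark.stop()
  }
}
```

Rows whose array is null are treated the same way as the empty-array row here: dropped by explode and posexplode, kept with nulls by the outer variants.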
