PySpark ArrayType Column With Examples

PySpark's pyspark.sql.types.ArrayType (which extends the DataType class) defines a DataFrame column that holds an array of elements of the same type. In this article, I will explain how to create a DataFrame ArrayType column using the pyspark.sql.types.ArrayType class and how to apply some SQL functions to array columns, with examples. While…


PySpark StructType & StructField Explained with Examples

The PySpark StructType & StructField classes are used to programmatically specify the schema of a DataFrame and to create complex columns such as nested struct, array, and map columns. A StructType is a collection of StructFields, each of which defines a column name, a column data type, and a boolean that specifies whether the field can be nullable or not…


Spark – Convert array to columns

Problem: How to convert a DataFrame array column to multiple columns in Spark? Solution: Spark doesn't have a predefined function to convert a DataFrame array column to multiple columns; however, we can write a workaround to do the conversion. Below is a complete Scala example which converts array and nested array…


Spark Schema – Explained with Examples

A Spark schema defines the structure of a DataFrame, which you can print by calling the printSchema() method on the DataFrame object. Spark SQL provides the StructType & StructField classes to programmatically specify the schema. By default, Spark infers the schema from the data; however, sometimes we may need to define our own…


Spark – Create a DataFrame with Array of Struct column

Problem: How to create a Spark DataFrame with an array of struct column using Spark and Scala? Using the StructType and ArrayType classes, we can create a DataFrame with an array of struct column (ArrayType(StructType)). In the example below, the column "booksInterested" is an array of StructType which holds "name", "author" and the…


Spark – Define DataFrame with Nested Array

Problem: How to define a Spark DataFrame with a nested array column (Array of Array)? Solution: Using StructType, we can define an Array of Array (nested array) ArrayType(ArrayType(StringType)) DataFrame column, shown here with a Scala example. The example below creates a DataFrame with a nested array column. In the example below, the column "subjects" is an…


PySpark – explode nested array into rows

Problem: How to explode & flatten nested array (Array of Array) DataFrame columns into rows using PySpark? Solution: The PySpark explode function can be used to explode an Array of Array (nested array) ArrayType(ArrayType(StringType)) column into rows on a PySpark DataFrame, shown here with a Python example. Before we start, let’s create a DataFrame with a nested…


PySpark explode array and map columns to rows

In this article, I will explain how to explode array (or list) and map columns to rows using different PySpark DataFrame functions (explode, explode_outer, posexplode, posexplode_outer) with Python examples. Before we start, let’s create a DataFrame with array and map fields. The snippet below creates a DataFrame with columns “name” as…


Spark – Flatten nested array to single array column

Problem: How to flatten an Array of Array (nested array) DataFrame column into a single array column using Spark? Solution: Spark SQL provides the flatten function to convert an Array of Array (nested array) ArrayType(ArrayType(StringType)) column into a single array column on a Spark DataFrame, shown here with a Scala example. Related: How to flatten…


Spark – explode Array of Array (nested array) to rows

Problem: How to explode & flatten Array of Array (nested array) DataFrame columns into rows using Spark? Solution: The Spark explode function can be used to explode an Array of Array (nested array) ArrayType(ArrayType(StringType)) column into rows on a Spark DataFrame, shown here with a Scala example. Before we start, let's create a DataFrame…
