Spark – explode Array of Map to rows

Problem: How do you explode an Array of Map DataFrame column to rows using Spark? Solution: The Spark explode function can be used to explode an Array of Map, ArrayType(MapType), column to rows on a Spark DataFrame, shown here with a Scala example. Before we start, let's create a DataFrame with a map column in an array.…

Spark – explode Array of Struct to rows

Problem: How do you explode an Array of StructType DataFrame column to rows using Spark? Solution: The Spark explode function can be used to explode an Array of Struct, ArrayType(StructType), column to rows on a Spark DataFrame, shown here with a Scala example. Before we start, let's create a DataFrame with a Struct column in an array. From…

Spark ArrayType Column on DataFrame & SQL

Spark ArrayType (array) is a collection data type that extends the DataType class. In this article, I will explain how to create a DataFrame ArrayType column using the Spark SQL org.apache.spark.sql.types.ArrayType class and how to apply some SQL functions to the array column, using Scala examples. While working with Spark structured (Avro, Parquet, etc.) or semi-structured…

Spark SQL Map functions – complete list

In this article, I will explain the usage of the Spark SQL map functions map(), map_keys(), map_values(), map_concat(), and map_from_entries() on a DataFrame column using Scala examples. Though I've explained it here with Scala, a similar method could be used to work with Spark SQL map functions in PySpark, and if time permits I will cover it in the future. If…

Spark SQL StructType & StructField with examples

Spark SQL StructType & StructField classes are used to programmatically specify the schema of a DataFrame and to create complex columns such as nested struct, array, and map columns. StructType is a collection of StructFields. Using StructField, we can define the column name, column data type, nullable column (boolean to specify if the…
