In this H2O Sparkling Water tutorial, you will learn how to convert an H2OFrame into a Spark DataFrame. H2OFrame is the primary data store for H2O. It is similar to a Spark DataFrame, the difference being that its data is stored in the H2O cluster rather than in Spark.
While working with H2O Sparkling Water, we often need to convert an H2OFrame into a Spark DataFrame and vice versa. This tutorial shows how to convert an H2OFrame into a Spark DataFrame.
Create H2OContext
First, let’s create an H2OContext object by passing the SparkSession object as an argument to the getOrCreate() method. We need the H2OContext object in order to create an H2OFrame.
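import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession

// Create (or reuse) the SparkSession, then start H2O on top of it
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)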
Create H2OFrame
Now, let’s create an H2OFrame by reading a CSV file.
// Create an H2OFrame from a CSV file
import java.io.File
import org.apache.spark.h2o.H2OFrame
val dataFile = "src/main/resources/small_zipcode.csv"
val zipH2OFrame = new H2OFrame(new File(dataFile))
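Before converting, it can help to check what H2O actually parsed. The following is a minimal sketch, not part of the original example: it assumes the same `org.apache.spark.h2o` API used in this tutorial, writes a small hypothetical CSV to a temporary file in place of small_zipcode.csv, and relies on the `numRows()`, `numCols()`, and `names()` accessors that H2OFrame inherits from H2O's `Frame` class.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

// Hypothetical sample data, standing in for small_zipcode.csv
val csv = "id,zipcode,state\n1,704,PR\n4,76166,TX\n5,76177,TX\n"
val tmpPath = Files.createTempFile("zipcode_sample", ".csv")
Files.write(tmpPath, csv.getBytes(StandardCharsets.UTF_8))

// Same constructor as above: parse a local file into an H2OFrame
val sampleFrame = new H2OFrame(tmpPath.toFile)

// Inspect the parsed frame before handing it to Spark
println(s"rows = ${sampleFrame.numRows()}, cols = ${sampleFrame.numCols()}")
println(s"columns = ${sampleFrame.names().mkString(", ")}")
```

If the column names or counts look wrong here, the problem is in how H2O parsed the file, not in the later conversion to a DataFrame.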
Convert H2OFrame into Spark DataFrame
// Convert the H2OFrame to a Spark DataFrame
val zipDF = h2oContext.asDataFrame(zipH2OFrame)
zipDF.printSchema()
zipDF.show(false)
This yields the output below. For more options on reading CSV files, see Spark read CSV file.
root
|-- id: integer (nullable = true)
|-- zipcode: integer (nullable = true)
|-- type: string (nullable = true)
|-- city: string (nullable = true)
|-- state: string (nullable = true)
|-- population: integer (nullable = true)
+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+
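The introduction mentions that conversion is often needed in both directions. As a rough sketch of the reverse path, and assuming the `H2OContext` in your Sparkling Water version exposes `asH2OFrame` for DataFrames (as the `org.apache.spark.h2o` package used here does), a round trip might look like this. The in-memory rows are hypothetical sample data, not the contents of small_zipcode.csv.

```scala
import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

import spark.implicits._

// Hypothetical sample rows built in memory instead of read from CSV
val sampleDF = Seq(
  (1, 704, "PR"),
  (4, 76166, "TX"),
  (5, 76177, "TX")
).toDF("id", "zipcode", "state")

// Spark DataFrame -> H2OFrame
val sampleH2OFrame = h2oContext.asH2OFrame(sampleDF)

// H2OFrame -> Spark DataFrame, using the same call as in this section
val roundTripDF = h2oContext.asDataFrame(sampleH2OFrame)
roundTripDF.printSchema()
roundTripDF.show(false)
```

Column types may not survive the round trip exactly, since H2O picks its own column types, so it is worth checking the printed schema rather than assuming it matches the input.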
Complete Example on Converting H2OFrame into Spark DataFrame
package com.sparkbyexamples.spark

import java.io.File
import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

object H2OFrameToDataFrame extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  val h2oContext = H2OContext.getOrCreate(spark)

  // Create an H2OFrame from a CSV file
  val dataFile = "src/main/resources/small_zipcode.csv"
  val zipH2OFrame = new H2OFrame(new File(dataFile))

  // Convert the H2OFrame to a Spark DataFrame
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)
  zipDF.printSchema()
  zipDF.show(false)
}
This example, along with its dependencies, is also available in the GitHub project.
Conclusion
In this article, you have learned how to create an H2OFrame and convert it into a Spark DataFrame.
Happy Learning !!