Convert H2OFrame into Spark DataFrame

In this H2O Sparkling Water tutorial, you will learn how to convert an H2OFrame into a Spark DataFrame. H2OFrame is the primary data store for H2O. It is similar to a Spark DataFrame, the difference being that its data is held in the H2O cluster rather than in Spark.


While working with H2O Sparkling Water, we often need to convert an H2OFrame into a Spark DataFrame and vice versa. This tutorial shows how to convert an H2OFrame into a Spark DataFrame.

Create H2OContext

First, let’s create an H2OContext object by passing the SparkSession object as an argument to the getOrCreate() method. We need the H2OContext object in order to create an H2OFrame.


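import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession

// Create (or reuse) a local Spark session
val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()
// Create (or reuse) the H2OContext for this Spark session
val h2oContext = H2OContext.getOrCreate(spark)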

Create H2OFrame

Now, let’s create an H2OFrame by reading a CSV file.


  // Create an H2OFrame from a CSV file
  import java.io.File
  import org.apache.spark.h2o.H2OFrame
  val dataFile = "src/main/resources/small_zipcode.csv"
  val zipH2OFrame = new H2OFrame(new File(dataFile))
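
Before converting, it can help to confirm that the CSV file was parsed as expected. The following is a minimal sketch, not part of the original example. It assumes the same Sparkling Water release used throughout this tutorial (one that ships the org.apache.spark.h2o package) and relies on H2OFrame inheriting the numRows(), numCols() and names() accessors from H2O's Frame class. The object name InspectH2OFrame is made up for illustration.

```scala
import java.io.File

import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch
object InspectH2OFrame extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  // Assumes a Sparkling Water version where getOrCreate accepts a SparkSession
  val h2oContext = H2OContext.getOrCreate(spark)

  val zipH2OFrame = new H2OFrame(new File("src/main/resources/small_zipcode.csv"))

  // H2OFrame extends water.fvec.Frame, so the Frame accessors are available
  println(s"rows: ${zipH2OFrame.numRows()}, columns: ${zipH2OFrame.numCols()}")
  println(s"column names: ${zipH2OFrame.names().mkString(", ")}")
}
```

If the row count, column count, or column names look wrong here, the problem lies in the CSV parse rather than in the later conversion step.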

Convert H2OFrame into Spark DataFrame


  //Convert H2OFrame to Spark DataFrame
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)
  zipDF.printSchema()
  zipDF.show(false)

This yields the output below. For more options on reading CSV files, see Spark read CSV file.


root
 |-- id: integer (nullable = true)
 |-- zipcode: integer (nullable = true)
 |-- type: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- population: integer (nullable = true)

+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+
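
Once converted, zipDF behaves like any other Spark DataFrame, and the result can be sent back to H2O when needed. The sketch below illustrates the "vice versa" direction mentioned at the start of this tutorial. It assumes that the H2OContext in this Sparkling Water version exposes asH2OFrame(DataFrame); newer releases may organize the API differently. The object name DataFrameToH2OFrame and the filter on the state column are illustrative choices, not part of the original example.

```scala
import java.io.File

import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch
object DataFrameToH2OFrame extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  val h2oContext = H2OContext.getOrCreate(spark)

  // Same steps as in the tutorial: CSV -> H2OFrame -> Spark DataFrame
  val zipH2OFrame = new H2OFrame(new File("src/main/resources/small_zipcode.csv"))
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)

  // Ordinary Spark transformation on the converted DataFrame
  val texasDF = zipDF.filter(zipDF("state") === "TX")

  // Assumption: this H2OContext version provides asH2OFrame(DataFrame)
  val texasH2OFrame: H2OFrame = h2oContext.asH2OFrame(texasDF)

  // Both sides should report the same number of rows
  println(s"Spark rows: ${texasDF.count()}, H2O rows: ${texasH2OFrame.numRows()}")
}
```

Comparing the two row counts is a quick sanity check that nothing was lost in the round trip.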

Complete Example on Converting H2OFrame into Spark DataFrame


package com.sparkbyexamples.spark

import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

object H2OFrameToDataFrame extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  val h2oContext = H2OContext.getOrCreate(spark)

  //Creating H2OFrame
  import java.io.File
  val dataFile = "src/main/resources/small_zipcode.csv"
  val zipH2OFrame = new H2OFrame(new File(dataFile))

  //Convert H2OFrame to Spark DataFrame
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)

  zipDF.printSchema()
  zipDF.show(false)

}

This example, along with its dependencies, is also available in the GitHub project.

Conclusion

In this article, you have learned how to create an H2OFrame and convert it into a Spark DataFrame.

Happy Learning !!