Convert H2OFrame into Spark DataFrame

In this H2O Sparkling Water tutorial, you will learn how to convert an H2OFrame into a Spark DataFrame. H2OFrame is the primary data store for H2O. It is similar to a Spark DataFrame, the difference being that its data is held in the H2O cluster rather than managed by Spark.

While working with H2O Sparkling Water, we often need to convert an H2OFrame into a Spark DataFrame and vice versa. This tutorial shows how to convert an H2OFrame into a Spark DataFrame with the asDataFrame() method of H2OContext.

Create H2OContext

First, let’s create an H2OContext object by passing the SparkSession object as an argument to the getOrCreate() method. We need the H2OContext object in order to create an H2OFrame.


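import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession

// Create (or reuse) a local SparkSession
val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()
// Start (or reuse) the H2OContext on top of the SparkSession
val h2oContext = H2OContext.getOrCreate(spark)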

Create H2OFrame

Now, let’s create an H2OFrame by reading a CSV file.


  // Create an H2OFrame by parsing a local CSV file
  import java.io.File
  import org.apache.spark.h2o.H2OFrame
  val dataFile = "src/main/resources/small_zipcode.csv"
  val zipH2OFrame = new H2OFrame(new File(dataFile))
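Before converting the frame, it can help to confirm that the CSV was parsed as expected. The following is a minimal sketch, assuming the same Sparkling Water version and the same CSV path used in this tutorial; the object name InspectH2OFrame is made up for illustration. It prints the row count, column count, and column names of the H2OFrame.

```scala
import java.io.File
import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch
object InspectH2OFrame extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("InspectH2OFrame")
    .getOrCreate()

  // The H2O cluster must be running before a frame can be parsed
  val h2oContext = H2OContext.getOrCreate(spark)

  // Same file as in the tutorial; adjust the path to your project layout
  val zipH2OFrame = new H2OFrame(new File("src/main/resources/small_zipcode.csv"))

  // Basic shape and header information of the parsed frame
  println(s"rows: ${zipH2OFrame.numRows()}")
  println(s"columns: ${zipH2OFrame.numCols()}")
  println(s"names: ${zipH2OFrame.names().mkString(", ")}")
}
```

If the column names or counts look wrong at this stage, the problem is in the CSV parsing rather than in the later conversion to a Spark DataFrame.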

Convert H2OFrame into Spark DataFrame


  // Convert the H2OFrame to a Spark DataFrame
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)
  zipDF.printSchema()
  zipDF.show(false)

This yields the output below. For more options on reading CSV files, see Spark read CSV file.


root
 |-- id: integer (nullable = true)
 |-- zipcode: integer (nullable = true)
 |-- type: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- population: integer (nullable = true)

+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+
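Since the introduction mentions converting in both directions, here is a hedged sketch of a round trip: a Spark DataFrame is turned into an H2OFrame with asH2OFrame() and then converted back with asDataFrame(). It assumes the same Sparkling Water version as the rest of this tutorial; the object name RoundTripSketch and the two sample rows are made up for illustration and are not part of the original example.

```scala
import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch
object RoundTripSketch extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("RoundTripSketch")
    .getOrCreate()
  import spark.implicits._

  val h2oContext = H2OContext.getOrCreate(spark)

  // Small in-memory DataFrame (made-up rows, for illustration only)
  val sourceDF = Seq((1, "PR"), (2, "TX")).toDF("id", "state")

  // Spark DataFrame -> H2OFrame
  val frame = h2oContext.asH2OFrame(sourceDF)

  // H2OFrame -> Spark DataFrame
  val backDF = h2oContext.asDataFrame(frame)

  backDF.printSchema()
  backDF.show(false)
}
```

It is worth checking the printSchema() output after a round trip, because the column types chosen on the way back may not match the original DataFrame exactly.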

Complete Example on Converting H2OFrame into Spark DataFrame


package com.sparkbyexamples.spark

import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.SparkSession

object H2OFrameToDataFrame extends App {

  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  val h2oContext = H2OContext.getOrCreate(spark)

  // Create an H2OFrame by parsing a local CSV file
  import java.io.File
  val dataFile = "src/main/resources/small_zipcode.csv"
  val zipH2OFrame = new H2OFrame(new File(dataFile))

  // Convert the H2OFrame to a Spark DataFrame
  val zipDF = h2oContext.asDataFrame(zipH2OFrame)

  zipDF.printSchema()
  zipDF.show(false)

}

This example, along with its dependencies, is also available in the GitHub project.

Conclusion

In this article, you have learned how to create an H2OFrame and convert it into a Spark DataFrame.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.