In this H2O Sparkling Water tutorial, you will learn how to convert a Spark SQL DataFrame into an H2OFrame. An H2OFrame is the primary data store for H2O. It is similar to a Spark DataFrame, the difference being that its data is stored in the H2O cluster rather than being managed by Spark.
Here, we will create a Spark DataFrame and convert it to a Sparkling Water H2OFrame using the asH2OFrame() method of the H2OContext object.
Create SparkSession object
First, let’s create a SparkSession object.
import org.apache.spark.sql.SparkSession
// Local session with a single worker thread
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
Create Spark DataFrame
Using the SparkSession object “spark”, read a CSV file into a DataFrame. The example below creates a Spark DataFrame “zipCodesDF”.
val zipCodes = "src/main/resources/small_zipcode.csv"
val zipCodesDF = spark.read.option("header", "true")
  .option("inferSchema", "true")
  .csv(zipCodes)
zipCodesDF.printSchema()
zipCodesDF.show(false)
This yields the output below. For more options on reading CSV files, see the article “Spark read CSV file”.
root
|-- id: integer (nullable = true)
|-- zipcode: integer (nullable = true)
|-- type: string (nullable = true)
|-- city: string (nullable = true)
|-- state: string (nullable = true)
|-- population: integer (nullable = true)
+---+-------+--------+-------------------+-----+----------+
|id |zipcode|type    |city               |state|population|
+---+-------+--------+-------------------+-----+----------+
|1  |704    |STANDARD|null               |PR   |30100     |
|2  |704    |null    |PASEO COSTA DEL SUR|PR   |null      |
|3  |709    |null    |BDA SAN LUIS       |PR   |3700      |
|4  |76166  |UNIQUE  |CINGULAR WIRELESS  |TX   |84000     |
|5  |76177  |STANDARD|null               |TX   |null      |
+---+-------+--------+-------------------+-----+----------+
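The schema printed above was inferred by Spark, which requires an extra pass over the file. As a minimal sketch, assuming the column names and types shown in that output are the ones you want, the same schema can be declared explicitly and passed to the reader in place of inferSchema. The object name ZipCodesExplicitSchema is made up for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical object name, used only for this sketch
object ZipCodesExplicitSchema {

  // Column names and types copied from the printSchema() output above
  val zipCodeSchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("zipcode", IntegerType, nullable = true),
    StructField("type", StringType, nullable = true),
    StructField("city", StringType, nullable = true),
    StructField("state", StringType, nullable = true),
    StructField("population", IntegerType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    // schema(...) replaces option("inferSchema", "true")
    val zipCodesDF = spark.read
      .option("header", "true")
      .schema(zipCodeSchema)
      .csv("src/main/resources/small_zipcode.csv")

    zipCodesDF.printSchema()
    spark.stop()
  }
}
```

With an explicit schema, the column types no longer depend on the contents of the file, which can be useful when the DataFrame is handed to another system afterwards.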
Create H2OContext object
Now, let’s create an H2OContext object by passing the Spark session object as an argument; we need the H2O context in order to do the conversion.
val h2oContext = H2OContext.getOrCreate(spark)
Convert Spark DataFrame into H2OFrame
H2OContext provides asH2OFrame(), which takes a Spark DataFrame object as a parameter and converts it to a Sparkling Water H2OFrame.
val h2OFrame = h2oContext.asH2OFrame(zipCodesDF)
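By default the converted frame gets a generated key in the H2O cluster. The sketch below assumes the org.apache.spark.h2o.H2OContext used in this article also offers an asH2OFrame overload that takes the frame name as a second String argument; check this against the Sparkling Water version you use, since newer releases moved these classes to a different package. The object NamedH2OFrame and the helper frameNameFor are hypothetical names introduced only for this illustration.

```scala
import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch
object NamedH2OFrame {

  // Hypothetical helper: derives a frame name from the CSV file name,
  // e.g. "src/main/resources/small_zipcode.csv" becomes "small_zipcode"
  def frameNameFor(path: String): String =
    path.split('/').last.stripSuffix(".csv")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()

    val zipCodes = "src/main/resources/small_zipcode.csv"
    val zipCodesDF = spark.read.option("header", "true")
      .option("inferSchema", "true")
      .csv(zipCodes)

    val h2oContext = H2OContext.getOrCreate(spark)

    // Assumption: asH2OFrame(DataFrame, String) exists in your version
    val h2oFrame = h2oContext.asH2OFrame(zipCodesDF, frameNameFor(zipCodes))

    println(h2oFrame._key)       // key under which the frame is stored in H2O
    println(h2oFrame.numRows())  // number of rows copied from the DataFrame
  }
}
```

A readable name makes the frame easier to recognise later, for example when browsing frames in the H2O Flow UI.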
Let’s see a few operations on H2OFrame.
h2OFrame.names() – Returns all column names of the H2OFrame. Here it returns id, zipcode, type, city, state, population.
h2OFrame.numRows() – Returns the number of rows in the H2OFrame.
h2OFrame.rename() – Renames a column. The example below renames zipcode to postcode.
h2OFrame.rename("zipcode","postcode")
println(h2OFrame.names().mkString(","))
id,postcode,type,city,state,population
Complete Example on Converting Spark DataFrame into H2OFrame
package com.sparkbyexamples.spark
import org.apache.spark.h2o.H2OContext
import org.apache.spark.sql.SparkSession
object H2OFrameFromDataFrame extends App {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()
  // Read the CSV file into a Spark DataFrame
  val zipCodes = "src/main/resources/small_zipcode.csv"
  val zipCodesDF = spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv(zipCodes)
  // Convert the Spark DataFrame into an H2OFrame
  val h2oContext = H2OContext.getOrCreate(spark)
  val h2oFrame = h2oContext.asH2OFrame(zipCodesDF)
  println(h2oFrame._names.mkString(","))
  println(h2oFrame.names().mkString(","))
  println(h2oFrame.numRows()) // returns 5
  println(h2oFrame.numCols()) // returns 6
  // Rename a column and print the updated names
  h2oFrame.rename("zipcode","postcode")
  println(h2oFrame.names().mkString(","))
}
This example, along with its dependencies, is also available in the GitHub project.
Conclusion
In this article, you have learned what an H2OFrame is, how to create an H2OContext, and how to convert a Spark SQL DataFrame to a Sparkling Water H2OFrame.
Happy Learning !!