H2OContext is the entry point to Sparkling Water. It is used either to connect to an external H2O cluster or to create a standalone local cluster. The H2OContext class provides methods to convert an RDD or DataFrame to an H2OFrame and an H2OFrame back to an RDD or DataFrame, and it also provides implicit conversions between these types.
Creating H2OContext object
An H2OContext is created with the getOrCreate() method, which takes a SparkSession object as a parameter and, optionally, an H2OConf object.
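import org.apache.spark.sql.SparkSession
// H2OContext, H2OConf and H2OFrame come from this package in the releases that use this API
import org.apache.spark.h2o._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)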
Using H2OConf with H2O context
H2OConf (H2O configuration) is a wrapper around SparkConf. It inherits all properties from SparkConf and adds properties specific to the H2O Sparkling Water cluster.
val h2oConf = new H2OConf(spark).setCloudName("CloudName1")
val h2oContext = H2OContext.getOrCreate(spark, h2oConf)
The example above assigns the cloud name “CloudName1” to the H2O cluster. For more details, see Usage of H2O Configuration.
asH2OFrame() – Convert DataFrame to H2OFrame
Use the asH2OFrame() method of H2OContext to convert a Spark DataFrame to a Sparkling Water H2OFrame.
Syntax
asH2OFrame(df: DataFrame): H2OFrame
asH2OFrame(df: DataFrame, frameName: String): H2OFrame
Example
val zipCodes = "src/main/resources/small_zipcode.csv"
val zipCodesDF = spark.read.option("header", "true")
.option("inferSchema", "true")
.csv(zipCodes)
val h2oContext = H2OContext.getOrCreate(spark)
val h2oFrame = h2oContext.asH2OFrame(zipCodesDF)
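The named overload listed in the syntax, and the implicit conversions that H2OContext provides, can be sketched as follows. This is a minimal sketch, assuming the org.apache.spark.h2o package of the Sparkling Water releases whose getOrCreate() accepts a SparkSession; the inline sample rows and the frame name zipcodes_frame are made up for illustration and stand in for the CSV file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

import spark.implicits._
// Hypothetical sample rows used instead of small_zipcode.csv
val df = Seq((704, "PARC PARQUE", "PR"), (709, "BDA SAN LUIS", "PR"))
  .toDF("Zipcode", "City", "State")

// Overload that also takes the name under which the frame is stored in H2O
val namedFrame: H2OFrame = h2oContext.asH2OFrame(df, "zipcodes_frame")

// Implicit conversion: with these implicits in scope, a DataFrame can be
// assigned where an H2OFrame is expected
import h2oContext.implicits._
val implicitFrame: H2OFrame = df
```

Naming the frame makes it easier to find in the H2O Flow UI, while the implicit form keeps conversion code short at the cost of hiding where the conversion happens.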
asDataFrame() – Convert H2OFrame to DataFrame
Use the asDataFrame() method of H2OContext to convert a Sparkling Water H2OFrame to a Spark DataFrame.
Syntax
def asDataFrame[T <: Frame](fr: T, copyMetadata: Boolean = true): DataFrame
Example
import java.io.File

val h2oContext = H2OContext.getOrCreate(spark)
// Create an H2OFrame directly from a CSV file
val dataFile = "src/main/resources/small_zipcode.csv"
val zipH2OFrame = new H2OFrame(new File(dataFile))
// Convert the H2OFrame to a Spark DataFrame
val zipDF = h2oContext.asDataFrame(zipH2OFrame)
asH2OFrame() – Convert RDD to H2OFrame
asH2OFrame() is an overloaded method; this variant takes a Spark RDD as a parameter and returns an H2OFrame.
asH2OFrame(rdd: SupportedRDD): H2OFrame
asH2OFrame(rdd: SupportedRDD, frameName: String): H2OFrame
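A short sketch of the RDD variant, under the assumption that an RDD[Int] is one of the element types that converts implicitly to SupportedRDD in the Sparkling Water release in use; the sample values are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

// Hypothetical sample data: an RDD of five integers
val numbersRDD = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))

// The RDD is converted to SupportedRDD implicitly (assumption noted above),
// then transformed into a single-column H2OFrame
val numbersFrame: H2OFrame = h2oContext.asH2OFrame(numbersRDD)
println(s"rows = ${numbersFrame.numRows()}, columns = ${numbersFrame.numCols()}")
```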
asRDD() – Convert H2OFrame to RDD
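asRDD() transforms an H2OFrame into a Spark RDD. The type parameter A is the Product type (typically a case class) that each row is mapped to.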
asRDD[A <: Product : TypeTag : ClassTag](fr: H2OFrame): RDD[A]
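Because asRDD() needs a Product type to map rows onto, a usage sketch may help. The case class Reading and its data are hypothetical; the sketch assumes that case class field names must match the frame's column names, and it uses Option-typed numeric fields so that missing values can be represented as None.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._

// Hypothetical row type; field names mirror the column names used below
case class Reading(id: Option[Int], value: Option[Double])

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

import spark.implicits._
// Build a small H2OFrame to convert back (sample data is made up)
val readingsDF = Seq((1, 0.5), (2, 1.5)).toDF("id", "value")
val readingsFrame = h2oContext.asH2OFrame(readingsDF)

// Map each H2OFrame row onto the Reading case class
val readingsRDD = h2oContext.asRDD[Reading](readingsFrame)
readingsRDD.collect().foreach(println)
```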
asH2OFrameKeyString()
The asH2OFrameKeyString() method takes an RDD or DataFrame as a parameter, transforms it into an H2OFrame, and returns the string representation of the H2OFrame key.
//RDD
asH2OFrameKeyString(rdd: SupportedRDD): String
asH2OFrameKeyString(rdd: SupportedRDD, frameName: String): String
//DataFrame
asH2OFrameKeyString(df: DataFrame): String
asH2OFrameKeyString(df: DataFrame, frameName: String): String
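The DataFrame overloads above might be used as in the following sketch. It assumes a Sparkling Water release that exposes asH2OFrameKeyString() on H2OContext as listed; the sample rows and the frame name zip_key_demo are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

import spark.implicits._
// Hypothetical sample rows
val zipDF = Seq((704, "PR"), (709, "PR")).toDF("Zipcode", "State")

// Converts the DataFrame to an H2OFrame and returns only the key as a String
val keyString: String = h2oContext.asH2OFrameKeyString(zipDF, "zip_key_demo")
println(keyString)
```

Returning only the key string is convenient when the frame is handed to another client, such as the REST API, that looks frames up by key.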
toH2OFrameKey()
The toH2OFrameKey() method takes an RDD or DataFrame as input, transforms it into an H2OFrame, and returns the H2OFrame key.
//RDD
toH2OFrameKey(rdd: SupportedRDD): Key[_]
toH2OFrameKey(rdd: SupportedRDD, frameName: String): Key[_]
//DataFrame
toH2OFrameKey(df: DataFrame): Key[Frame]
toH2OFrameKey(df: DataFrame, frameName: String): Key[Frame]
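A sketch of how the returned key could be used, assuming the DataFrame overload listed above and the standard H2O Key.get() lookup, which fetches the value stored under a key; the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.h2o._
import water.Key
import water.fvec.Frame

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExample")
  .getOrCreate()
val h2oContext = H2OContext.getOrCreate(spark)

import spark.implicits._
// Hypothetical sample rows
val zipDF = Seq((704, "PR"), (709, "PR")).toDF("Zipcode", "State")

// Convert the DataFrame and keep only the typed key of the resulting frame
val frameKey: Key[Frame] = h2oContext.toH2OFrameKey(zipDF)

// Resolve the key back to the stored Frame when the data is needed
val storedFrame: Frame = frameKey.get()
println(s"key = $frameKey, rows = ${storedFrame.numRows()}")
```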
Other methods of Sparkling Water H2OContext
h2oLocalClient() – Returns the IP:port of the REST API of the H2O client.
setH2OClientLogLevel() – Changes the log level for the driver.
get() – Returns the existing H2O context.
getH2ONodes() – Returns an array of nodes.
stop() – Stops the H2O context when it is running in automatic mode of the external backend. In internal mode, the context shuts down when Spark stops its executors.
asH2OFrame(fr: Frame): H2OFrame – Creates a new H2OFrame from another Frame.