H2O Sparkling Water Introduction

Sparkling Water provides the same features and functionality as H2O and lets users run the H2O machine learning algorithms API on top of a Spark cluster, so H2O benefits from Spark capabilities such as fast, scalable, distributed in-memory processing.

Sparkling Water also enables users to run H2O machine learning models from Java, Scala, R, and Python.

Integrating these two open-source environments (Spark & H2O) provides a seamless experience for users who want to make a query using Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. For any given problem, better interoperability between tools provides a better experience.

– H2O Sparkling Water
Sparkling Water Architecture
Source: H2O.ai
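
The query-then-model flow described above can be sketched in the Sparkling Shell (set up in the next section). This is a minimal sketch, assuming the Sparkling Water 3.28 Scala API in the org.apache.spark.h2o package; the "people" table, its columns, and its values are made up for illustration. It runs a Spark SQL query and hands the result to H2O as an H2OFrame, which is the form H2O algorithms work on.

```scala
// Paste into sparkling-shell, where the SparkSession `spark` already exists.
import org.apache.spark.h2o._
import spark.implicits._

// Start (or reuse) the H2O cloud inside the Spark cluster.
val h2oContext = H2OContext.getOrCreate(spark)

// Hypothetical sample data registered as a temporary view for Spark SQL.
val people = Seq(
  ("Ann", 34, 52000.0),
  ("Bob", 45, 61000.0),
  ("Cid", 29, 48000.0)
).toDF("name", "age", "salary")
people.createOrReplaceTempView("people")

// Step 1: query with Spark SQL.
val adults = spark.sql("SELECT age, salary FROM people WHERE age > 30")

// Step 2: feed the query result into H2O as an H2OFrame.
// (Assumes the 3.28 API; newer releases moved H2OContext to another package.)
val h2oFrame = h2oContext.asH2OFrame(adults)
```

From here, the H2OFrame can be used as training data for an H2O algorithm, and predictions can be converted back to a Spark DataFrame for further processing in Spark.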

Installing & Running Sparkling Water Shell on Windows

To run Sparkling Shell, you need Apache Spark installed on your computer and the SPARK_HOME environment variable set to the Spark home directory. If you do not have Spark installed, download it from the Apache Spark downloads page, unzip it, and set the SPARK_HOME environment variable to your Spark directory.
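
Before starting the shell, it can help to confirm that SPARK_HOME is visible to the JVM. The following is a small sketch in plain Scala with no Spark dependency; the helper name checkSparkHome is hypothetical and only reports whether the variable is set and points to an existing directory.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper: describes the state of a SPARK_HOME-style value.
def checkSparkHome(value: Option[String]): String = value match {
  case None =>
    "SPARK_HOME is not set"
  case Some(p) if Files.isDirectory(Paths.get(p)) =>
    s"SPARK_HOME points to $p"
  case Some(p) =>
    s"SPARK_HOME is set to $p, but that is not a directory"
}

// Read the variable from the current process environment and report on it.
println(checkSparkHome(sys.env.get("SPARK_HOME")))
```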

Now, download H2O Sparkling Water and unzip the downloaded file. In my case, I’ve downloaded Sparkling Water version 3.28, which supports Spark 2.4.4, and unzipped it into C:\apps\opt\sparkling-water

cd C:\apps\opt\sparkling-water\bin
sparkling-shell

  Spark master (MASTER)     : local[*]
  Spark home   (SPARK_HOME) : C:\apps\opt\spark-2.4.4-bin-hadoop2.7
  H2O build version         : (yu)
  Spark build version       : 2.4.4
  Scala version             : 2.11

20/02/13 07:34:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLeve
Spark context Web UI available at http://DELL-ESUHAO2KAJ:4040
Spark context available as 'sc' (master = local[*], app id = local-1581608102876
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.


Now let’s create an H2OContext, passing the SparkSession object “spark” as a parameter. This creates an H2O cloud inside the Spark cluster.

scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._

scala> val h2oContext = H2OContext.getOrCreate(spark)
h2oContext: org.apache.spark.h2o.H2OContext =

Sparkling Water Context:
 * Sparkling Water Version:
 * H2O name: sparkling-water-prabha_local-1581608102876
 * cluster size: 1
 * list of used nodes:
  (executorId, host, port)

  Open H2O Flow in browser: (CMD + click in Mac OSX)


This also starts the H2O Flow web UI for interacting with H2O. Open H2O Flow in a browser using the URL shown in the H2OContext output (change the IP address to your system's IP).
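
As an alternative to copying the URL by hand, the context object can report or open the Flow address itself. This is a minimal sketch, assuming the flowURL() and openFlow() methods of H2OContext as found in the Sparkling Water 3.28 Scala API, and assuming h2oContext is the value created in the shell session above.

```scala
// Continue in the same sparkling-shell session where `h2oContext` was created.

// Print the address of the H2O Flow UI (assumes the 3.28 H2OContext API).
println(h2oContext.flowURL())

// Ask the default desktop browser to open H2O Flow.
// This needs a desktop environment on the machine running the shell.
h2oContext.openFlow()
```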

H2O Flow

Naveen (NNK)
