H2O Sparkling Water Installation on Windows

In this tutorial, you will learn how to install H2O Sparkling Water on Windows, run the H2O sparkling-shell, and access the H2O Flow web interface. In order to run Sparkling Water, you need Apache Spark installed on your computer.

Sparkling Water enables users to run H2O machine learning algorithms on a Spark cluster, which lets H2O benefit from Spark capabilities like fast, scalable, distributed in-memory processing.

First, download Apache Spark, unzip the binary to a directory on your computer, and set the SPARK_HOME environment variable to the Spark home directory. I've downloaded the spark-2.4.4-bin-hadoop2.7 version; depending on when you read this, download the latest version available, and the steps should not have changed much.
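For example, SPARK_HOME can be set from a Windows command prompt as sketched below (the path matches the version used in this tutorial; adjust it to wherever you unzipped Spark):

```shell
:: Set SPARK_HOME for the current command prompt session only
set SPARK_HOME=C:\apps\opt\spark-2.4.4-bin-hadoop2.7

:: Persist it for future command prompts (setx does not affect the current session)
setx SPARK_HOME "C:\apps\opt\spark-2.4.4-bin-hadoop2.7"
```

Alternatively, you can set it once through System Properties > Environment Variables.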

Now, download H2O Sparkling Water and unzip the downloaded file. In my case, I've downloaded Sparkling Water version 3.28, which supports Spark 2.4.4, and unzipped it into C:\apps\opt\sparkling-water

After successful installation, open a command prompt on Windows and change directory to your Sparkling Water bin directory, in my case C:\apps\opt\sparkling-water\bin.

To start the Sparkling shell, enter sparkling-shell at the command line and press Enter, which outputs something like the text below. This also initializes a Spark context with a web UI available at http://192.168.56.1:4040 (replace the IP address with your system's IP).


cd C:\apps\opt\sparkling-water\bin
C:\apps\opt\sparkling-water\bin>sparkling-shell

-----
  Spark master (MASTER)     : local[*]
  Spark home   (SPARK_HOME) : C:\apps\opt\spark-2.4.4-bin-hadoop2.7
  H2O build version         : 3.28.0.3 (yu)
  Spark build version       : 2.4.4
  Scala version             : 2.11
----

20/02/13 07:34:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://DELL-ESUHAO2KAJ:4040
Spark context available as 'sc' (master = local[*], app id = local-1581608102876).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Now let's create an H2OContext by passing the SparkSession object "spark" as a parameter. This creates an H2O cloud inside the Spark cluster.


scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._

scala> val h2oContext = H2OContext.getOrCreate(spark)
h2oContext: org.apache.spark.h2o.H2OContext =

Sparkling Water Context:
 * Sparkling Water Version: 3.28.0.3-1-2.4
 * H2O name: sparkling-water-prabha_local-1581608102876
 * cluster size: 1
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (driver,192.168.56.1,54321)
  ------------------------

  Open H2O Flow in browser: http://192.168.56.1:54321 (CMD + click in Mac OSX)

scala>
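With the H2OContext available, Spark data can be handed to H2O directly from the shell. The snippet below is a minimal sketch to run inside sparkling-shell (the DataFrame columns and values are made up for illustration); it converts a small Spark DataFrame into an H2OFrame, which then shows up as a frame in H2O Flow:

```scala
// Inside sparkling-shell, with h2oContext already created as shown above.
// Build a tiny Spark DataFrame with hypothetical sample data.
val df = spark.createDataFrame(Seq(
  (1, "a", 10.0),
  (2, "b", 20.0),
  (3, "c", 30.0)
)).toDF("id", "label", "value")

// Convert the Spark DataFrame into an H2OFrame so H2O algorithms
// (and the H2O Flow UI) can work with it.
val h2oFrame = h2oContext.asH2OFrame(df)
```

Exact method signatures vary slightly between Sparkling Water releases, so check the API documentation for the version you installed.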

Starting the H2OContext also launches the H2O Flow web UI for interacting with H2O. Open H2O Flow in a browser at http://192.168.56.1:54321 (replace the IP address with your system's IP).

Sparkling Water H2O Flow

Conclusion

In this article, you have learned how to install H2O Sparkling Water on Windows, run sparkling-shell, and create an H2OContext, from which you can access the H2O Flow web UI.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.