Apache Hive Tutorial with Examples

Note: This tutorial is a work in progress; more articles will be added in the near future.

What is Apache Hive?

Apache Hive is an open-source data warehouse solution built on top of Hadoop infrastructure. It is used to process large structured datasets and lets you query them with HiveQL, a SQL-like query language.
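To give a feel for what a HiveQL query looks like, here is a minimal sketch. The table name employees, its columns, and the comma-delimited text storage are assumptions made up for illustration; they do not come from any dataset used in this tutorial.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE IF NOT EXISTS employees (
  id INT,
  name STRING,
  department STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Aggregate over the whole table: head count and average salary per department
SELECT department,
       COUNT(*)    AS headcount,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
```

The statements read much like standard SQL, but Hive compiles them into batch jobs that run on the cluster, so they suit large scans and aggregations rather than low-latency lookups.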

What is Apache Hive not?

  • Hive is not designed for OLTP processing.
  • It’s not a relational database (RDBMS).
  • It’s not intended for row-level updates in real-time systems.

Apache Hive Advantages

  • Supports large datasets
  • Runs on Hadoop infrastructure which uses commodity hardware
  • Supports SQL-like syntax (HiveQL)
  • Provides the Beeline command-line client, and HiveServer2 exposes JDBC, ODBC, and Thrift interfaces for connecting from Java, Scala, C#, Python, and many more languages.

Different ways to process Hive data

  • MapReduce applications
  • Pig scripts
  • HiveQL

Hive Installation

Start HiveServer2 & Connect Beeline

Hive Clients

HiveQL DDL Commands

HiveQL DML Commands

Hive Partition and Bucket

Hive Java Examples

Hive Scala Examples

Hive Spark Examples

Hive PySpark Examples

Hive Error or Exceptions