Hive Create Partition Table Explained

The Hive partition table can be created using PARTITIONED BY clause of the CREATE TABLE statement. Use the partition key column along with the data type in PARTITIONED BY clause.

In this article you will learn what is Hive partition, why do we need partitions, its advantages, and finally how to create a partition table.

Why do we need Partitions?

Hive partition is a way to organize a large table into several smaller tables based on one or multiple columns (partition key, for example, date, state e.t.c).

The hive partition is similar to table partitioning available in SQL server or any other RDBMS database tables.

Partition eliminates creating smaller physical tables, accessing, and managing them separately.

When you load the data into the partition table, Hive internally splits the records based on the partition key and stores each partition data into a separate directory on HDFS. The name of the directory would be partition key and it’s value.

Also, note that while loading the data into the partition table, Hive eliminates the partition key from the actual loaded file on HDFS as it is redundant information and could be get from the partition folder name.

Create Hive Partition Table

To create a Hive table with partitions, you need to use PARTITIONED BY clause along with the column you wanted to partition and its type. Let’s create a partition table and load the CSV file into it.

To demonstrate partitions, I will be using a different dataset than I used before, you can download it from GitHub, It’s a simplified zipcodes codes where I have RecordNumber, Country, City, Zipcode, and State columns. I will be using State as a partition column.

Start your Hive beeline or Hive terminal and create the managed table as below.

CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)

Describe Hive Partition Table

Let’s describe the Hive partition table we just created, describe command shows all partitions information

jdbc:hive2://> DESCRIBE zipcodes;

Yields below output. Notice the highlighted partition information for metadata of the partition columns.

hive create partition table

Use DESCRIBE FORMATTED for more information of the table partition.

jdbc:hive2://> DESCRIBE FORMATTED zipcodes;

Yields below output. Note the highlighted column names where it shows all partition column of the table and location where partitions will store.

hive create partition table

You can also create a partition table with multiple partition keys as shown below. Here I am partitioned with state and zipcode.

CREATE TABLE zipcodes_multiple(
RecordNumber int,
Country string,
City string)
PARTITIONED BY(state string, zipcode string)

Happy Learning !!

Naveen (NNK)

I am Naveen (NNK) working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and managing Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passion about sharing my knowledge in Apache Spark, Hive, PySpark, R etc.

Leave a Reply

You are currently viewing Hive Create Partition Table Explained