You are currently viewing Hive Create Partition Table Explained

The Hive partition table can be created using PARTITIONED BY clause of the CREATE TABLE statement. Use the partition key column along with the data type in PARTITIONED BY clause.

In this article you will learn what is Hive partition, why do we need partitions, its advantages, and finally how to create a partition table.

Why do we need Partitions?

Hive partition is a way to organize a large table into several smaller tables based on one or multiple columns (partition key, for example, date, state e.t.c).

The hive partition is similar to table partitioning available in SQL server or any other RDBMS database tables.

Partition eliminates creating smaller physical tables, accessing, and managing them separately.

When you load the data into the partition table, Hive internally splits the records based on the partition key and stores each partition data into a separate directory on HDFS. The name of the directory would be partition key and it’s value.

Also, note that while loading the data into the partition table, Hive eliminates the partition key from the actual loaded file on HDFS as it is redundant information and could be get from the partition folder name.

Create Hive Partition Table

To create a Hive table with partitions, you need to use PARTITIONED BY clause along with the column you wanted to partition and its type. Let’s create a partition table and load the CSV file into it.

To demonstrate partitions, I will be using a different dataset than I used before, you can download it from GitHub, It’s a simplified zipcodes codes where I have RecordNumber, Country, City, Zipcode, and State columns. I will be using State as a partition column.

Start your Hive beeline or Hive terminal and create the managed table as below.

CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)

Describe Hive Partition Table

Let’s describe the Hive partition table we just created, describe command shows all partitions information

jdbc:hive2://> DESCRIBE zipcodes;

Yields below output. Notice the highlighted partition information for metadata of the partition columns.

hive create partition table

Use DESCRIBE FORMATTED for more information of the table partition.

jdbc:hive2://> DESCRIBE FORMATTED zipcodes;

Yields below output. Note the highlighted column names where it shows all partition column of the table and location where partitions will store.

hive create partition table

You can also create a partition table with multiple partition keys as shown below. Here I am partitioned with state and zipcode.

CREATE TABLE zipcodes_multiple(
RecordNumber int,
Country string,
City string)
PARTITIONED BY(state string, zipcode string)

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply