• Post author:
  • Post category:Apache Hive
  • Post last modified:January 12, 2023
  • Reading time:9 mins read
You are currently viewing How to Update or Drop a Hive Partition?

Hive ALTER TABLE command is used to update or drop a partition from a Hive Metastore and HDFS location (managed table). You can also manually update or drop a Hive partition directly on HDFS using Hadoop commands, if you do so you need to run the MSCK command to synch up HDFS files with Hive Metastore.

Related Articles

Hive partition breaks the table into multiple tables (on HDFS multiple subdirectories) based on the partition key. Partition key could be one or multiple columns. For each distinct value of the partition key, a subdirectory will be created on HDFS.

Let’s say you have a large table with a state column and you often required to run analytics-related queries for each state hence, the state column is qualified to be a partition column.

In order to explain update and drop Hive partition let’s assume you have a zipcodes table with the below data.


+--------------+---------+-------+
| recordnumber | zipcode | state |
+--------------+---------+-------+
| 54355        | 35146   | AL    |
| 54356        | 35585   | AL    |
| 39827        | 85209   | AZ    |
| 39828        | 85210   | AZ    |
| 49348        | 34487   | FL    |
| 76512        | 27203   | NC    |
| 76513        | 27204   | NC    |
| 4            | 704     | PR    |
| 3            | 704     | PR    |
| 61391        | 76166   | TX    |
+--------------+---------+-------+

Update Hive Partition

You can use the Hive ALTER TABLE command to change the HDFS directory location of a specific partition. The below example update the state=NC partition location from the default Hive store to a custom location /data/state=NC.


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION(state='NC') 
SET LOCATION '/data/state=NC';

Rename Hive Partition

You can also use ALTER TABLE with PARTITION RENAME to rename the Hive partition. The below example rename partition state=’AL’ to state=’NY’


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');

Manually Renaming Partitions on HDFS

Alternatively, you can also rename the partition directory on the HDFS. let’s rename partition state=’NY’ back to it’s original state=’AL’


hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL

When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE to update the Hive Metastore. Not doing so will result in inconsistent results.

  • SELECT doesn’t show the renamed partition
  • <a href="https://sparkbyexamples.com/apache-hive/hive-show-all-table-partitions/">SHOW PARTITIONS</a> still shows the older partition

jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition  |
+------------+
| state=AZ   |
| state=FL   |
| state=NC   |
| state=NY   |
| state=PR   |
| state=TX   |
+------------+

If you notice above, it still showing partition state=NY, to correct this run MSCK REPAIR TABLE


jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;

Above command synchronize zipcodes table on Hive Metastore. Now run the show partition command which shows state=AL partition.

Drop or Delete Hive Partition

Hive drop or delete partition is performed using ALTER TABLE tablename DROP command. Dropping a partition from a table removes the data from HDFS and from Hive Metastore.


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');

When dropping a partition that doesn’t exist, it returns an error. This behavior can be changed using IF EXISTS optional clause; where it executes the drop partitions only when the partition present in the Hive table.

If Trash is configured by setting true to hive.warehouse.data.skipTrash property, dropping a Hive partition moves the partition data to users .Trash directory. You can recover this data post drop if needed.

Note: Data moving to .Trash directory happens only for Internal/Managed table. For the external table, DROP partition just removes the partition from Hive Metastore and the partition is still present on HDFS. You need to run explicitly hadoop fs -rm commnad to remove the partition from HDFS.

Using PURGE

You can use the PURGE option to not move the data to .Trash directory, the data will be permanently removed and it can not be recovered.


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL') PURGE;

Above example permanently drops state=AL partition

Manually Dropping Partitions on HDFS

You can also delete the partition directly from HDFS using below command.


user@namenode:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL

Running SELECT command on the table doesn’t show the records from removed partitions, however, SHOW PARTITIONS still shows the deleted partitions.


jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AL                          |
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
6 rows selected (0.096 seconds)

In order to fix this, you need to run MSCK REPAIR TABLE as shown below.


jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
5 rows selected (0.096 seconds)

Conclusion

In this article, you have learned how to update, drop or delete hive partition using ALTER TABLE command, and also learned using SHOW PARTITIONS to show the partitions of the table, using MSCK REPAIR to synch Hive Metastore with the HDFS data.

Reference

Prabha

Prabha is an accomplished data engineer with a wealth of experience in architecting, developing, and optimizing data pipelines and infrastructure. With a strong foundation in software engineering and a deep understanding of data systems, Prabha excels in building scalable solutions that handle diverse and large datasets efficiently. At SparkbyExamples.com Prabha writes her experience in Spark, PySpark, Python and Pandas.

Leave a Reply