• Post author:
  • Post category:Apache Hive
  • Post last modified:May 7, 2024
  • Reading time:9 mins read
You are currently viewing How to Update or Drop a Hive Partition?

Hive ALTER TABLE command is used to update or drop a partition from a Hive Metastore and HDFS location (managed table). You can also manually update or drop a Hive partition directly on HDFS using Hadoop commands, if you do so you need to run the MSCK command to synch up HDFS files with Hive Metastore.

Advertisements

Related Articles

Hive partition breaks the table into multiple tables (on HDFS multiple subdirectories) based on the partition key. Partition key could be one or multiple columns. For each distinct value of the partition key, a subdirectory will be created on HDFS.

Let’s say you have a large table with a state column and you often required to run analytics-related queries for each state hence, the state column is qualified to be a partition column.

In order to explain update and drop Hive partition let’s assume you have a zipcodes table with the below data.


+--------------+---------+-------+
| recordnumber | zipcode | state |
+--------------+---------+-------+
| 54355        | 35146   | AL    |
| 54356        | 35585   | AL    |
| 39827        | 85209   | AZ    |
| 39828        | 85210   | AZ    |
| 49348        | 34487   | FL    |
| 76512        | 27203   | NC    |
| 76513        | 27204   | NC    |
| 4            | 704     | PR    |
| 3            | 704     | PR    |
| 61391        | 76166   | TX    |
+--------------+---------+-------+

Update Hive Partition

You can use the Hive ALTER TABLE command to change the HDFS directory location of a specific partition. The below example update the state=NC partition location from the default Hive store to a custom location /data/state=NC.


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION(state='NC') 
SET LOCATION '/data/state=NC';

Rename Hive Partition

You can also use ALTER TABLE with PARTITION RENAME to rename the Hive partition. The below example rename partition state=’AL’ to state=’NY’


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO PARTITION (state='NY');

Manually Renaming Partitions on HDFS

Alternatively, you can also rename the partition directory on the HDFS. let’s rename partition state=’NY’ back to it’s original state=’AL’


hdfs dfs -mv /user/hive/warehouse/zipcodes/state=NY /user/hive/warehouse/zipcodes/state=AL

When you manually modify the partitions directly on HDFS, you need to run MSCK REPAIR TABLE to update the Hive Metastore. Not doing so will result in inconsistent results.

  • SELECT doesn’t show the renamed partition
  • SHOW PARTITIONS still shows the older partition

jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+------------+
| partition  |
+------------+
| state=AZ   |
| state=FL   |
| state=NC   |
| state=NY   |
| state=PR   |
| state=TX   |
+------------+

If you notice above, it still showing partition state=NY, to correct this run MSCK REPAIR TABLE


jdbc:hive2://127.0.0.1:10000>MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;

Above command synchronize zipcodes table on Hive Metastore. Now run the show partition command which shows state=AL partition.

Drop or Delete Hive Partition

Hive drop or delete partition is performed using ALTER TABLE tablename DROP command. Dropping a partition from a table removes the data from HDFS and from Hive Metastore.


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL');

If you attempt to drop a partition that does not exist, an error will be returned. However, you can alter this behavior using the optional IF EXISTS clause. With this clause, the drop partitions command will only execute if the partition exists in the Hive table.

If the Trash feature is enabled by setting hive.warehouse.data.skipTrash property to true, dropping a Hive partition relocates the partition data to the user’s .Trash directory. This allows you to recover the data if needed after the drop operation.

Note: Data moving to .Trash directory happens only for Internal/Managed table. For the external table, DROP partition just removes the partition from Hive Metastore and the partition is still present on HDFS. You need to run explicitly hadoop fs -rm commnad to remove the partition from HDFS.

Using PURGE

By utilizing the PURGE option, you can prevent the data from being moved to the .Trash directory. Instead, it will be permanently deleted, and recovery will not be possible.


jdbc:hive2://127.0.0.1:10000> ALTER TABLE zipcodes DROP IF EXISTS PARTITION (state='AL') PURGE;

Above example permanently drops state=AL partition

Manually Dropping Partitions on HDFS

You can also delete the partition directly from HDFS using below command.


user@namenode:~/hive$ hdfs dfs -rm -R /user/hive/warehouse/zipcodes/state=AL
Deleted /user/hive/warehouse/zipcodes/state=AL

Running SELECT command on the table doesn’t show the records from removed partitions, however, SHOW PARTITIONS still shows the deleted partitions.


jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AL                          |
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
6 rows selected (0.096 seconds)

In order to fix this, you need to run MSCK REPAIR TABLE as shown below.


jdbc:hive2://127.0.0.1:10000> MSCK REPAIR TABLE zipcodes SYNC PARTITIONS;
jdbc:hive2://127.0.0.1:10000> SHOW PARTITIONS zipcodes;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| state=AZ                          |
| state=FL                          |
| state=NC                          |
| state=PR                          |
| state=TX                          |
+-----------------------------------+
5 rows selected (0.096 seconds)

Conclusion

In this article, you have learned how to update, drop or delete hive partition using ALTER TABLE command, and also learned using SHOW PARTITIONS to show the partitions of the table, using MSCK REPAIR to synch Hive Metastore with the HDFS data.

Reference

Leave a Reply