You are currently viewing Hive Load Partitioned Table with Examples

Using INSERT INTO HiveQL statement you can Insert the data into Hive Partitioned Table and use LOAD DATA HiveQL statement to Load the CSV file into Hive Partitioned Table. In this article, I will explain how to insert or load files using examples.

If you already have a partitioned table created by following my Create Hive Partitioned Table article, skip to the next section.


CREATE TABLE zipcodes(
RecordNumber int,
Country string,
City string,
Zipcode int)
PARTITIONED BY(state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Hive LOAD File from HDFS into Partitioned Table

Use LOAD DATA HiveQL command to load the data from HDFS into a Hive Partition table. By default, HIVE considers the specified path as an HDFS location.

Let’s Download the zipcodes.CSV from GitHub, upload it to HDFS using the below command.


hdfs dfs -put zipcodes.csv /data/

Now run LOAD DATA command from Hive beeline to load into a partitioned table.

Note: Remember the partitioned column should be the last column on the file to loaded data into right partitioned column of the table.


jdbc:hive2://> LOAD DATA INPATH '/data/zipcodes.csv' INTO TABLE zipcodes;

If your partition column is not at the end then you need to do following.

  • Create another table without partition.
  • Load data into the table (Assume state is at first column).
  • Insert into the partitioned table by selecting columns from the non-partitioned table (make sure you select state at the end).

1) CREATE TABLE zipcodes_tmp(State string,RecordNumber int,Country string,City string,Zipcode int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
2) LOAD DATA INPATH '/data/zipcodes.csv' INTO TABLE zipcodes_tmp;
3) INSERT OVERWRITE TABLE zipcodes PARTITION(state) SELECT RecordNumber,Country,City,Zipcode,State from  zipcodes_tmp;

Hive LOAD File from LOCAL to Partitioned Table

Below examples loads the local file into partitioned table.


jdbc:hive2://> LOAD DATA LOCAL INPATH '/tmp/zipcodes.csv' INTO TABLE zipcodes;

INSERT Data into Partition Table

You can also use INSERT INTO to insert data into the Hive partitioned table. Insert into just appends the data into the specified partition. If a partition doesn’t exist, it dynamically creates the partition and inserts the data into the partition.


INSERT INTO zipcodes VALUES (891,'US','TAMPA',33605,'FL');

Alternatively, you can also specify the PARTITION clause while insert data into Hive partition table. When using the PARTITION clause, you should not have partition column data in VALUES.


INSERT INTO zipcodes PARTITION(state='FL') VALUES (891,'US','TAMPA',33605);

Use INSERT OVERWRITE TABLE to delete existing data in a partition and load with new data.


INSERT OVERWRITE TABLE zipcodes PARTITION(state='NA') VALUES (896,'US','TAMPA',33607);

This removes the data from NA partition and loads with new records.

Conclusion

In summary, LOAD DATA HiveQL command is used to load the file into a hive existing or new partition of the table, use INSERT INTO to insert specific rows into a partition, and finally, use INSERT OVERWRITE to overwrite the partition with the new rows.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply