Use the LOAD DATA command to load the data files like CSV into Hive Managed or External table. In this article, I will explain how to load data files into a table using several examples.
Typically Hive Load command just moves the data from LOCAL
or HDFS location to Hive data warehouse location or any custom location without applying any transformations.
Hive LOAD Command Syntax
Below is a syntax of the Hive LOAD DATA
command.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE]
INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
[INPUTFORMAT 'inputformat' SERDE 'serde']
Depending on the Hive version you are using, LOAD
syntax slightly changes. please refer to the Hive DML document.
filepath
– Supports absolute and relative paths. If you use optional clause LOCAL the specified filepath
would be referred from the server where hive beeline is running otherwise it would use the HDFS path.
LOCAL
– Use LOCAL if you have a file in the server where the beeline is running.
OVERWRITE
– It deletes the existing contents of the table and replaces with the new content.
PARTITION
– Loads data into specified partition.
INPUTFORMAT
– Specify Hive input format to load a specific file format into table, it takes text, ORC, CSV etc.
SERDE
– can be the associated Hive SERDE.
If you already have a table created by following Create Hive Managed Table article, skip to the next section.
CREATE TABLE IF NOT EXISTS emp.employee (
id int,
name string,
age int,
gender string )
COMMENT 'Employee Table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Note: In order to load the CSV comma-separated file to the Hive table, you need to create a table with ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Hive LOAD CSV File from HDFS
Hive LOAD DATA
statement is used to load the text, CSV, ORC file into Table. Load statement performs the same regardless of the table being Managed/Internal vs External
Now, let’s see how to load a data file into the Hive table we just created.
- Create a data file (for our example, I am creating a file with comma-separated columns)
- Now use the Hive
LOAD
command to load the file into the table.
LOAD DATA INPATH '/user/hive/data/data.csv' INTO TABLE emp.employee;
Note that after loading the data, the source file will be deleted from the source location, and the file loaded to the Hive data warehouse location or to the LOCATION
specified while creating a table.
hdfs dfs -ls /user/hive/warehouse/emp.db/employee/
-rw-r--r-- 1 hive supergroup 52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt
Use SELECT
command to get the data from a table and confirm data loaded successfully without any issues.
SELECT * FROM emp.employee
LOAD CSV File from the LOCAL filesystem
Use LOCAL
optional clause to load CSV file from the local filesystem into the Hive table without uploading to HDFS.
LOAD DATA LOCAL INPATH '/home/hive/data.csv' INTO TABLE emp.employee;
Unlike loading from HDFS, source file from LOCAL
file system won’t be removed.
Use OVERWRITE clause
Use optional OVERWRITE
clause of the LOAD command to delete the contents of the target table and replaced it with the records from the file referred.
LOAD DATA LOCAL INPATH '/home/hive/data.csv' OVERWRITE INTO TABLE emp.employee;
Use PARTITION clause
If you have a partitioned table, use PARTITION
optional clause to load data into specific partitions of the table. you can also use OVERWRITE
to remove the contents of the partition and re-load.
LOAD DATA LOCAL INPATH '/home/hive/data.csv' OVERWRITE INTO TABLE emp.employee PARTITION(date=2020);
Use INSERT INTO
Like SQL, you can also use INSERT INTO
to insert rows into Hive table.
INSERT INTO emp.employee values(7,'scott',23,'M');
INSERT INTO emp.employee values(8,'raman',50,'M');
Happy Learning !!
Related Articles
- Hive Export Table into HDFS file
- How to Create Partitioned Hive Table
- How to Update or Drop a Hive Partitions
- How to Show All Partitions of a Table
- Hive DDL Commands Explained with Examples
- Hive – INSERT INTO vs INSERT OVERWRITE Explained
- Connect to Hive using JDBC connection
- How to Connect to Hive Using Beeline
- Hive Load Partitioned Table with Examples
- Hive – How to Show All Partitions of a Table?