Use the LOAD DATA command to load the data files like CSV into Hive Managed or External table. In this article, I will explain how to load data files into a table using several examples.
Typically Hive Load command just moves the data from
LOCAL or HDFS location to Hive data warehouse location or any custom location without applying any transformations.
Hive LOAD Command Syntax
Below is a syntax of the Hive
LOAD DATA command.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde']
Depending on the Hive version you are using,
LOAD syntax slightly changes. please refer to the Hive DML document.
filepath – Supports absolute and relative paths. If you use optional clause LOCAL the specified
filepath would be referred from the server where hive beeline is running otherwise it would use the HDFS path.
LOCAL – Use LOCAL if you have a file in the server where the beeline is running.
OVERWRITE – It deletes the existing contents of the table and replaces with the new content.
PARTITION – Loads data into specified partition.
INPUTFORMAT – Specify Hive input format to load a specific file format into table, it takes text, ORC, CSV etc.
SERDE – can be the associated Hive SERDE.
If you already have a table created by following Create Hive Managed Table article, skip to the next section.
CREATE TABLE IF NOT EXISTS emp.employee ( id int, name string, age int, gender string ) COMMENT 'Employee Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Note: In order to load the CSV comma-separated file to the Hive table, you need to create a table with
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Hive LOAD CSV File from HDFS
LOAD DATA statement is used to load the text, CSV, ORC file into Table. Load statement performs the same regardless of the table being Managed/Internal vs External
Now, let’s see how to load a data file into the Hive table we just created.
- Create a data file (for our example, I am creating a file with comma-separated columns)
- Now use the Hive
LOADcommand to load the file into the table.
LOAD DATA INPATH '/user/hive/data/data.csv' INTO TABLE emp.employee;
Note that after loading the data, the source file will be deleted from the source location, and the file loaded to the Hive data warehouse location or to the
LOCATION specified while creating a table.
hdfs dfs -ls /user/hive/warehouse/emp.db/employee/ -rw-r--r-- 1 hive supergroup 52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt
SELECT command to get the data from a table and confirm data loaded successfully without any issues.
SELECT * FROM emp.employee
LOAD CSV File from the LOCAL filesystem
LOCAL optional clause to load CSV file from the local filesystem into the Hive table without uploading to HDFS.
LOAD DATA LOCAL INPATH '/home/hive/data.csv' INTO TABLE emp.employee;
Unlike loading from HDFS, source file from
LOCAL file system won’t be removed.
Use OVERWRITE clause
OVERWRITE clause of the LOAD command to delete the contents of the target table and replaced it with the records from the file referred.
LOAD DATA LOCAL INPATH '/home/hive/data.csv' OVERWRITE INTO TABLE emp.employee;
Use PARTITION clause
If you have a partitioned table, use
PARTITION optional clause to load data into specific partitions of the table. you can also use
OVERWRITE to remove the contents of the partition and re-load.
LOAD DATA LOCAL INPATH '/home/hive/data.csv' OVERWRITE INTO TABLE emp.employee PARTITION(date=2020);
Use INSERT INTO
Like SQL, you can also use
INSERT INTO to insert rows into Hive table.
INSERT INTO emp.employee values(7,'scott',23,'M'); INSERT INTO emp.employee values(8,'raman',50,'M');
Happy Learning !!