You are currently viewing Hive Load CSV File into Table

Use the LOAD DATA command to load the data files like CSV into Hive Managed or External table. In this article, I will explain how to load data files into a table using several examples.

Advertisements

Typically Hive Load command just moves the data from LOCAL or HDFS location to Hive data warehouse location or any custom location without applying any transformations.

Hive LOAD Command Syntax

Below is a syntax of the Hive LOAD DATA command.


LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] 
INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] 
[INPUTFORMAT 'inputformat' SERDE 'serde']

Depending on the Hive version you are using, LOAD syntax slightly changes. please refer to the Hive DML document.

filepath – Supports absolute and relative paths. If you use optional clause LOCAL the specified filepath would be referred from the server where hive beeline is running otherwise it would use the HDFS path.

LOCAL – Use LOCAL if you have a file in the server where the beeline is running.

OVERWRITE – It deletes the existing contents of the table and replaces with the new content.

PARTITION – Loads data into specified partition.

INPUTFORMAT – Specify Hive input format to load a specific file format into table, it takes text, ORC, CSV etc.

SERDE – can be the associated Hive SERDE.

If you already have a table created by following Create Hive Managed Table article, skip to the next section.


CREATE TABLE IF NOT EXISTS emp.employee (
 id int,
 name string,
 age int,
 gender string )
 COMMENT 'Employee Table'
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ',';

Note: In order to load the CSV comma-separated file to the Hive table, you need to create a table with ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

Hive LOAD CSV File from HDFS

Hive LOAD DATA statement is used to load the text, CSV, ORC file into Table. Load statement performs the same regardless of the table being Managed/Internal vs External

Now, let’s see how to load a data file into the Hive table we just created.

  • Create a data file (for our example, I am creating a file with comma-separated columns)
  • Now use the Hive LOAD command to load the file into the table.

LOAD DATA INPATH '/user/hive/data/data.csv' INTO TABLE emp.employee;

Note that after loading the data, the source file will be deleted from the source location, and the file loaded to the Hive data warehouse location or to the LOCATION specified while creating a table.


hdfs dfs -ls /user/hive/warehouse/emp.db/employee/
-rw-r--r--   1 hive supergroup         52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt

Use SELECT command to get the data from a table and confirm data loaded successfully without any issues.


SELECT * FROM emp.employee
hive load csv table

LOAD CSV File from the LOCAL filesystem

Use LOCAL optional clause to load CSV file from the local filesystem into the Hive table without uploading to HDFS.


LOAD DATA LOCAL INPATH '/home/hive/data.csv' INTO TABLE emp.employee;

Unlike loading from HDFS, source file from LOCAL file system won’t be removed.

Use OVERWRITE clause

Use optional OVERWRITE clause of the LOAD command to delete the contents of the target table and replaced it with the records from the file referred.


LOAD DATA LOCAL INPATH '/home/hive/data.csv' OVERWRITE INTO TABLE emp.employee;

Use PARTITION clause

If you have a partitioned table, use PARTITION optional clause to load data into specific partitions of the table. you can also use OVERWRITE to remove the contents of the partition and re-load.


LOAD DATA LOCAL INPATH '/home/hive/data.csv' OVERWRITE INTO TABLE emp.employee PARTITION(date=2020);

Use INSERT INTO

Like SQL, you can also use INSERT INTO to insert rows into Hive table.


INSERT INTO emp.employee values(7,'scott',23,'M');
INSERT INTO emp.employee values(8,'raman',50,'M');

Happy Learning !!

Leave a Reply