Export Hive Table into CSV File with Header?

To export a Hive table into a CSV file you can use either INSERT OVERWRITE DIRECTORY or by piping the output result of the select query into a CSV file. In this article, I will explain how to export the Hive table into a CSV file on HDFS, Local directory from Hive CLI and Beeline, using HiveQL script, and finally exporting data with column names on the header.

Related: Start HiveServer2 and use Beeline commands

First, start HiveServer2 and connect using the beeline as shown below.


bin/beeline -u connect jdbc:hive2://192.168.1.48:10000 scott tiger

Replace IP address, username, and password according to your setup.

If you are using older version of Hive and wanted to use Hive CLI, use below option.


$HIVE_HOME/bin/hive

Create Table and Load a few Rows

In order to export the table into a CSV file, first, let’s create a table employee in the emp database and load the table with some data.

hive export table csv file

Follow the below steps to LOAD data into this table.

  • Create a data file (for our example, I am creating a file with comma-separated fields)
  • Upload the data file (data.txt) to HDFS. Note you can also load the data from LOCAL without uploading to HDFS.
  • Now use the Hive LOAD command to load the file into table.

LOAD DATA INPATH '/user/hive/data/data.txt' INTO TABLE emp.employee;

The emp.employee table is loaded with below data.

hive export table csv file

1. Exporting from Hive or Beeline Terminal

Hive provides an INSERT OVERWRITE DIRECTORY statement to export a Hive table into a file, by default the exported data has a ^A (invisible character) as a field separator.

If you wanted to export the Hive table into a CSV file (with comma delimiter fields), use the option ROW FORMAT DELIMITED FIELDS TERMINATED BY and specify the field delimiter you want.

1.1 Export Table into CSV File on HDFS

By default INSERT OVERWRITE DIRECTORY command exports result of the specified query into HDFS location.


#Exports to HDFS directory
INSERT OVERWRITE DIRECTORY '/user/data/output/export' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM emp.employee;

This exports the complete Hive table into an export directory on HDFS. Based on your table size, this command may export data into multiple files.

Since our file is very small it exports into a single file and you can use the HDFS command to check the content of the exported file.


hdfs dfs -cat /user/data/output/export/000000_0
1,James,30,M
2,Ann,40,F
3,Jeff,41,M
4,Jennifer,20,F

When you have data exported into multiple files, If you want a single file then you need to concatenate them on the client-side once the export is done.


hdfs dfs -cat /user/data/output/export/* | hadoop fs -put - /user/data/output/export/output.txt

This concatenates all files and put it back into HDFS at specified location.

1.2 Export Table into CSV File on LOCAL Directory

Use optional LOCAL option to export data from Hive table into local CSV file.


#Exports to LOCAL directory
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM emp.employee;

This exports the result of the select query in CSV format into export directory at the /tmp/ location. If your dataset is small, you can use Unix cat command to check the data.


cat /tmp/export/000000_0
1,James,30,M
2,Ann,40,F
3,Jeff,41,M
4,Jennifer,20,F
[email protected]:~/hive$

If you have a huge table, this exports the data into multiple part files, You can combine them into a single file using Unix cat command as shown below.


cat /tmp/export/* > output.csv

Note: Since Hive uses MapReduce or Tez behind to run the queries and they create multiple part files when you have large data, it’s not a good practice to export the data along with column names on the header as each exported part files contains a header record. In case if you want to export with a header for any reason, I’ve some examples here to export with column names.

2. Export Hive Table into CSV Using Beeline

In this section, I will explain exporting a Hive table into CSV using beeline CLI directly.

If you are using an older version of the hive and using the hive command then jump to exporting table using the Hive command

2.1 Export Using Beeline into HDFS Directory

Instead of connecting to Hive/Beeline CLI and running commands may not be an option for some use cases. Alternatively, you can export directly using Hive/Beeline command.


#Exports to HDFS location
bin/beeline -u jdbc:hive2://192.168.1.148:10000 -n scott -p tiger -e "INSERT OVERWRITE DIRECTORY '/user/data/output/export' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM emp.employee"

2.2 Export Using Beeline into LOCAL Directory

Use optional LOCAL clause to export a CSV file from Hive table into local directory.


#Exports to LOCAL directory
bin/beeline -u jdbc:hive2://192.168.1.148:10000 -n scott -p tiger -e "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM emp.employee"

2.3 Export Hive Table into CSV File with Header

By default Beeline terminal outputs the data into a tabular format, by changing the format into CSV2, Hive beeline returns the results in a CSV format. By piping this output into a CSV file, we will get a CSV file with a header. Let’s see this in action.


#This exports with field names on header
bin/beeline -u jdbc:hive2://localhost:10000 -n scott -p tiger --outputformat=csv2 -e "SELECT * FROM emp.employee" > export.csv

This exports the result of the select query in CSV format into export.csv file at the current local directory. If your dataset is small, you can use Unix cat command to check the data.


cat export.csv
employee.id,employee.name,employee.age,employee.gender
1,James,30,M
2,Ann,40,F
3,Jeff,41,M
4,Jennifer,20,F
[email protected]:~/hive$

3. Export Hive Table into CSV Using Hive CLI Command

If you are using an older version of Hive, below are the different examples to export a table into a CSV file.

This example exports to HDFS directory.


#Exports to HDFS directory
bin/hive -e "INSERT OVERWRITE DIRECTORY '/user/data/output/export' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM emp.employee"

This example exports to LOCAL directory.


#Exports to LOCAL directory
bin/hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM emp.employee"

Another example which exports into LOCAL directory. here we use sed command to replace the tab with comma separator.


#Exports to LOCAL directory
bin/hive -e "SELECT * FROM emp.employee" | sed 's/[\t]/,/g'  > export.csv

You can also specify a property set hive.cli.print.header=true before the SELECT to export CSV file with field/column names on the header.


#This exports with field names on header
bin/hive -e 'set hive.cli.print.header=true; SELECT * FROM emp.employee' | sed 's/[\t]/,/g'  > export.csv

If your Hive version supports, you can also try this. This is a clean approach and doesn’t have to use sed to replace tab or special char with comma.


#Exports to LOCAL directory using CSV2 option
bin/hive --outputformat=csv2 -e 'SELECT * FROM emp.employee' > export.csv

Conclusion

In this article, you have learned how to export the table and the result of the query into a CSV file on HDFS location and LOCAL location also learned exporting using hive, beeline and from their respective terminals

Hope you like it !!

Happy Learning !!

You May Also Like

Leave a Reply

Export Hive Table into CSV File with Header?