You are currently viewing Why Hive Table is loading with NULL values?

Problem: I created a new managed table in Hive and tried loading the data to the table, post LOAD command, select shows all NULL values as the data got loaded NULL for all rows and columns.

I’ve created a Hive table as shown below.


jdbc:hive2://127.0.0.1:10000> CREATE TABLE IF NOT EXISTS emp.employee (
 . . . . . . . . . . . . . .> id int,
 . . . . . . . . . . . . . .> name string,
 . . . . . . . . . . . . . .> age int,
 . . . . . . . . . . . . . .> gender string )
 . . . . . . . . . . . . . .> COMMENT 'Employee Table';

I’ve a data file with comma separated fields, upload the data file (data.txt) to HDFS and loaded the file using below command.


LOAD DATA INPATH '/user/hive/data/data.txt' INTO table emp.employee;

selecting the table returns the following.


+--------------+----------------+---------------+------------------+
| employee.id  | employee.name  | employee.age  | employee.gender  |
+--------------+----------------+---------------+------------------+
| NULL         | NULL           | NULL          | NULL             |
| NULL         | NULL           | NULL          | NULL             |
| NULL         | NULL           | NULL          | NULL             |
+--------------+----------------+---------------+------------------+

How to resolve Loading NULL values into Hive Table?

You are getting NULL values loaded to the Hive table because your data is in comma-separated whereas Hive default separator is ^A hence Hive cannot recognizes your columns and loaded them as NULL values.

Since the default field/column terminator in Hive is ^, you need to explicitly mention your custom terminator explicitly using ROW FORMAT DELIMITED FIELDS TERMINATED BY while creating your table.

For example


0: jdbc:hive2://127.0.0.1:10000> CREATE TABLE IF NOT EXISTS emp.employee (
. . . . . . . . . . . . . . . .> id int,
. . . . . . . . . . . . . . . .> name string,
. . . . . . . . . . . . . . . .> age int,
. . . . . . . . . . . . . . . .> gender string)
. . . . . . . . . . . . . . . .> ROW FORMAT DELIMITED
. . . . . . . . . . . . . . . .> FIELDS TERMINATED BY ',';

Now load the same file into the table


LOAD DATA INPATH '/user/hive/data/data.txt' INTO table emp.employee;

Now, the table gets loaded with real values. SELECT * FROM emp.employee returns below output.


+--------------+----------------+---------------+------------------+
| employee.id  | employee.name  | employee.age  | employee.gender  |
+--------------+----------------+---------------+------------------+
| 1            | James          | 30            | M                |
| 2            | Ann            | 40            | F                |
| 3            | Jeff           | 41            | M                |
| 4            | Jennifer       | 20            | F                |
+--------------+----------------+---------------+------------------+

Hope this resolves loading NULL values to Hive table.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium