You are currently viewing How to replace NULL values with Default in Hive

Use nvl() function in Hive to replace all NULL values of a column with a default value, In this article, I will explain with an example.

Advertisements

You can use this function to

  • Replace all NULL values with -1 or 0 or any number for the integer column.
  • Replace all NULL values with empty space for string types.
  • Replace with any value based on your need.

Related: Spark Replace NULL Values with Empty String

Before we just into examples, first, let’s create a table and load some records into it.


CREATE TABLE IF NOT EXISTS employee (
 id int,
 name string,
 age int,
 gender string,
 salary decimal
)
COMMENT 'Employee Table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

And the table is loaded with the following records where it has NULL values on few columns.


jdbc:hive2://> SELECT * FROM employee;
+--------------+----------------+---------------+------------------+------------------+
| employee.id  | employee.name  | employee.age  | employee.gender  | employee.salary  |
+--------------+----------------+---------------+------------------+------------------+
| 1            | James          | 30            | M                | 10000            |
| 2            | Ann            | 40            | F                | 20000            |
| 3            | Jeff           | 41            | M                | 30000            |
| 4            | Jennifer       | 20            | F                | 40000            |
| 5            | Kum            | 30            | M                | NULL             |
| 7            | Jeni           | NULL          |                  | NULL             |
+--------------+----------------+---------------+------------------+------------------+
6 rows selected (0.149 seconds)

Hive Replace Null Values

Hive nvl() function takes two argument, first being a column name where you wanted to replace NULL values and the second being the default value you wanted to replace with.


0: jdbc:hive2://> select id,name,nvl(age,-1) as age,nvl(salary,-1) as salary from employee;
OK
+-----+-----------+------+---------+
| id  |   name    | age  | salary  |
+-----+-----------+------+---------+
| 1   | James     | 30   | 10000   |
| 2   | Ann       | 40   | 20000   |
| 3   | Jeff      | 41   | 30000   |
| 4   | Jennifer  | 20   | 40000   |
| 5   | Kum       | 30   | -1      |
| 7   | Jeni      | -1   | -1      |
+-----+-----------+------+---------+
6 rows selected (0.212 seconds)
0: jdbc:hive2://>

As you see it replaced the NULL values on the salary and age columns. You can also replace strings with an empty value.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply