You are currently viewing R – Replace NA with 0 in Multiple Columns

Use R dplyr::coalesce() to replace NA with 0 on multiple dataframe columns by column name and dplyr::mutate_at() method to replace by column name and index. tidyr:replace_na() to replace. Using these methods and packages you can also replace NA with an empty string in R dataframe.

The dplyr and tidyr are third-party packages that are required to install first and load before use.

1. Quick Examples of Replace NA with 0 in Multiple Columns

Following are quick examples of how to replace NA with 0 (zero) in multiple dataframe columns of R dataframe.


# Quick Examples
# Example 1 - Replace on multiple columns by column name
# Load dplyr library
library("dplyr")
my_dataframe2 <- my_dataframe %>% 
  mutate(id = coalesce(id, 0),
         pages = coalesce(pages, 0))

# Example 3 - Replace NA on multiple columns by name
library("dplyr")
library("tidyr")
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c('id','pages'), ~replace_na(.,0))

# Example 4 - Replace NA on multiple columns by Index
library("dplyr")
library("tidyr")
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c(1,3), ~replace_na(.,0))

Let’s create a dataframe with all numeric columns id, pages, and price.


# Create dataframe
my_dataframe=data.frame(id=c(11,NA,33),
                        pages=c(32,45,NA),
                        price=c(144,NA,321))
print(my_dataframe)

# Output
#  id pages price
#1 11    32   144
#2 NA    45    NA
#3 33    NA   321

2. Replace NA with 0 on Multiple Columns by Name

In order to repalce NA with 0 on multiple columns by name use R dplyr::coalesce() package along with mutate(). coalesce() method takes dataframe column name and the value you wanted to replace with.

The dplyr is third-party package that is required to install first using install.packages('dplyr') and load it using library("dplyr")

Syntax: Following is a syntax on how to use these methods together.


# Syntax
my_dataframe <- my_dataframe %>% 
  mutate(col_name1 = coalesce(col_name1, 0),
         col_name2 = coalesce(col_name2, 0))

Here, my_dataframe is a datafram and col_name* is a column name where you wanted to replace NA values. The infix operator %>% is a pipe, it passes the left-hand side of the operator to the first argument of the right-hand side of the operator


# Example 1 - Rename on multiple columns
# Load dplyr library
library("dplyr")

my_dataframe2 <- my_dataframe %>% 
  mutate(id = coalesce(id, 0),
         pages = coalesce(pages, 0))
print(my_dataframe2)

Yields below output.


# Output
  id pages price
1 11    32   144
2  0    45    NA
3 33     0   321

Note that the NA values are not present anymore on columns id and pages.

3. Replace NA on Multiple Columns by Index

Use tidyr::replace_na() to update NA values with 0 on selected multiple column indexes. dplyr::mutate_at() takes vector with index numbers and replace_na() replaces all NA with 0 on all multiple indexes specified with vector.

tidyr is a third-party package that is required to install install.packages('tidyr') and load it using library("tidyr").

Syntax: Following is a syntax on how to use these methods together.


# Syntax
my_dataframe <- my_dataframe %>% 
    mutate_at(c(col_idex1,col_index2,..), ~replace_na(.,0))

Here, my_dataframe is a dataframe and col_index* is a column index where you wanted to replace NA values.


# Load library
library("dplyr")
library("tidyr")

# Replace NA on multiple columns by Index
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c(1,3), ~replace_na(.,0))
print(my_dataframe3)

Yields below output.


# Output
  id pages price
1 11    32   144
2  0    45     0
3 33    NA   321

Note that the NA values are not present anymore on columns id and price.

4. Using mutate_at() with Column Names

You can also use mutate_at() to replace NA with 0 by selecting column names.


# Load library
library("dplyr")
library("tidyr")

# Replace NA on multiple columns by name
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c('id','price'), ~replace_na(.,0))
print(my_dataframe3)

5. Conclusion

In this article, you have learned how to replace or update a NA value with 0 on multiple columns of R dataframe by index and name. The dplyr is third-party package that is required to install first using install.packages('dplyr') and load it using library("dplyr"), similarly tidyr also required to install install.packages('tidyr') and load library("tidyr") before using

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium