You are currently viewing R – Replace NA with 0 in Multiple Columns

Use R dplyr::coalesce() to replace NA with 0 on multiple dataframe columns by column name and dplyr::mutate_at() method to replace by column name and index. tidyr:replace_na() to replace. Using these methods and packages you can also replace NA with an empty string in R dataframe.

Advertisements

The dplyr and tidyr are third-party packages that are required to install first and load before use.

1. Quick Examples of Replace NA with 0 in Multiple Columns

Following are quick examples of how to replace NA with 0 (zero) in multiple dataframe columns of R dataframe.


# Quick Examples
# Example 1 - Replace on multiple columns by column name
# Load dplyr library
library("dplyr")
my_dataframe2 <- my_dataframe %>% 
  mutate(id = coalesce(id, 0),
         pages = coalesce(pages, 0))

# Example 3 - Replace NA on multiple columns by name
library("dplyr")
library("tidyr")
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c('id','pages'), ~replace_na(.,0))

# Example 4 - Replace NA on multiple columns by Index
library("dplyr")
library("tidyr")
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c(1,3), ~replace_na(.,0))

Let’s create a dataframe with all numeric columns id, pages, and price.


# Create dataframe
my_dataframe=data.frame(id=c(11,NA,33),
                        pages=c(32,45,NA),
                        price=c(144,NA,321))
print(my_dataframe)

# Output
#  id pages price
#1 11    32   144
#2 NA    45    NA
#3 33    NA   321

2. Replace NA with 0 on Multiple Columns by Name

In order to repalce NA with 0 on multiple columns by name use R dplyr::coalesce() package along with mutate(). coalesce() method takes dataframe column name and the value you wanted to replace with.

The dplyr is third-party package that is required to install first using install.packages('dplyr') and load it using library("dplyr")

Syntax: Following is a syntax on how to use these methods together.


# Syntax
my_dataframe <- my_dataframe %>% 
  mutate(col_name1 = coalesce(col_name1, 0),
         col_name2 = coalesce(col_name2, 0))

Here, my_dataframe is a datafram and col_name* is a column name where you wanted to replace NA values. The infix operator %>% is a pipe, it passes the left-hand side of the operator to the first argument of the right-hand side of the operator


# Example 1 - Rename on multiple columns
# Load dplyr library
library("dplyr")

my_dataframe2 <- my_dataframe %>% 
  mutate(id = coalesce(id, 0),
         pages = coalesce(pages, 0))
print(my_dataframe2)

Yields below output.


# Output
  id pages price
1 11    32   144
2  0    45    NA
3 33     0   321

Note that the NA values are not present anymore on columns id and pages.

3. Replace NA on Multiple Columns by Index

Use tidyr::replace_na() to update NA values with 0 on selected multiple column indexes. dplyr::mutate_at() takes vector with index numbers and replace_na() replaces all NA with 0 on all multiple indexes specified with vector.

tidyr is a third-party package that is required to install install.packages('tidyr') and load it using library("tidyr").

Syntax: Following is a syntax on how to use these methods together.


# Syntax
my_dataframe <- my_dataframe %>% 
    mutate_at(c(col_idex1,col_index2,..), ~replace_na(.,0))

Here, my_dataframe is a dataframe and col_index* is a column index where you wanted to replace NA values.


# Load library
library("dplyr")
library("tidyr")

# Replace NA on multiple columns by Index
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c(1,3), ~replace_na(.,0))
print(my_dataframe3)

Yields below output.


# Output
  id pages price
1 11    32   144
2  0    45     0
3 33    NA   321

Note that the NA values are not present anymore on columns id and price.

4. Using mutate_at() with Column Names

You can also use mutate_at() to replace NA with 0 by selecting column names.


# Load library
library("dplyr")
library("tidyr")

# Replace NA on multiple columns by name
my_dataframe3 <- my_dataframe %>% 
    mutate_at(c('id','price'), ~replace_na(.,0))
print(my_dataframe3)

5. Conclusion

In this article, you have learned how to replace or update a NA value with 0 on multiple columns of R dataframe by index and name. The dplyr is third-party package that is required to install first using install.packages('dplyr') and load it using library("dplyr"), similarly tidyr also required to install install.packages('tidyr') and load library("tidyr") before using

References