R – Replace NA values with 0 (zero)

How do I replace NA values in a numeric column with 0 (zero) in an R data frame (data.frame)? You can replace NA values with zero (0) in the numeric columns of an R data frame by using is.na(), replace(), imputeTS::na_replace(), dplyr::coalesce(), dplyr::mutate_all(), dplyr::mutate_at(), dplyr::mutate_if(), tidyr::replace_na(), and data.table::setnafill().

For numeric columns, replace NA with zero or any other value that makes sense for your data; for string columns, an empty string is usually a better choice. Most of the methods shown here can also replace NA values with an empty string.

In R, NA represents a missing value. Most operations that involve NA, such as arithmetic or sum(), return NA themselves, so it is good practice to handle missing values before processing the data. In this article, we will see how to replace NA values with zero in an R data frame, with examples that replace by a single column index, multiple indexes, a single column name, multiple column names, and on all columns.
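
To make the effect of missing values concrete, here is a minimal base R sketch. It reuses the id values from the sample data frame created later in this article; the variable names are only illustrative.

```r
# The id column from this article's sample data, with one missing value
id <- c(2, 1, 3, 4, NA)

total_with_na <- sum(id)                 # NA: the missing value propagates
total_skip_na <- sum(id, na.rm = TRUE)   # 10: the NA is skipped
mean_skip_na  <- mean(id, na.rm = TRUE)  # 2.5: average of the 4 known values

# Replacing NA with 0 keeps the sum but changes the mean
id_filled <- id
id_filled[is.na(id_filled)] <- 0
mean_filled <- mean(id_filled)           # 2: the 0 now counts as a real value
```

The last line shows why zero is not always a neutral choice: the sum is unchanged, but the mean drops from 2.5 to 2 because the replaced value is now treated as an observation.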

1. Quick Examples of Replace NA Values with 0

Below are quick examples of how to replace data frame column values from NA to 0 in R.


#Quick Examples

#Example 1 - Replace na values with 0 using is.na()
my_dataframe[is.na(my_dataframe)] <- 0

#Example 2 - Replace on selected column
my_dataframe["pages"][is.na(my_dataframe["pages"])] <- 0
print(my_dataframe)

#Example 3 - By using replace() & is.na()
my_dataframe <- replace(my_dataframe, is.na(my_dataframe), 0)

#Example 4 - Another way
my_dataframe <- my_dataframe %>% replace(is.na(.), 0)

#Example 5 - Replace NA values with 0 using na_replace() from imputeTS
library("imputeTS")
my_dataframe <- na_replace(my_dataframe, 0)

#Example 6 - Replace NA with zero on all columns of an all-numeric data frame using coalesce()
library("dplyr")
my_dataframe <- mutate_all(my_dataframe, ~coalesce(.,0))

#The replace_na() examples below require these libraries
library("tidyr")
library("dplyr")

#Example 7 - Replace NA with zero on all columns of an all-numeric data frame using replace_na()
my_dataframe <- mutate_all(my_dataframe, ~replace_na(.,0))

#Example 8 - Replace NA using setnafill() from data.table
library("data.table")
my_dataframe <- setnafill(my_dataframe, fill=0)

#Example 9 - Replace NA with zero on a specific numeric column (requires dplyr)
my_dataframe <- my_dataframe %>% 
          mutate(id = coalesce(id, 0))

# Example 10 - Replace on multiple columns
my_dataframe <- my_dataframe %>% 
  mutate(id = coalesce(id, 0),
         pages = coalesce(pages, 0))

# Example 11 - Replace NA on a single column by index (requires tidyr and dplyr)
my_dataframe <- my_dataframe %>% 
    mutate_at(1, ~replace_na(.,0))

# Example 12 - Replace NA on multiple columns by Index
my_dataframe <- my_dataframe %>% 
    mutate_at(c(1,3), ~replace_na(.,0))

# Example 13 - Replace NA on multiple columns by name
my_dataframe <- my_dataframe %>% 
    mutate_at(c('id','pages'), ~replace_na(.,0))

# Example 14 - Replace only numeric columns
my_dataframe <- my_dataframe %>% 
    mutate_if(is.numeric, ~replace_na(., 0))

As you noticed above, I have used the following methods to replace NA values with 0 in R.

  • Using is.na()
  • Using replace()
  • Using na_replace() from imputeTS package
  • Using coalesce() from dplyr package
  • Using mutate(), mutate_all(), mutate_at(), mutate_if() from dplyr package
  • Using replace_na() from tidyr package
  • Using setnafill() from data.table package

Let’s create a data frame with some NA values, run these examples and validate the result.


#Create dataframe with 5 rows and 3 columns
my_dataframe=data.frame(id=c(2,1,3,4,NA),
        name=c('sravan',NA,'chrisa','shivgami',NA),
        gender=c(NA,'m',NA,'f',NA))

#Display dataframe
print(my_dataframe)

Output:


#Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f
5 NA     <NA>   <NA>

2. Replace NA values with 0 using is.na()

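is.na() checks whether each value is NA and returns TRUE where it is and FALSE otherwise. Called on a data frame, it returns a logical matrix of the same shape, and using that matrix inside [] selects exactly the NA cells, so assigning 0 to the selection replaces them. In this way, we can replace NA values with zero (0) in an R data frame. Note that in character columns such as name and gender, the replacement is stored as the string "0".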


#Replace na values with 0 using is.na()
my_dataframe[is.na(my_dataframe)] = 0

#Display the dataframe
print(my_dataframe)

Output:


#Output
  id     name gender
1  2   sravan      0
2  1        0      m
3  3   chrisa      0
4  4 shivgami      f
5  0        0      0

In the above output, we can see that NA values are replaced with 0’s.
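
Because this assignment also writes 0 into character columns, you may prefer to treat each column type separately. The following is a minimal base R sketch of that idea, using a small stand-in data frame (df, num_cols, and chr_cols are illustrative names, not part of the examples above).

```r
# Small stand-in data frame with one numeric and one character column
df <- data.frame(id = c(2, 1, NA),
                 name = c("sravan", NA, "chrisa"),
                 stringsAsFactors = FALSE)

# is.na() on a data frame returns a logical matrix of the same shape
na_mask <- is.na(df)

# Logical vectors marking which columns are numeric and which are character
num_cols <- sapply(df, is.numeric)
chr_cols <- sapply(df, is.character)

# Replace NA with 0 in numeric columns and with "" in character columns
df[num_cols][is.na(df[num_cols])] <- 0
df[chr_cols][is.na(df[chr_cols])] <- ""
```

After these two assignments, id holds 2, 1, 0 and name holds "sravan", "", "chrisa", so each column keeps a replacement value that matches its type.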

3. Replace NA values with 0 in a DataFrame using replace()

Let’s see another way to change NA values to zero, using replace(). It takes three parameters:

  1. the first parameter is the input data frame.
  2. the second parameter is the index of the values to replace; here is.na() marks the NA cells.
  3. the last parameter is the value 0, which replaces the values selected by the second parameter.


#Replace NA values with 0
my_dataframe <- replace(my_dataframe,is.na(my_dataframe),0)

Output:


# Output
  id     name gender
1  2   sravan      0
2  1        0      m
3  3   chrisa      0
4  4 shivgami      f
5  0        0      0

In the above output, we can see that NA values are replaced with 0’s.
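
replace() works the same way on a single vector or column, which is handy when only one column should change. A minimal sketch, assuming a numeric vector named pages like the one used later in this article:

```r
# A numeric vector with two missing values
pages <- c(32, 45, NA, 22, NA)

# replace() returns a modified copy; the original vector is left untouched
pages_filled <- replace(pages, is.na(pages), 0)

# The same call applied to one column of a data frame
df <- data.frame(pages = pages, price = c(144, 553, 321, 567, NA))
df$pages <- replace(df$pages, is.na(df$pages), 0)
```

Here only df$pages is filled; the NA in df$price stays as it is.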

4. Replace NA values with 0 using na_replace() from “imputeTS”

na_replace() replaces NA with a given value, here 0, in an R data frame. It is available in the imputeTS package, so we have to install and load this package before using na_replace().

imputeTS is a third-party package, so you need to install it first by using install.packages('imputeTS'). Once the installation completes, load it with library("imputeTS") in order to use na_replace().


#Replace NA values with 0
my_dataframe <- na_replace(my_dataframe, 0)

Output:


# Output
  id     name gender
1  2   sravan      0
2  1        0      m
3  3   chrisa      0
4  4 shivgami      f
5  0        0      0

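In the above output, we can see that NA values are replaced with 0’s. Note that na_replace() is intended for numeric data, so depending on the imputeTS version, non-numeric columns such as name and gender may be left unchanged with a warning instead.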

5. Replace NA with Zero on All Numeric Values

There are several other ways to replace NA with zero in an R data frame by using functions from the dplyr, tidyr, and data.table packages.

The examples so far mostly use base R functions, which work well on smaller data sets. For bigger data sets, the dplyr functions can be faster because parts of dplyr are implemented in C++, although the actual difference depends on your data. Let’s create another data frame with all numeric columns and run these examples.


# Create dataframe with numeric columns
my_dataframe=data.frame(id=c(11,22,33,44,NA),
                        pages=c(32,45,NA,22,NA),
                        chapters=c(NA,86,11,15,NA),
                        price=c(144,553,321,567,NA))

# Replace NA using coalesce() from dplyr
library("dplyr")
my_dataframe <- mutate_all(my_dataframe, ~coalesce(.,0))

# Replace NA using replace_na() from tidyr
library("dplyr")
library("tidyr")
my_dataframe <- mutate_all(my_dataframe, ~replace_na(.,0))

# Replace NA using setnafill() from data.table
library("data.table")
my_dataframe <- setnafill(my_dataframe, fill=0)

All of the above examples yield the same output, shown below.


# Output
  id pages chapters price
1 11    32        0   144
2 22    45       86   553
3 33     0       11   321
4 44    22       15   567
5  0     0        0     0

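Here, the coalesce() function is from the dplyr package. It returns the first non-missing value at each position across its arguments, so coalesce(x, 0) keeps the existing values of x and substitutes 0 wherever x is NA.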
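
The behavior of coalesce() is easiest to see on plain vectors. The sketch below assumes dplyr is installed; the estimate vector is a made-up fallback used only for illustration.

```r
library(dplyr)

# A numeric vector with two missing values
pages <- c(32, 45, NA, 22, NA)

# With a single fallback value, every NA becomes 0
filled <- coalesce(pages, 0)            # 32 45 0 22 0

# With several arguments, the first non-missing value at each position wins
estimate <- c(30, 40, 50, 20, NA)       # hypothetical second source
merged <- coalesce(pages, estimate, 0)  # 32 45 50 22 0
```

In merged, the third value comes from estimate and the fifth falls through to 0 because both vectors are missing there.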

6. Update NA with Zero By Specific Column Name

Here we use the mutate() function with coalesce() from the dplyr package. This updates NA values with zero in the id column only. Using coalesce(column, 0) on a character column results in an error, because the replacement value must be compatible with the column's type.


# Load dplyr library
library("dplyr")
#Replace NA with zero on specific numeric column
my_dataframe <- my_dataframe %>% 
            mutate(id = coalesce(id, 0))

7. Update NA with Zero on Multiple Columns by Name

Let’s use the same approach as above, but replace NA with zero on multiple columns by column name.


# Replace on multiple columns
library("dplyr")
my_dataframe <- my_dataframe %>% 
  mutate(id = coalesce(id, 0),
         pages = coalesce(pages, 0))

8. Replace NA with 0 on Column by Index

Use mutate_at() with a column index to specify the position of the column in which you want to replace NA values with zero.


# Load tidyr library
library("tidyr")
library("dplyr")
my_dataframe <- my_dataframe %>% 
    mutate_at(1, ~replace_na(.,0))
print(my_dataframe)

This yields the output below.


# Output
  id pages chapters price
1 11    32       NA   144
2 22    45       86   553
3 33    NA       11   321
4 44    22       15   567
5  0    NA       NA    NA

9. Replace NA on Multiple Columns by Index

mutate_at() also takes a vector of index numbers, which lets you replace NA with 0 on multiple columns at once; replace_na() replaces every NA in those columns with 0.


# Replace NA on multiple columns by Index
library("tidyr")
library("dplyr")
my_dataframe <- my_dataframe %>% 
    mutate_at(c(1,3), ~replace_na(.,0))

print(my_dataframe)

This yields the output below.


# Output
  id pages chapters price
1 11    32        0   144
2 22    45       86   553
3 33    NA       11   321
4 44    22       15   567
5  0    NA        0    NA

10. Replace Only on Numeric Columns

When you have a data.frame with a mix of numeric and character columns, use mutate_if() with is.numeric as the predicate to replace NA with 0 only in the numeric columns.


# Replace only numeric columns
library("tidyr")
library("dplyr")
my_dataframe <- my_dataframe %>% 
    mutate_if(is.numeric, ~replace_na(., 0))
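
In dplyr 1.0.0 and later, the scoped verbs mutate_all(), mutate_at(), and mutate_if() are superseded by across(). As a sketch of the equivalent calls, assuming a recent dplyr and tidyr and a small stand-in data frame:

```r
library(dplyr)
library(tidyr)

# Stand-in data frame with numeric and character columns
df <- data.frame(id = c(11, NA),
                 name = c("sravan", NA),
                 pages = c(NA, 45),
                 stringsAsFactors = FALSE)

# Counterpart of mutate_at(): pick the columns by name
by_name <- df %>%
  mutate(across(c(id, pages), ~ replace_na(.x, 0)))

# Counterpart of mutate_if(): pick the columns with a predicate
numeric_only <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
```

Both results fill id and pages with 0 and leave the NA in the character column name untouched.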

11. Data with Factor Values

If your data has only numeric and character columns, most of the above examples work without issue. But if it has factor columns, you first need to convert them to character before replacing NA with zero, because 0 is not an existing factor level. In the code below, i marks the factor columns.


#Identify the factor columns
i <- sapply(my_dataframe, is.factor)

#Change factors to character type
my_dataframe[i] <- lapply(my_dataframe[i], as.character)

# Replace NA with 0
my_dataframe[is.na(my_dataframe)] <- 0

# Change character columns back to factors
my_dataframe[i] <- lapply(my_dataframe[i], as.factor)
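
To see why the conversion is needed, the following base R sketch first tries to assign 0 directly to a factor and then applies the convert, replace, convert back steps. The data frame and the names attempt and is_fct are illustrative.

```r
# Stand-in data frame with a single factor column
df <- data.frame(gender = factor(c(NA, "m", NA, "f")))

# Assigning a value that is not an existing level only produces a warning,
# and the cells stay NA
attempt <- df$gender
suppressWarnings(attempt[is.na(attempt)] <- 0)

# Convert factors to character, replace NA, then convert back to factors
is_fct <- sapply(df, is.factor)
df[is_fct] <- lapply(df[is_fct], as.character)
df[is.na(df)] <- 0
df[is_fct] <- lapply(df[is_fct], as.factor)

# "0" is now a regular level of the factor
gender_levels <- levels(df$gender)
```

After the round trip, the former NA cells hold the level "0", and gender_levels contains "0", "f", and "m".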

12. Conclusion

In this article, I have explained several ways to replace NA values with zero (0) in the numeric columns of an R data frame: base R indexing with is.na(), replace(), na_replace() from the imputeTS package, coalesce() and the mutate() family from dplyr, replace_na() from tidyr, and setnafill() from data.table.

NNK
