R – Replace Zero (0) with NA on Dataframe Column

Using methods from base R and the dplyr package, we can replace zero (0) with NA in dataframe columns. In this article, I cover 10 ways to do this, including how to replace 0 with NA on all columns, on a single column, on multiple columns, and by column index position in an R dataframe.

1. Quick Examples of Replace Zero with NA Value

Following are quick examples of how to replace zero with NA values in an R Dataframe.


# Quick Examples

#Example 1 - Replace on all columns
df[df == 0] <- NA

#Example 2 - Replace using negation
is.na(df) <- !df

#Example 3 - Replace on selected columns
df["pages"][df["pages"] == 0] <- NA

#Example 4 - Using replace() function
df <- replace(df, df==0, NA)

#Example 5 - Replace using dplyr::na_if()
# (passing a whole dataframe may fail in dplyr 1.1.0 and later; see Examples 6-9)
library(dplyr)
df <- na_if(df, 0)

#Example 6 - Replace using dplyr::mutate_all() 
library(dplyr) 
df <- df %>% mutate_all(~na_if(., 0))

#Example 7 - Replace only on all Numeric columns
library(dplyr) 
df <- df %>% mutate_if(is.numeric, ~na_if(., 0))

#Example 8 - Replace only on selected columns
library(dplyr) 
df <- df %>% mutate_at(c('pages'), ~na_if(., 0))

#Example 9 - Replace only on selected column index
library(dplyr) 
df <- df %>% mutate_at(c(2), ~na_if(., 0))

#Example 10 - Replacing on tibble
library(dplyr)
df <- tibble(
  col1 = c("A", "B", "NA"),
  col2 = c(0, 2, NA),
  col3 = c(1, NA, 5)
)
df <- df %>% mutate_if(is.numeric, ~na_if(., 0))

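Let’s create an R dataframe, run these examples, and validate the results. Each example below starts from this original dataframe, so re-create it before running the next one.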


# Create dataframe with numeric columns
df <- data.frame(pages = c(32, 0, 0),
                 chapters = c(20, 86, 0),
                 price = c(144, 0, 321))
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2     0       86     0
#3     0        0   321

2. Replace 0 with NA in an R Dataframe

As shown above, R provides several ways to replace 0 with NA in a dataframe. The first approach uses base R directly: df == 0 returns a logical matrix that is TRUE wherever a value is 0, and df[df == 0] <- NA assigns NA to exactly those cells. The example below replaces all 0 values in all columns with NA.

Related: Refer to this article if you want to replace NA with 0 before processing your data.


#Example 1 - Replace on all columns
df[df == 0] <- NA
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86    NA
#3    NA       NA   321

This is the most generic approach; the same logical indexing also works on a vector to replace its values.
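
As a minimal sketch of the same idea on a plain vector (the vector name and its values are only illustrative):

```r
# A numeric vector with some zeros (illustrative data)
pages <- c(32, 0, 0, 15)

# Logical indexing: positions where the value equals 0 are assigned NA
pages[pages == 0] <- NA

print(pages)
# [1] 32 NA NA 15
```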

3. Replace using Negation

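Alternatively, you can achieve the same result using negation with the is.na() function. !df is TRUE wherever a value is 0, and assigning it to is.na(df) sets those cells to NA. Note that this works only when all columns are numeric or logical.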


#Example 2 - Replace using negation
is.na(df) <- !df
print(df)

This yields the same output as above.
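
The negation form relies on R coercing numbers to logical values, which can be hard to read. A more explicit variant, sketched here as an alternative rather than as part of the original example list, passes the comparison df == 0 to is.na() instead:

```r
# Sample dataframe with numeric columns only
df <- data.frame(pages = c(32, 0, 0),
                 chapters = c(20, 86, 0),
                 price = c(144, 0, 321))

# df == 0 is a logical matrix marking the zero cells;
# assigning it to is.na(df) sets exactly those cells to NA
is.na(df) <- df == 0
print(df)
```

On this sample data the result should match the output of Example 1.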

4. Replace Selected Columns

When your R dataframe has multiple columns and you need to replace 0 with NA in only one of them, select that column by name as shown below. This updates only the pages column.


#Example 3 - Replace on selected columns
df["pages"][df["pages"] == 0] <- NA
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86     0
#3    NA        0   321
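
The same pattern extends to several columns at once. The following is a small sketch, assuming you want to update the pages and price columns; the cols variable is a name introduced here for illustration:

```r
df <- data.frame(pages = c(32, 0, 0),
                 chapters = c(20, 86, 0),
                 price = c(144, 0, 321))

# Columns to update (chosen here for illustration)
cols <- c("pages", "price")

# Compare only the selected columns and assign NA where they are 0
df[cols][df[cols] == 0] <- NA
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86    NA
#3    NA        0   321
```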

5. Using R replace() function to update 0 with NA

R has a built-in function called replace() that replaces values in a vector with another value, for example, zeros with NAs.


#Example 4 - Using replace() function
df <- replace(df, df==0, NA)
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86    NA
#3    NA       NA   321
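
Since replace() is defined for vectors, it can also be applied to one column at a time. Here is a short sketch that updates only the price column (the choice of column is just an example):

```r
df <- data.frame(pages = c(32, 0, 0),
                 chapters = c(20, 86, 0),
                 price = c(144, 0, 321))

# replace(x, list, values): x is the vector, list marks the positions
# to change, and values is the replacement
df$price <- replace(df$price, df$price == 0, NA)
print(df)

#Output
#  pages chapters price
#1    32        20   144
#2     0        86    NA
#3     0         0   321
```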

6. Update 0 with NA using R dplyr::na_if()

All previous examples use base R functions, which work well on smaller datasets. For bigger datasets, consider methods from the dplyr package, which can be faster (the figure of about 30% is a rough estimate and depends on the data). Parts of dplyr are implemented in C++.

dplyr is a third-party package; install it first with install.packages('dplyr') and load it with library("dplyr"). na_if() is a function from the dplyr package that replaces values equal to a given value with NA.


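#Example 5 - Replace using dplyr::na_if()
# (passing a whole dataframe may fail in dplyr 1.1.0 and later;
#  in that case apply na_if() column by column as in the next sections)
library("dplyr")
df <- na_if(df, 0)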
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86    NA
#3    NA       NA   321
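
na_if() is primarily a vector function, so a version-independent way to use it is one column at a time. The sketch below assumes dplyr 1.0.0 or later, where across() and everything() are available; it is an alternative formulation, not the original example:

```r
library(dplyr)

df <- data.frame(pages = c(32, 0, 0),
                 chapters = c(20, 86, 0),
                 price = c(144, 0, 321))

# On a single vector: values equal to 0 become NA
na_if(df$pages, 0)
# [1] 32 NA NA

# Column by column: across(everything(), ...) applies na_if() to each column
df <- df %>% mutate(across(everything(), ~ na_if(.x, 0)))
print(df)
```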

7. Update 0 with NA using dplyr::mutate_all()

mutate_all() is another dplyr function; it applies a transformation to every column. Here it applies na_if() to replace zero with NA on all dataframe columns.


#Example 6 - Replace using dplyr::mutate_all() 
library(dplyr) 
df <- df %>% mutate_all(~na_if(., 0))
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86    NA
#3    NA       NA   321

8. Replace on All Numeric columns

mutate_if() applies a function to the columns selected by a predicate function. Here, is.numeric is the predicate, so values are replaced only on numeric columns. Since all columns in our dataframe are numeric, every 0 in every column becomes NA.


#Example 7 - Replace only on all Numeric columns
library(dplyr) 
df <- df %>% mutate_if(is.numeric, ~na_if(., 0))
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86    NA
#3    NA       NA   321
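
Because the sample dataframe has only numeric columns, the effect of the predicate is easier to see on mixed data. The sketch below is my own illustration: it assumes a hypothetical books dataframe with a character title column, and uses across() with where(), which the dplyr documentation (1.0.0 and later) describes as the successor to mutate_if().

```r
library(dplyr)

# Mixed-type dataframe (hypothetical data; the title column is illustrative)
books <- data.frame(title = c("A", "B", "C"),
                    pages = c(32, 0, 0),
                    price = c(144, 0, 321))

# where(is.numeric) limits the replacement to numeric columns,
# so the title column is left untouched
books <- books %>% mutate(across(where(is.numeric), ~ na_if(.x, 0)))
print(books)
```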

Yields the same output as above.

9. Replace Zero with NA Only on Selected Columns

mutate_at() affects variables selected with a character vector or vars(). Here we update values only in the pages column.


#Example 8 - Replace only on selected columns
library(dplyr) 
df <- df %>% mutate_at(c('pages'), ~na_if(., 0))
print(df)

#Output
#  pages chapters price
#1    32       20   144
#2    NA       86     0
#3    NA        0   321

10. Replace Zero with NA on Selected Column Indexes

If you pass a vector of index positions to mutate_at(), it replaces 0 with NA only in the columns at those positions. The example below updates index 2, which is the chapters column. Note that in R, indexes start from 1.


#Example 9 - Replace only on selected column index
library(dplyr) 
df <- df %>% mutate_at(c(2), ~na_if(., 0))
print(df) 

#Output
#  pages chapters price
#1    32       20   144
#2     0       86     0
#3     0       NA   321
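
For comparison, a base R sketch of the same index-based replacement (the idx variable is a name introduced here for illustration):

```r
df <- data.frame(pages = c(32, 0, 0),
                 chapters = c(20, 86, 0),
                 price = c(144, 0, 321))

# Column position to update (2 is the chapters column)
idx <- 2

# df[[idx]] extracts the column as a vector; assign NA where it equals 0
df[[idx]][df[[idx]] == 0] <- NA
print(df)
```

This should produce the same result as the mutate_at() example above.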

11. Replacing on tibble

If you have a tibble, the same dplyr approach works. Use tibble() to create a tibble; the example below replaces 0 with NA only in its numeric columns. Note that the "NA" in col1 is a character string, not a missing value.


#Example 10 - Replacing on tibble
library(dplyr)
df <- tibble(
  col1 = c("A", "B", "NA"),
  col2 = c(0, 2, NA),
  col3 = c(1, NA, 5)
)
df <- df %>% mutate_if(is.numeric, ~na_if(., 0))
print(df)

#Output
# A tibble: 3 × 3
#  col1   col2  col3
#  <chr> <dbl> <dbl>
#1 A        NA     1
#2 B         2    NA
#3 NA       NA     5

12. Conclusion

In this article, I have covered 10 ways to replace zero with NA in an R dataframe. I have also covered how to replace on a single column, multiple columns, and columns by index position using base R functions and dplyr package methods.

Naveen (NNK)
