R – Replace NA with Empty String in a DataFrame

How do you replace NA (missing values) with an empty string in an R dataframe? You can replace NA values with an empty string on all columns of an R dataframe (data.frame) by using is.na() or replace(). Use dplyr::mutate_if() to replace only on character columns when you have mixed numeric and character columns, and use dplyr::mutate_at() to replace on multiple selected columns by index or name.

  • R base is.na() function
  • R base replace() function
  • dplyr::mutate_if() and tidyr::replace_na()
  • dplyr::mutate_at() and tidyr::replace_na()

In R, NA represents a missing value, and most operations on NA (arithmetic, comparison, aggregation) return NA themselves, so it is good practice to handle missing values before processing the data. Using the same functions, you can also replace NA with zero (0) in R.
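As a quick illustration of why missing values get in the way, here is a minimal base R sketch. The vector scores is a made-up example and is not part of the dataframe used in the rest of this article.

```r
# Hypothetical example vector with one missing value
scores <- c(10, NA, 30)

# Aggregation propagates NA unless it is explicitly dropped
total <- sum(scores)
total_clean <- sum(scores, na.rm = TRUE)

# Comparing against NA gives NA rather than TRUE or FALSE,
# so is.na() is the reliable way to detect missing values
cmp <- scores == 10
missing <- is.na(scores)

print(total)
print(total_clean)
print(cmp)
print(missing)
```

The same propagation applies to character data, which is why the examples below locate missing values with is.na() rather than with ==.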

1. Quick Examples of Replace NA Values with Empty String

Below are quick examples of how to replace NA values in dataframe columns with an empty string in R.


# Quick Examples

#Example 1 - Replace na values with blank using is.na()
my_dataframe[is.na(my_dataframe)] <- ""

#Example 2 - By using replace() & is.na()
my_dataframe <- replace(my_dataframe, is.na(my_dataframe), "")

# All below examples need to load these libraries
library("dplyr")
library("tidyr")

#Example 3 - Replace only string columns
my_dataframe <- my_dataframe %>% 
      mutate_if(is.character, ~replace_na(.,""))

# Example 4 - Replace on selected columns by Name
my_dataframe <- my_dataframe %>%
  mutate_at(c('name','gender'), ~replace_na(.,""))

# Example 5 - Replace on selected columns by Index
my_dataframe <- my_dataframe %>% 
  mutate_at(c(1,2), ~replace_na(.,""))

Let’s create a dataframe with some NA values, run these examples and validate the result.


#Create dataframe
my_dataframe=data.frame(
  name=c('sravan',NA,'chrisa','shivgami',NA),
  gender=c(NA,'m',NA,'f',NA))

#Display dataframe
print(my_dataframe)

Output:


#Output
      name gender
1   sravan   <NA>
2     <NA>      m
3   chrisa   <NA>
4 shivgami      f
5     <NA>   <NA>

2. Replace NA values with Empty String using is.na()

is.na() checks each value of the dataframe and returns TRUE where the value is NA and FALSE otherwise. When you use this result as an index inside [], it selects only the NA positions, and the assignment replaces them with the given value. In this way, we can replace NA (missing values) with an empty string in an R DataFrame.

Syntax:


#Syntax
df[is.na(df)] = "value to replace"

where df is the input dataframe. Let’s run an example to replace NA values with an empty string in our dataframe.


#Replace na values with blank using is.na()
my_dataframe[is.na(my_dataframe)] <- ""

#Display the dataframe
print(my_dataframe)

Output:


#Output
      name gender
1   sravan       
2               m
3   chrisa       
4 shivgami      f
5                

In the above output, we can see that NA values are replaced with empty strings, which print as blanks.
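To make the indexing step more concrete, the following sketch prints the logical matrix that is.na() returns before using it for the assignment. It rebuilds the same two-column dataframe under the hypothetical name df. The stringsAsFactors = FALSE argument is my addition, so that the columns stay character on R versions older than 4.0.

```r
# Same data as the article's dataframe, under a hypothetical name
df <- data.frame(
  name   = c("sravan", NA, "chrisa", "shivgami", NA),
  gender = c(NA, "m", NA, "f", NA),
  stringsAsFactors = FALSE  # keeps columns as character on R < 4.0
)

# is.na() on a dataframe returns a logical matrix of the same shape
na_mask <- is.na(df)
print(na_mask)

# Count the missing values per column before replacing them
na_counts <- colSums(na_mask)
print(na_counts)

# Assigning through the mask changes only the TRUE positions
df[na_mask] <- ""
print(any(is.na(df)))
```

Checking colSums(is.na(df)) first is a cheap way to confirm which columns will actually be affected.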

3. Replace NA values with Blank Space using replace()

Let’s see another way to replace NA values with an empty string, using replace(). It takes three parameters.

Syntax:


#Syntax
replace(df,is.na(df),"value to replace")

Parameters:

  1. The first parameter is the input dataframe.
  2. The second parameter is the index of the values to replace; here it is the result of is.na(), which marks the NA positions.
  3. The last parameter is the replacement value, here "" (an empty string), which is written to the positions selected by the second parameter.
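The three parameters are easier to see in isolation on a plain vector. This is a minimal sketch; x and y are hypothetical names used only for this illustration.

```r
# Hypothetical character vector with missing values
x <- c("a", NA, "c", NA)

# replace(x, list, values):
#   x      - the input object
#   list   - the index of elements to replace (here a logical mask)
#   values - the replacement value
y <- replace(x, is.na(x), "")
print(y)

# replace() returns a modified copy; x itself is unchanged
print(x)
```

Because replace() does not modify its input, the result has to be assigned back, as the dataframe example does.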

Example: Replace NA with blank space in the dataframe using replace()


#By using replace() & is.na()
my_dataframe <- replace(my_dataframe, is.na(my_dataframe), "")

#Display dataframe
print(my_dataframe)

Yields the same output as above.

Alternatively, you can also write the above statement using %>% operator. In order to use this, load library dplyr.


#Example 2 - Using %>% 
library("dplyr")
my_dataframe <- my_dataframe %>% 
     replace(is.na(my_dataframe), "")
print(my_dataframe)

4. Replace NA with Empty String only on Character Columns

All examples above use a dataframe with only character columns, so replacing NA with an empty string is straightforward. In real data you usually get a mix of numeric and character columns, and running the above examples on such data replaces NA in the numeric columns too, which converts those columns to character. Hence, we need to apply the change only to character columns and leave numeric columns alone.
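To check what the base R approach does to a numeric column, here is a minimal sketch with a small made-up dataframe (mixed is a hypothetical name). As far as I can tell, the assignment runs without stopping, but the numeric column comes out as character, which is usually not what you want.

```r
# Hypothetical dataframe with one numeric and one character column
mixed <- data.frame(
  id   = c(2, 1, NA),
  name = c("sravan", NA, "chrisa"),
  stringsAsFactors = FALSE  # keeps name as character on R < 4.0
)

class_before <- class(mixed$id)

# Replace every NA, including the one in the numeric column
mixed[is.na(mixed)] <- ""

class_after <- class(mixed$id)

print(class_before)
print(class_after)
print(mixed$id)
```

Once id is a character column, numeric operations such as sum(mixed$id) no longer work on it.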

You can do this with dplyr::mutate_if(): is.character checks whether a column is of character type, and tidyr::replace_na() is then applied only to those columns.


#Create dataframe
my_dataframe=data.frame(
  id=c(2,1,3,4,NA),
  name=c('sravan',NA,'chrisa','shivgami',NA),
  gender=c(NA,'m',NA,'f',NA))

#Load library
library("dplyr")
library("tidyr")

#Replace only character columns
my_dataframe <- my_dataframe %>%
       mutate_if(is.character, ~replace_na(., ""))
print(my_dataframe)

Yields the below output. Notice that the id column still has an NA value; it was ignored because it holds numeric values.


#Output
  id     name gender
1  2   sravan       
2  1               m
3  3   chrisa       
4  4 shivgami      f
5 NA                
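If you prefer not to load dplyr and tidyr, the same result can be sketched in base R. This is one possible approach and not the method used in the example above. df and is_chr are hypothetical names.

```r
# Same data as the mixed dataframe above, under a hypothetical name
df <- data.frame(
  id     = c(2, 1, 3, 4, NA),
  name   = c("sravan", NA, "chrisa", "shivgami", NA),
  gender = c(NA, "m", NA, "f", NA),
  stringsAsFactors = FALSE  # keeps columns as character on R < 4.0
)

# Logical vector marking which columns are character
is_chr <- vapply(df, is.character, logical(1))

# Replace NA only inside the character columns
df[is_chr] <- lapply(df[is_chr], function(col) replace(col, is.na(col), ""))

print(df)
```

The id column keeps its numeric type and its NA, just as in the mutate_if() version.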

5. Replace NA with Empty String on Selected Multiple Columns

To replace NA with an empty string on multiple selected columns by name, use the mutate_at() function with a vector c() of column names.


# Replace on selected multiple columns
library("dplyr")
library("tidyr")
my_dataframe <- my_dataframe %>% 
    mutate_at(c('name','gender'), ~replace_na(.,""))
print(my_dataframe)

Yields the same output as above.

6. Replace NA with Empty String on Selected Multiple Index

Finally, if you want to replace NA with an empty string on multiple selected R dataframe columns by index, use the mutate_at() function with a vector c() of index values.


# Replace on selected multiple index
library("dplyr")
library("tidyr")
my_dataframe <- my_dataframe %>% 
     mutate_at(c(2,3), ~replace_na(.,""))
print(my_dataframe)

Yields the same output as above. You can find more details on the mutate() function and its variants in the R Documentation.
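Note that in dplyr 1.0.0 and later, mutate_if() and mutate_at() are marked as superseded in favour of across(). They still work, but here is a sketch of how the three dplyr examples might look in the newer style. It assumes dplyr 1.0.0 or newer and tidyr are installed. The names df, by_type, by_name and by_index are hypothetical.

```r
library(dplyr)
library(tidyr)

# Same data as the mixed dataframe above, under a hypothetical name
df <- data.frame(
  id     = c(2, 1, 3, 4, NA),
  name   = c("sravan", NA, "chrisa", "shivgami", NA),
  gender = c(NA, "m", NA, "f", NA),
  stringsAsFactors = FALSE
)

# Character columns only (counterpart of mutate_if)
by_type <- df %>%
  mutate(across(where(is.character), ~ replace_na(.x, "")))

# Selected columns by name (counterpart of mutate_at with names)
by_name <- df %>%
  mutate(across(c(name, gender), ~ replace_na(.x, "")))

# Selected columns by position (counterpart of mutate_at with indexes)
by_index <- df %>%
  mutate(across(c(2, 3), ~ replace_na(.x, "")))

print(by_type)
```

All three calls touch only the name and gender columns, so the numeric id column keeps its NA.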

7. Conclusion

In this article, I have explained several ways to replace NA (missing values) with an empty string in an R dataframe by using is.na() and replace(). You have also seen how to use dplyr::mutate_if() to replace only on character columns when you have mixed numeric and character columns, and dplyr::mutate_at() to replace on multiple selected columns by index or name.


Naveen Nelamali
