How to Replace Values in R with Examples

There are several ways to replace or update column values in an R data frame. In this article, I will explain how to update data frame column values in a single column, in multiple columns, and in all columns by using base R functions/notation and the dplyr package.

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import it into an R data frame. Also, refer to Import Excel File into R.


# Create dataframe
df <- data.frame(id=c(1,2,3,NA),
      address=c('Orange St','Anton Blvd','Jefferson Pkwy',''),
      work_address=c('Main St',NA,'Apple Blvd','Portola Pkwy'))

df

Output:


# Output
  id        address work_address
1  1      Orange St      Main St
2  2     Anton Blvd         <NA>
3  3 Jefferson Pkwy   Apple Blvd
4 NA                Portola Pkwy  

Notice that the column names are id, address, and work_address.

1. Update Data Frame Column Value

To replace a column value in R, use square bracket notation, df[]. With it you can update values in a single column or in all columns. To refer to a single column, use df$column_name. The following example replaces Orange St with Portola Pkwy in the address column.


# Replace String with Another String on a single column
df$address[df$address == 'Orange St'] <- 'Portola Pkwy'
df

# Output
#  id        address work_address
#1  1   Portola Pkwy      Main St
#2  2     Anton Blvd         <NA>
#3  3 Jefferson Pkwy   Apple Blvd
#4 NA                Portola Pkwy

To update matching values in all columns, compare the whole data frame instead of a single column.


# Replace String with Another String on All Columns
df[df=="Portola Pkwy"] <- "Orange St"
df

# Output
#  id        address work_address
#1  1      Orange St      Main St
#2  2     Anton Blvd         <NA>
#3  3 Jefferson Pkwy   Apple Blvd
#4 NA                   Orange St
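
The introduction also mentions updating multiple columns. The following is a minimal base R sketch of that case, assuming you only want to touch a chosen subset of columns. The cols vector is an illustrative name, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Sample data mirroring the article's data frame after the first update
df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Portola Pkwy", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

# Columns to update (an illustrative choice)
cols <- c("address", "work_address")

# %in% returns FALSE for NA, so missing values are left untouched
df[cols] <- lapply(df[cols], function(x) replace(x, x %in% "Portola Pkwy", "Orange St"))
df
```

Because only the listed columns are passed through lapply(), the id column keeps its numeric type.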

Use the str_replace() function from the stringr package to replace part of a column string with another string. It replaces only the first match in each value. The following example replaces the substring St with Street in the address column.


# Replace String with another String
library(stringr)
df$address <- str_replace(df$address, "St", "Street")
print(df)

# Output
#  id        address work_address
#1  1  Orange Street      Main St
#2  2     Anton Blvd         <NA>
#3  3 Jefferson Pkwy   Apple Blvd
#4 NA                   Orange St
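
If you prefer not to depend on stringr, base R offers comparable functions. The sketch below uses sub() and gsub() on a made-up address vector (the values are illustrative, not from the data frame above) to show the difference between replacing the first match and every match, and why a short pattern such as St can hit more text than intended.

```r
# Illustrative values; "St Stephen St" contains the pattern three times
address <- c("Orange St", "St Stephen St", "Anton Blvd")

# sub() replaces only the first match in each string, comparable to str_replace()
first_only <- sub("St", "Street", address, fixed = TRUE)

# gsub() replaces every match, comparable to str_replace_all()
all_matches <- gsub("St", "Street", address, fixed = TRUE)

# Anchoring the pattern with $ limits the change to a trailing "St"
trailing <- sub("St$", "Street", address)

first_only
all_matches
trailing
```

The anchored form is usually the safer choice for street suffixes, since it leaves a name such as Stephen alone.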

2. Replace with Another Column Value

Replacing a column's values with the result of a calculation on an existing column is a common task in R. The example below multiplies the id column by 5 and assigns the result back to the same id column. Similarly, you can also assign the result to another column.


# Multiply id by 5 and store the result back in the same column
df['id'] <- df['id'] * 5
df

# Output
#  id        address work_address
#1  5  Orange Street      Main St
#2 10     Anton Blvd         <NA>
#3 15 Jefferson Pkwy   Apple Blvd
#4 NA                   Orange St
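
Assigning the result to another column instead might look like the sketch below. The column name id_x5 is hypothetical and chosen only for illustration.

```r
# Minimal data frame with the same id values as the article's example
df <- data.frame(id = c(1, 2, 3, NA))

# Store the calculation in a new column; the original id column is unchanged
df$id_x5 <- df$id * 5
df
```

Note that NA stays NA, since any arithmetic on a missing value yields a missing value.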

3. Update Based on Condition

To replace column values based on a condition, test the condition and assign the value from another column only to the rows where it holds. The example below updates the address column with the value of work_address only where the address value is 'Orange Street'.


# Replace column value with another based on condition
# Subset both sides with the same index so each row gets its own work_address
idx <- df$address == 'Orange Street'
df$address[idx] <- df$work_address[idx]
df

# Output
#  id        address work_address
#1  5        Main St      Main St
#2 10     Anton Blvd         <NA>
#3 15 Jefferson Pkwy   Apple Blvd
#4 NA                   Orange St
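
When more than one row matches, or when the column being tested contains NA, it helps to compute the matching row positions once and use them on both sides of the assignment. The following is a small sketch under those assumptions; the sample values are made up for illustration.

```r
# Illustrative data: two matching rows and one missing address
df <- data.frame(address = c("Orange Street", "Anton Blvd", "Orange Street", NA),
                 work_address = c("Main St", "Bay Rd", "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

# which() drops NA comparisons and keeps only the positions that truly match
idx <- which(df$address == "Orange Street")

# Index both sides with idx so each row receives the value from its own row
df$address[idx] <- df$work_address[idx]
df
```

Assigning the whole work_address column to a subset of address would instead recycle values from the top of the column and raise a length warning.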

4. Using dplyr Package

Use the mutate() function from the dplyr package to change column values. dplyr is a third-party package, so you need to load it with library("dplyr") before using its functions. If you don’t have the package, install it using install.packages("dplyr").

For bigger data sets, the dplyr methods are often a good choice: parts of the package are implemented in C++, and they can run noticeably faster than equivalent base R code, although the actual gain depends on the data and the operation.


# Using dplyr package
library(dplyr)    
df <- df %>% 
   mutate(address = ifelse(address == '',work_address,address))
df

Yields below output.


# Output
  id        address work_address
1  5        Main St      Main St
2 10     Anton Blvd         <NA>
3 15 Jefferson Pkwy   Apple Blvd
4 NA      Orange St    Orange St
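
For comparison, roughly the same update can be written without dplyr. The sketch below uses base R's transform() with ifelse(); the data frame is rebuilt to match the state before the mutate() call so the snippet is self-contained.

```r
# Data frame as it looks before the empty address is filled in
df <- data.frame(id = c(5, 10, 15, NA),
                 address = c("Main St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Orange St"),
                 stringsAsFactors = FALSE)

# Where address is empty, take the value from work_address; otherwise keep it
df <- transform(df, address = ifelse(address == "", work_address, address))
df
```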

5. Update Missing Values with Empty/Blank String

NA values are treated as missing values. To replace them with empty strings, use the example below. Here, the is.na() function checks whether each value is NA and, if so, the value is updated with an empty string. Note that this also turns the numeric id column into a character column, because a column can hold only one type. Similarly, you can also replace empty strings with NA values.


# Replace NA values with blank using is.na()
df[is.na(df)] <- ""
df

# Output
#  id        address work_address
#1  5        Main St      Main St
#2 10     Anton Blvd             
#3 15 Jefferson Pkwy   Apple Blvd
#4         Orange St    Orange St
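
The reverse direction, turning empty strings into NA, can be sketched the same way. The snippet below also checks column types to show the side effect of filling every NA with an empty string. The name blank is an illustrative copy of the data frame, not something used elsewhere in this article.

```r
# Sample data with one empty string and NA values in two columns
df <- data.frame(id = c(5, 10, 15, NA),
                 address = c("Main St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Orange St"),
                 stringsAsFactors = FALSE)

# Turn empty strings into NA; the numeric id column is not affected
df[df == ""] <- NA
sapply(df, class)

# Filling every NA with "" coerces id from numeric to character
blank <- df
blank[is.na(blank)] <- ""
sapply(blank, class)
```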

6. Update Missing Values with 0

Replacing all missing values with an empty string is not always a good approach, because an empty string is not a sensible value for a numeric column. To replace NA with 0 on numeric columns only, use the approach below. It works only while the column is still numeric, so apply it before replacing NA with empty strings.


# Replace NA with 0 on numeric columns only
library("tidyr")
library("dplyr")

# across() with where() supersedes mutate_if() in dplyr 1.0.0 and later
df <- df %>% 
    mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
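
If tidyr and dplyr are not available, a base R sketch of the same idea could look like this. The helper vector num is an illustrative name, and the sample data is made up so the snippet runs on its own.

```r
# Sample data with NA in both a numeric and a character column
df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Orange St", NA, "Jefferson Pkwy", ""),
                 stringsAsFactors = FALSE)

# Flag the numeric columns
num <- vapply(df, is.numeric, logical(1))

# Replace NA with 0 in those columns only; character columns keep their NA
df[num] <- lapply(df[num], function(x) replace(x, is.na(x), 0))
df
```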

7. Complete Example of Update Column Values


# Create dataframe
df <- data.frame(id=c(1,2,3,NA),
          address=c('Orange St','Anton Blvd','Jefferson Pkwy',''),
          work_address=c('Main St',NA,'Apple Blvd','Portola Pkwy'))

df

# Replace String with Another String on a single column
df$address[df$address == 'Orange St'] <- 'Portola Pkwy'
df

# Replace String with Another String on All Columns
df[df=="Portola Pkwy"] <- "Orange St"
df

# Replace String with another String
library(stringr)
df$address <- str_replace(df$address, "St", "Street")
print(df)

# Multiply id by 5 and store the result back in the same column
df['id'] <- df['id'] * 5
df

# Replace column value with another based on condition
# Subset both sides with the same index so each row gets its own work_address
idx <- df$address == 'Orange Street'
df$address[idx] <- df$work_address[idx]
df

# Using dplyr package
library(dplyr)    
df <- df %>% 
   mutate(address = ifelse(address == '',work_address,address))
df

# Replace NA values with blank using is.na()
df[is.na(df)] <- ""
df

8. Conclusion

In this article, you have learned how to change or update data frame column values in a single column, in multiple columns, and in all columns by using base R functions/notation and the dplyr package.

Naveen (NNK)

I am Naveen (NNK), working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and manage Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passionate about sharing my knowledge of Apache Spark, Hive, PySpark, R, etc.
