R dplyr mutate() – Replace Column Values

Use mutate() and its scoped variants mutate_all(), mutate_if(), and mutate_at() from the R dplyr package to replace or update the values of a column (string, integer, or any other type) in a DataFrame (data.frame). For more methods of this package, refer to the R dplyr tutorial.

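dplyr is a third-party package, so you need to load it with library("dplyr") before using its methods. If you don’t have the package, install it with install.packages("dplyr"). The string examples below also call str_replace(), which comes from the stringr package, and one example uses replace_na() from the tidyr package, so load those packages as well.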

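For bigger data sets, the methods from the dplyr package are a good choice because they can replace column values faster than equivalent base R code (up to about 30% faster in some cases, although the gain depends on the data and the operation). Parts of dplyr are implemented in C++.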

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import the CSV file into an R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(id=c(1,2,3,NA),
         address=c('Orange St','Anton Blvd','Jefferson Pkwy',''),
         work_address=c('Main St',NA,'Apple Blvd','Portola Pkwy'))

df

# Output
#  id        address work_address
#1  1      Orange St      Main St
#2  2     Anton Blvd         <NA>
#3  3 Jefferson Pkwy   Apple Blvd
#4 NA                Portola Pkwy
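Before replacing anything, it can help to confirm each column's type and where the missing values are, because the later examples select columns by type and treat NA differently from an empty string. The following is a minimal sketch using only base R; the names col_types and na_counts are illustrative and not part of the original example.

```r
# Recreate the example DataFrame (stringsAsFactors = FALSE keeps text as
# character on R versions older than 4.0)
df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Orange St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

# Class of each column: id is numeric, the two address columns are character
col_types <- sapply(df, class)
col_types

# Number of NA values per column; the empty string in address is not NA
na_counts <- colSums(is.na(df))
na_counts
```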

1. Replace using dplyr mutate() – Update on Selected Column

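Use the mutate() method from the dplyr package to replace a column value in an R DataFrame. The following example replaces St with Street in the address column. Note that str_replace() comes from the stringr package and replaces only the first match in each value; use str_replace_all() to replace every match.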


library("dplyr")
library("stringr")  # str_replace() comes from stringr
# Replace on selected column
df <- df %>% 
  mutate(address = str_replace(address, "St", "Street"))
df

Here, %>% is an infix operator that acts as a pipe: it passes the left-hand side of the operator as the first argument of the function call on the right-hand side.
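To make the pipe behaviour concrete, the sketch below compares a piped call with the equivalent direct call. The vector x and the data frame df_demo are made up for illustration.

```r
library(dplyr)

x <- c(4, 9, 16)

# The pipe supplies x as the first argument of sqrt()
piped  <- x %>% sqrt()
direct <- sqrt(x)
identical(piped, direct)

# The same holds when the function takes further arguments
df_demo <- data.frame(n = 1:3)
a <- df_demo %>% mutate(double = n * 2)
b <- mutate(df_demo, double = n * 2)
identical(a, b)
```

Both comparisons should return TRUE, which is why a pipeline can always be rewritten as nested function calls.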

2. Replace using dplyr mutate_all() – Update All Columns

Use mutate_all() from the dplyr package to change values on all columns. The following example replaces St with Street on all columns. Since St appears in the address and work_address columns, these two get updated. Be aware that str_replace() returns character values, so the numeric id column is converted to character as a side effect.


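library("dplyr")
library("stringr")
# Replace on all columns
# funs() is deprecated since dplyr 0.8.0, so a lambda (~) is used instead;
# \\b matches a word boundary, so an existing "Street" is not changed again
df <- df %>% 
  mutate_all(~ str_replace(., "\\bSt\\b", "Street"))
df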
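If converting id to character is not wanted, one option is to restrict the replacement to character columns. The sketch below uses across() with where(), which assumes dplyr 1.0.0 or later; it is an alternative to the mutate_all() call, not the approach the article itself uses.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Orange St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

# Apply the replacement to character columns only, so id stays numeric
res <- df %>%
  mutate(across(where(is.character), ~ str_replace(.x, "St", "Street")))
res
```

NA values pass through str_replace() unchanged, so the missing work_address stays NA.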

3. Replace using dplyr mutate_if() – Update On All Numeric Columns

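Use mutate_if() to update column values conditionally. The following example replaces NA with 0 on all numeric columns; is.numeric selects only the numeric columns, and replace_na() comes from the tidyr package. This assumes id is still numeric, so recreate df first if you ran the mutate_all() example, which converts id to character.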


library("dplyr")
library("tidyr")  # replace_na() comes from tidyr
# Replace only on numeric columns
df <- df %>% 
  mutate_if(is.numeric, ~ replace_na(., 0))
df
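The same conditional replacement can be sketched without tidyr by using dplyr's own coalesce(), which returns the first non-missing value at each position. This variant assumes dplyr 1.0.0 or later for across() and is shown only as an alternative.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Orange St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

# Replace NA with 0 on numeric columns only; character columns are untouched
res <- df %>%
  mutate(across(where(is.numeric), ~ coalesce(.x, 0)))
res
```

Because only numeric columns are selected, the NA in work_address is left as it is.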

4. Replace using dplyr mutate_at() – Update on Multiple Columns

Use the mutate_at() method to update multiple selected columns by name. The following example updates the address and work_address columns.


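library("dplyr")
library("stringr")
# Replace on selected columns
# funs() is deprecated since dplyr 0.8.0, so a lambda (~) is used instead;
# \\b matches a word boundary, so an existing "Street" is not changed again
df <- df %>% 
  mutate_at(c('address','work_address'), ~ str_replace(., "\\bSt\\b", "Street"))
df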
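When the column names are held in a character vector, the selection can also be written with across() and all_of(). This is a sketch assuming dplyr 1.0.0 or later; the variable name cols is illustrative.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Orange St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

# Column names to update, kept in a plain character vector
cols <- c("address", "work_address")

# all_of() raises an error if any listed column is missing from df
res <- df %>%
  mutate(across(all_of(cols), ~ str_replace(.x, "St", "Street")))
res
```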

5. Replace using dplyr mutate_at() – Update on Selected Column Index Position

Similarly, you can use the mutate_at() method to select multiple columns by position index and replace the specified values. The following example updates columns 2 and 3, which are the address and work_address columns.


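library("dplyr")
library("stringr")
# Replace on select index
# funs() is deprecated since dplyr 0.8.0, so a lambda (~) is used instead;
# \\b matches a word boundary, so an existing "Street" is not changed again
df <- df %>% 
  mutate_at(c(2,3), ~ str_replace(., "\\bSt\\b", "Street"))
df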
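Positions silently point at different columns if the column order changes, so it may be worth checking what they resolve to before mutating. The sketch below does that check and then applies the replacement by position; the variable pos and the use of across() (dplyr 1.0.0 or later) are assumptions for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = c(1, 2, 3, NA),
                 address = c("Orange St", "Anton Blvd", "Jefferson Pkwy", ""),
                 work_address = c("Main St", NA, "Apple Blvd", "Portola Pkwy"),
                 stringsAsFactors = FALSE)

pos <- c(2, 3)

# Inspect which columns the positions refer to
target <- names(df)[pos]
target

# Stop early if a selected column is not text
stopifnot(all(sapply(df[pos], is.character)))

# all_of() accepts numeric locations as well as names
res <- df %>%
  mutate(across(all_of(pos), ~ str_replace(.x, "St", "Street")))
res
```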

6. Complete Example

The following is a complete example of using mutate(), mutate_all(), mutate_if(), and mutate_at() from dplyr to replace or change column values in an R DataFrame.


# Load the required packages
library("dplyr")
library("stringr")  # str_replace()
library("tidyr")    # replace_na()

# Create DataFrame
df <- data.frame(id=c(1,2,3,NA),
      address=c('Orange St','Anton Blvd','Jefferson Pkwy',''),
      work_address=c('Main St',NA,'Apple Blvd','Portola Pkwy'))

df

# Replace on selected columns
# \\b matches a word boundary, so "Street" is not matched again in later steps
df <- df %>% 
  mutate_at(c('address','work_address'), ~ str_replace(., "\\bSt\\b", "Street"))
df

# Replace on select index
df <- df %>% 
  mutate_at(c(2,3), ~ str_replace(., "\\bSt\\b", "Street"))
df

# Replace only on numeric columns
# (run before mutate_all(), which converts id to character)
df <- df %>% 
  mutate_if(is.numeric, ~ replace_na(., 0))
df

# Replace on all columns
df <- df %>% 
  mutate_all(~ str_replace(., "\\bSt\\b", "Street"))
df

7. Conclusion

In this article, you have learned how to use methods from the dplyr package to replace or update values in an R DataFrame. dplyr is a third-party package, so you need to load it with library("dplyr") before using its methods. If you don’t have the package, install it with install.packages("dplyr").

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.

This Post Has One Comment

  1. Chuck

    Hello, great post but looks like it needs a bit of an update. I received the following error: “Warning: `funs()` was deprecated in dplyr 0.8.0.
    Please use a list of either functions or lambdas”
