How to remove duplicate values from a vector in R? Handling duplicates is a common data-cleaning task in data analysis. It should be done before you run other operations on the data, because duplicate values can lead to inconsistent results.
The following methods can be used to remove duplicates from a vector in R. I have explained how to remove duplicate rows from a DataFrame in another article, which I recommend reading.
- duplicated()
- unique()
- dplyr package
- union()
1. Quick Examples of Removing Duplicates from a Vector
Below are quick examples of removing duplicate values from a vector in R.
# Remove duplicates from vector in R
# Create vector
v <- c('A','B','D','C','A','F','G','C','d','E','E')
# Identify duplicates
duplicated(v)
# Remove duplicate values
!duplicated(v)
v[!duplicated(v)]
# Using unique()
unique(v)
# To remove contiguous duplicated elements only
library(dplyr)
v <- c('A','A','D','C','C','F','F','C','d','E','E')
# lag(v) is NA at position 1, so coalesce() keeps the first element
v[coalesce(v != lag(v), TRUE)]
# Using union
v <- c('A','B','D','C','A','F','G','C','d','E','E')
union(v,v)
2. Using duplicated() to Remove Duplicates from Vector
Base R provides the duplicated() function, which can be used to remove duplicates from a vector. It returns a logical vector indicating which elements are duplicates: the first occurrence of each value is FALSE, and every later occurrence of that value is TRUE.
# Create vector
v <- c('A','B','D','C','A','F','G','C','d','E','E')
# Identify duplicates
duplicated(v)
# Output
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
Now negate this result with ! and use it inside R's bracket notation [] to return the vector without its duplicate values.
# Remove duplicate values
!duplicated(v)
v[!duplicated(v)]
# Output
# > !duplicated(v)
# [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE
# > v[!duplicated(v)]
# [1] "A" "B" "D" "C" "F" "G" "d" "E"
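By default, duplicated() keeps the first occurrence of each value. As a minimal sketch, the example below uses the function's fromLast argument to keep the last occurrence instead; the variable names keep_first and keep_last are only illustrative.

```r
# Same example vector as above
v <- c('A','B','D','C','A','F','G','C','d','E','E')

# Default: the first occurrence of each value is kept
keep_first <- v[!duplicated(v)]

# fromLast = TRUE scans from the end, so the last occurrence is kept
keep_last <- v[!duplicated(v, fromLast = TRUE)]

print(keep_first)
# [1] "A" "B" "D" "C" "F" "G" "d" "E"
print(keep_last)
# [1] "B" "D" "A" "F" "G" "C" "d" "E"
```

Both results contain the same set of values; only the position at which each repeated value survives is different.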
3. Using unique()
Use the unique() function to remove duplicates from an R vector. It returns the distinct values, in order of first appearance, with a single statement.
# Using unique()
unique(v)
# Output
# [1] "A" "B" "D" "C" "F" "G" "d" "E"
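Note that 'D' and 'd' both appear in the output because the comparison is case-sensitive. The sketch below first checks that unique() gives the same result as the duplicated() approach, and then shows one possible way to treat values as duplicates regardless of case by comparing lower-cased copies. The names same_result and v_ci are illustrative only.

```r
v <- c('A','B','D','C','A','F','G','C','d','E','E')

# unique(v) and v[!duplicated(v)] return the same values in the same order
same_result <- identical(unique(v), v[!duplicated(v)])
print(same_result)
# [1] TRUE

# Case-insensitive de-duplication: compare lower-cased copies,
# but subset the original vector so the original spelling is kept
v_ci <- v[!duplicated(tolower(v))]
print(v_ci)
# [1] "A" "B" "D" "C" "F" "G" "E"
```

Here 'd' is dropped because its lower-cased form matches the earlier 'D'.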
4. Using dplyr Package
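To remove only contiguous (consecutive) duplicate elements from a vector, use the lag() function from the dplyr package. lag() shifts the vector by one position so that each element can be compared with the one before it. To use this function, first install the dplyr package and then load it with the library() function.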
# To remove contiguous duplicated elements only
library(dplyr)
v <- c('A','A','D','C','C','F','F','C','d','E','E')
# lag(v) is NA at position 1, so coalesce() keeps the first element
v[coalesce(v != lag(v), TRUE)]
# Output
# [1] "A" "D" "C" "F" "C" "d" "E"
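If you prefer not to load dplyr, consecutive repeats can also be collapsed in base R. The following is a minimal sketch using rle() (run-length encoding), offered as an alternative rather than as the method this article is built around; the names runs, collapsed, and run_lengths are illustrative.

```r
v <- c('A','A','D','C','C','F','F','C','d','E','E')

# rle() splits the vector into runs of equal consecutive values
runs <- rle(v)

# One value per run: consecutive repeats are collapsed,
# non-adjacent repeats (the second 'C') are kept
collapsed <- runs$values
print(collapsed)
# [1] "A" "D" "C" "F" "C" "d" "E"

# Length of each run, in case you need to know how many repeats were dropped
run_lengths <- runs$lengths
print(run_lengths)
# [1] 2 1 2 2 1 1 2
```

The first run ('A') is retained here, and 'C' appears twice because its two runs are not adjacent.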
5. Using union() to Remove Duplicate Values From Vector
Finally, you can also remove duplicate values from a vector in R with union(). union() combines the elements of two vectors and drops any duplicates from the combined result, so taking the union of a vector with itself returns its unique values.
# Using union
v <- c('A','B','D','C','A','F','G','C','d','E','E')
union(v,v)
# Output
# [1] "A" "B" "D" "C" "F" "G" "d" "E"
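To illustrate the more typical use of union(), the sketch below combines two different vectors. The vectors x and y are made-up example data, and the check at the end assumes both inputs are plain atomic vectors, in which case union() behaves like unique() applied to the concatenated vectors.

```r
# Two hypothetical vectors with duplicates within and across them
x <- c('A','B','A','C')
y <- c('C','D','D')

# Duplicates inside x, inside y, and shared between them are all removed
combined <- union(x, y)
print(combined)
# [1] "A" "B" "C" "D"

# For plain vectors this matches unique() on the concatenation
same <- identical(union(x, y), unique(c(x, y)))
print(same)
# [1] TRUE
```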
Conclusion
In this article, I have explained how to remove duplicate values from a vector in R by using duplicated(), unique(), union(), and the lag() function from the dplyr package. You can find the complete example from this article at the GitHub R Programming Examples project.