R Remove Duplicates From Vector

How do you remove duplicate values from a vector in R? Handling duplicates is one of the challenging tasks in data analysis, and it falls under data cleaning. Cleaning should be done before performing any operations on the data, because duplicate values can lead to misleading results, for example inflated counts or skewed averages.

You can use the following methods to remove duplicates from a vector in R. In another article, I explained how to remove duplicate rows from a DataFrame; I recommend reading it as well.

  • duplicated()
  • unique()
  • dplyr package
  • union()

1. Quick Examples of Removing Duplicates from a Vector

Below are quick examples of removing duplicate values from a vector in R.


# Remove duplicates from vector in R

# Create vector
v <- c('A','B','D','C','A','F','G','C','d','E','E')

#Identify Duplicates
duplicated(v)

# Remove duplicate values
!duplicated(v)
v[!duplicated(v)]

# Using unique()
unique(v)

# To remove contiguous duplicated elements only 
library(dplyr)
v <- c('A','A','D','C','C','F','F','C','d','E','E')
# Keep the first element (lag() returns NA there) and every element that differs from its predecessor
v[is.na(lag(v)) | v != lag(v)]

# Using union
v <- c('A','B','D','C','A','F','G','C','d','E','E')
union(v,v)

2. Using duplicated() to Remove Duplicates from Vector

Base R provides the duplicated() function, which can be used to remove duplicates from a vector. It identifies duplicate values and returns a logical vector of the same length: an element is TRUE if the same value already appeared earlier in the vector, and FALSE otherwise, so the first occurrence of each value is always FALSE.


# Create vector
v <- c('A','B','D','C','A','F','G','C','d','E','E')

#Identify Duplicates
duplicated(v)

# Output
# [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
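Note that duplicated() compares values exactly, so 'D' and 'd' count as different values in the output above. The following is a minimal sketch, assuming you want a case-insensitive check: run duplicated() on an upper-cased copy and use the result to index the original vector. The same logical vector also tells you how many elements are repeats and which values were repeated.

```r
# Same sample vector as above
v <- c('A','B','D','C','A','F','G','C','d','E','E')

# duplicated() is case-sensitive: 'D' and 'd' are different values
n_repeats <- sum(duplicated(v))    # number of repeated elements: 3

# Which values occur more than once? (each reported once)
dup_values <- unique(v[duplicated(v)])
print(dup_values)                  # "A" "C" "E"

# Case-insensitive de-duplication: compare upper-cased copies,
# but return the original elements (first occurrence wins)
v_ci <- v[!duplicated(toupper(v))]
print(v_ci)                        # "A" "B" "D" "C" "F" "G" "E"
```

Because toupper() is applied only inside duplicated(), the returned elements keep their original spelling.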

Now you can negate this result with the ! operator and use it inside R's bracket notation [] to return the vector with duplicate values removed.


# Remove duplicate values
!duplicated(v)
v[!duplicated(v)]

# Output
#> !duplicated(v)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
#> v[!duplicated(v)]
#[1] "A" "B" "D" "C" "F" "G" "d" "E"
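By default, duplicated() keeps the first occurrence of each value. Base R's duplicated() and unique() also accept a fromLast argument; setting fromLast = TRUE scans the vector from the end, so the last occurrence of each value is kept instead. A short sketch using the same vector:

```r
v <- c('A','B','D','C','A','F','G','C','d','E','E')

# fromLast = TRUE marks the earlier copies as duplicates,
# so the LAST occurrence of each value survives
keep_last <- v[!duplicated(v, fromLast = TRUE)]
print(keep_last)
# [1] "B" "D" "A" "F" "G" "C" "d" "E"

# unique() accepts the same argument and gives the same result
identical(unique(v, fromLast = TRUE), keep_last)   # TRUE
```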

3. Using unique()

Use the unique() function to remove duplicates from an R vector in a single statement. It returns the distinct values, keeping the first occurrence of each value in its original order.


# Using unique()
unique(v)

# Output
# [1] "A" "B" "D" "C" "F" "G" "d" "E"
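Since unique() keeps first occurrences in their original order, it returns the same result as v[!duplicated(v)]. The sketch below checks that, counts the distinct values, and shows two points worth knowing: unique() does not sort its result, and on numeric vectors repeated NA values collapse into a single NA.

```r
v <- c('A','B','D','C','A','F','G','C','d','E','E')

# unique() is equivalent to indexing with !duplicated()
identical(unique(v), v[!duplicated(v)])   # TRUE

# Count distinct values
n_distinct_values <- length(unique(v))    # 8

# unique() does not sort; wrap it in sort() if you need ordered output
# (the order of 'd' vs 'D' depends on your locale)
sorted_unique <- sort(unique(v))

# Works on numeric vectors too; repeated NAs collapse to a single NA
n <- c(3, 1, NA, 3, 2, NA, 1)
unique(n)                                  # 3 1 NA 2
```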

4. Using dplyr Package

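To remove only contiguous (consecutive) duplicate elements from a vector, use the lag() function from the dplyr package. lag() shifts the vector by one position, so comparing v with lag(v) tells you whether each element differs from the one before it. To use this function, first install the package with install.packages("dplyr") and then load it with the library() function.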


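# To remove contiguous duplicated elements only
library(dplyr)
v <- c('A','A','D','C','C','F','F','C','d','E','E')

# lag(v) shifts v by one position (its first element is NA), so each element
# is compared with its predecessor; the first element is always kept
v[is.na(lag(v)) | v != lag(v)]

# Output
#[1] "A" "D" "C" "F" "C" "d" "E"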
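If you would rather not add a dependency, base R can also collapse consecutive repeats. The sketch below shows two options, assuming the vector contains no NA values: rle() (run-length encoding), whose values component holds one entry per run, and a direct comparison of each element with its predecessor. The helper name dedup_runs() is hypothetical and is introduced only for illustration.

```r
v <- c('A','A','D','C','C','F','F','C','d','E','E')

# Option 1: rle() groups consecutive equal values into runs;
# $values holds one value per run
rle(v)$values
# [1] "A" "D" "C" "F" "C" "d" "E"

# Option 2 (hypothetical helper): keep the first element, then keep
# every element that differs from the one right before it
dedup_runs <- function(x) {
  if (length(x) < 2) return(x)   # nothing to compare
  x[c(TRUE, x[-1] != x[-length(x)])]
}
dedup_runs(v)
# [1] "A" "D" "C" "F" "C" "d" "E"
```

Both options keep non-adjacent repeats such as the second "C", which is exactly what distinguishes them from unique().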

5. Using union() to Remove Duplicate Values From Vector

Finally, you can also use union() to remove duplicate values from a vector in R. union() combines two objects and removes any duplicates from the combined result, so passing the same vector twice returns its distinct values.


# Using union
v <- c('A','B','D','C','A','F','G','C','d','E','E')
union(v,v)

# Output
#[1] "A" "B" "D" "C" "F" "G" "d" "E"
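Because union() returns the distinct values of both inputs combined, union(v, v) gives the same result as unique(v). Its more typical use is merging two different vectors while dropping repeats, as in this short sketch (the vectors x and y are made up for illustration):

```r
v <- c('A','B','D','C','A','F','G','C','d','E','E')

# Passing the same vector twice is equivalent to unique(v)
identical(union(v, v), unique(v))   # TRUE

# Combine two vectors, keeping each value once in order of first appearance
x <- c('A','B','A')
y <- c('B','C','C')
union(x, y)
# [1] "A" "B" "C"
```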

Conclusion

In this article, I have explained how to remove duplicate values from a vector in R by using duplicated(), unique(), union(), and the lag() function from the dplyr package. You can find the complete example from this article in the Github R Programming Examples Project.


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium