Pandas.Index.drop_duplicates() Explained

The pandas Index.drop_duplicates() function is used to drop/remove duplicate values from an Index. Removing duplicate data is a common step in data analysis.

<strong>Index.drop_duplicates()</strong> returns an Index object with the duplicate values removed. It lets you choose which of the duplicated values is retained: you can keep the first occurrence, keep the last occurrence, or drop every value that is duplicated.

1. Syntax of Index.drop_duplicates()

Following is the syntax of Index.drop_duplicates(). The keep parameter takes one of the values 'first', 'last', or False; the default is 'first'.

# Syntax of Index.drop_duplicates()
Index.drop_duplicates(keep='first')

  • 'first' : Drop duplicates except for the first occurrence.
  • 'last' : Drop duplicates except for the last occurrence.
  • False : Drop all duplicates.

This returns a new Index with the duplicate values removed. The keep parameter controls which of the duplicated values are removed. The value 'first' keeps the first occurrence in each set of duplicated entries.
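To make the three keep options concrete, here is a minimal sketch that applies each of them to the same small Index. The labels and variable names are illustrative assumptions and are not taken from the examples in this article.

```python
import pandas as pd

# Hypothetical Index of string labels, used only for illustration
labels = pd.Index(['a', 'b', 'a', 'c', 'b'])

# keep='first' (the default): keep the first occurrence of each value
keep_first = labels.drop_duplicates(keep='first')

# keep='last': keep the last occurrence of each value
keep_last = labels.drop_duplicates(keep='last')

# keep=False: drop every value that occurs more than once
keep_none = labels.drop_duplicates(keep=False)

print(list(keep_first))  # ['a', 'b', 'c']
print(list(keep_last))   # ['a', 'c', 'b']
print(list(keep_none))   # ['c']
```

In each case drop_duplicates() returns a new Index; labels itself is left unchanged.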

2. Drop All Duplicates in pandas Index

A pandas Index is an immutable sequence used for indexing and alignment. It stores the axis labels for all pandas objects. An Index can contain duplicate values, and you can drop them using Index.drop_duplicates(). To explain this with an example, first let's create an Index that contains duplicate values, as shown below.

# Import pandas
import pandas as pd

# Create an Index that contains duplicate values
idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# Print the Index
print(idx)

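Below is the output. On pandas 2.0 and later, where Int64Index has been removed, the same values are displayed as Index([...], dtype='int64').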

# Output:
Int64Index([15, 21, 4, 4, 22, 4, 3, 21], dtype='int64')

Now, let's drop all occurrences of duplicate values in the Index by using drop_duplicates() as shown below. I am using keep=False because I want to remove every occurrence of a duplicated value.

# Drop all occurrences of duplicated values from the Index
idx2 = idx.drop_duplicates(keep=False)
print(idx2)

Following is the output for the above example. Every value that occurred more than once (21 and 4) is removed entirely, leaving only the values that appeared exactly once.

# Output:
Int64Index([15, 22, 3], dtype='int64')
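One way to see why only 15, 22, and 3 remain is to look at the boolean mask returned by Index.duplicated(). The sketch below rebuilds the same Index and filters it with the inverted mask. It is meant as an illustration of the result, not as a description of how pandas implements drop_duplicates() internally; the names mask and unique_only are assumptions made for this example.

```python
import pandas as pd

idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# True for every position whose value occurs more than once
mask = idx.duplicated(keep=False)
print(mask.tolist())
# [False, True, True, True, False, True, False, True]

# Keeping only the positions marked False gives the same values
unique_only = idx[~mask]
print(unique_only.equals(idx.drop_duplicates(keep=False)))  # True
```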

3. Drop Duplicates Except the First Occurrence

Now, let's drop the duplicates in the Index while keeping the first occurrence of each value. This is the default behavior, because keep defaults to 'first'. Below is the example code.

# Drop duplicates except the first occurrence
idx2 = idx.drop_duplicates(keep='first')
print(idx2)

After applying drop_duplicates(keep='first') to the Index object idx, all the duplicates in the Index have been dropped and only the first occurrence of each value is kept. Below is the output.

# Output:
Int64Index([15, 21, 4, 22, 3], dtype='int64')
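The keep parameter also accepts 'last', which is described above but not demonstrated. As a minimal sketch, assuming the same Index values as in the earlier examples (the name idx3 is introduced here only for illustration), the following keeps the last occurrence of each duplicated value instead of the first.

```python
import pandas as pd

# Same Index values as in the earlier examples
idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# Drop duplicates except the last occurrence
idx3 = idx.drop_duplicates(keep='last')
print(list(idx3))  # [15, 22, 4, 3, 21]
```

Compared with keep='first', the same five values remain, but 21 and 4 now sit at the positions of their final occurrences, so the order of the result changes.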



In this article, I have explained how to drop duplicate values from an Index using the Index.drop_duplicates() function. I have also explained how to use the keep parameter, which takes 'first', 'last', or False and controls which of the duplicated values are removed.


Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
