pandas.Index.drop_duplicates() is used to remove duplicate values from a pandas Index. Removing duplicate data is often required as part of data analysis.
Index.drop_duplicates() returns an Index object with the duplicate values removed. It lets you choose which duplicate value to retain: you can drop every occurrence of the duplicated values, or keep the first or last occurrence of each.
1. Syntax of Index.drop_duplicates()
Following is the syntax of Index.drop_duplicates().
# Syntax of Index.drop_duplicates()
Index.drop_duplicates(keep='first')
Parameter
keep takes one of the following values: 'first', 'last', or False. The default is 'first'.
'first': Drop duplicates except for the first occurrence.
'last': Drop duplicates except for the last occurrence.
False: Drop all duplicates.
It returns an Index with the duplicate values removed. The keep parameter controls which duplicate values are removed; the value 'first' keeps the first occurrence of each set of duplicated entries.
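To compare the three keep options side by side, here is a minimal sketch on a small string Index. The Index values and the variable names (labels, results) are illustrative only; the only pandas calls used are Index.drop_duplicates() and Index.tolist().

```python
import pandas as pd

# Illustrative Index: 'a' and 'b' each appear twice, 'c' appears once
labels = pd.Index(["a", "b", "a", "c", "b"])

# Apply each keep option and collect the remaining values as plain lists
results = {
    keep: labels.drop_duplicates(keep=keep).tolist()
    for keep in ("first", "last", False)
}

for keep, values in results.items():
    print(f"keep={keep!r}: {values}")
# keep='first': ['a', 'b', 'c']
# keep='last': ['a', 'c', 'b']
# keep=False: ['c']
```

Every option preserves the original order of the values it keeps. With keep='last', 'a' appears before 'c' because its last occurrence (position 2) comes before 'c' (position 3).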
2. Drop All Duplicates in pandas Index
A pandas Index is an immutable sequence used for indexing and alignment; it stores the axis labels of all pandas objects. An Index can contain duplicate values, and you can drop them using Index.drop_duplicates(). To explain this with an example, first let's create an Index that contains duplicate values, as shown below.
# Import pandas
import pandas as pd

# Create the Index
idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# Print the Index
print(idx)
Below is the output.
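# Output: Int64Index([15, 21, 4, 4, 22, 4, 3, 21], dtype='int64')
# In pandas 2.0 and later, the same Index prints as Index([15, 21, 4, 4, 22, 4, 3, 21], dtype='int64')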
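Before dropping anything, it can help to confirm that the Index actually contains duplicates. This sketch uses the Index properties is_unique and has_duplicates, plus Index.duplicated(), which by default flags every repeat after the first occurrence. The variable name repeated is just for illustration.

```python
import pandas as pd

idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# Quick checks on the whole Index
print(idx.is_unique)       # False
print(idx.has_duplicates)  # True

# duplicated() flags each repeat after the first occurrence (keep='first' by default);
# unique() then lists each repeated value once, in order of appearance
repeated = idx[idx.duplicated()].unique()
print(repeated.tolist())   # [4, 21]
```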
Now, let's drop all occurrences of duplicate values in the Index using drop_duplicates(). I am using keep=False because I want to remove every occurrence of each duplicated value.
# Drop all occurrences of duplicated values in the Index
idx2 = idx.drop_duplicates(keep=False)
print(idx2)
Following is the output of the above example, where you can see that every occurrence of the duplicated values (21 and 4) has been removed.
# Output: Int64Index([15, 22, 3], dtype='int64')
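The result is equivalent to filtering the Index with the boolean mask returned by Index.duplicated() using the same keep value. The sketch below shows this for keep=False, which is handy when you want to see exactly which positions get removed; treat it as an illustration of the equivalence, not as a description of pandas internals.

```python
import pandas as pd

idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# keep=False marks every occurrence of a repeated value as a duplicate
mask = idx.duplicated(keep=False)
print(mask.tolist())  # [False, True, True, True, False, True, False, True]

# Keeping only the unmarked positions matches drop_duplicates(keep=False)
unique_only = idx[~mask]
print(unique_only.tolist())                                 # [15, 22, 3]
print(unique_only.equals(idx.drop_duplicates(keep=False)))  # True
```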
3. Drop Duplicates Except the First Occurrence
Now, let's drop all duplicates in the Index except the first occurrence. Since 'first' is the default value of the keep parameter, passing keep='first' explicitly is optional. Below is the example code.
# Drop duplicates except the first occurrence
idx2 = idx.drop_duplicates(keep='first')
print(idx2)
After applying drop_duplicates(keep='first') to the Index object idx, all duplicates are dropped and only the first occurrence of each value is kept. Below is the output.
# Output: Int64Index([15, 21, 4, 22, 3], dtype='int64')
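The syntax section also lists keep='last', which is not demonstrated above. Here is a minimal sketch on the same idx values (the name idx_last is illustrative). It keeps the last occurrence of each value and, like the other options, preserves the original order of the values it retains. The final line checks that calling drop_duplicates() with no argument behaves like keep='first'.

```python
import pandas as pd

idx = pd.Index([15, 21, 4, 4, 22, 4, 3, 21])

# Keep only the last occurrence of each duplicated value
idx_last = idx.drop_duplicates(keep="last")
print(idx_last.tolist())  # [15, 22, 4, 3, 21]

# With no argument, keep defaults to 'first'
print(idx.drop_duplicates().equals(idx.drop_duplicates(keep="first")))  # True
```

Here 4 ends up after 22 because its last occurrence (position 5) comes after 22 (position 4), and 21 moves to the end because its last occurrence is at position 7.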
In this article, I have explained how to drop duplicate values from a pandas Index using the Index.drop_duplicates() function, and how to use its keep parameter, which takes 'first', 'last', or False, to control which duplicate values are removed.