The pandas DataFrame.duplicated() function is used to get/find/select all duplicate rows of a DataFrame, based on either all columns or a selected subset of columns. Duplicate rows are rows that have identical values in every column being compared. Using this method you can get duplicate rows based on selected multiple columns or on all columns. In this article, I will explain these with several examples.
Key Points –
- Use the .duplicated() method to check if rows are duplicates, returning a boolean Series indicating duplicate status (see the short sketch after this list).
- Applying DataFrame[DataFrame.duplicated()] filters and returns only the duplicate rows in the DataFrame.
- By default, .duplicated() checks all columns to identify duplicates unless specified otherwise.
- The keep parameter in .duplicated() allows you to specify whether to mark all duplicates, or keep the first or last occurrence only.
- Use the subset parameter in .duplicated() to specify particular columns for detecting duplicates, useful for partial matching.
- Setting keep=False in .duplicated() flags all instances of duplicate rows, returning every duplicate occurrence in the DataFrame.
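As a minimal sketch (using a tiny, made-up DataFrame rather than the one created below), duplicated() returns a boolean Series that you then use as a filter:

import pandas as pd

# Tiny illustrative DataFrame; row 2 repeats row 0
sample = pd.DataFrame({'A': [1, 2, 1], 'B': ['x', 'y', 'x']})

# duplicated() returns a boolean Series (default keep='first')
print(sample.duplicated())
# 0    False
# 1    False
# 2     True
# dtype: bool

# Using that Series as a mask returns only the duplicate rows
print(sample[sample.duplicated()])
#    A  B
# 2  1  x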
1. Quick Examples of Getting a List of All Duplicate Rows
If you are in a hurry, below are some quick examples of how to get a list of all duplicate rows in a pandas DataFrame.
# Quick examples of getting a list of all duplicate rows
# Select duplicate rows except first occurrence
# Based on all columns
df2 = df[df.duplicated()]
# Select duplicate row based on all columns
df2 = df[df.duplicated(keep=False)]
# Get duplicate last rows based on all columns
df2 = df[df.duplicated(keep = 'last')]
# Get list of duplicate rows using a single column
df2 = df[df['Courses'].duplicated() == True]
# Get list of duplicate rows based on 'Courses' column
df2 = df[df.duplicated('Courses')]
# Get list of duplicate rows using multiple columns
df2 = df[df[['Courses', 'Fee','Duration']].duplicated() == True]
# Get list of duplicate rows based on list of column names
df2 = df[df.duplicated(['Courses','Fee','Duration'])]
Now, let’s create a DataFrame with a few duplicate rows on all columns. Our DataFrame contains the column names Courses, Fee, Duration, and Discount.
# Create DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
'Fee' :[20000,25000,22000,30000,22000,20000,30000],
'Duration':['30days','40days','35days','50days','40days','30days','50days'],
'Discount':[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print("Create DataFrame\n",df)
Yields below output.
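# Output:
# Create DataFrame
   Courses    Fee  Duration  Discount
0    Spark  20000    30days      1000
1  PySpark  25000    40days      2300
2   Python  22000    35days      1200
3   pandas  30000    50days      2000
4   Python  22000    40days      2300
5    Spark  20000    30days      1000
6   pandas  30000    50days      2000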

2. Select Duplicate Rows Based on All Columns
You can use df[df.duplicated()] without any arguments to get rows that have the same values in all columns. It uses the default values subset=None and keep='first'. The below example returns two rows, as these are the duplicate rows in our DataFrame.
In this example, the duplicated() method without a subset considers all columns when identifying duplicate rows. The resulting DataFrame df2 contains the duplicate rows based on all columns.
# Select duplicate rows of all columns
df2 = df[df.duplicated()]
print("Duplicate rows based on all columns:\n",df2)
Yields below output.
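# Output:
# Duplicate rows based on all columns:
   Courses    Fee  Duration  Discount
5    Spark  20000    30days      1000
6   pandas  30000    50days      2000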

You can set keep=False in the duplicated() function to get all the duplicate rows without excluding any occurrence.
# Select duplicate row based on all columns
df2 = df[df.duplicated(keep=False)]
print(df2)
Yields below output.
# Output:
   Courses    Fee  Duration  Discount
0    Spark  20000    30days      1000
3   pandas  30000    50days      2000
5    Spark  20000    30days      1000
6   pandas  30000    50days      2000
3. Get List of Duplicate Rows Based on All Columns, Keeping the Last Occurrence
If you want to list duplicate rows while treating the last occurrence as the one to keep, use the duplicated() method with the keep='last' parameter. This marks all occurrences of a duplicate as True except for the last one.
In the below example, duplicated(keep='last') marks every occurrence of a duplicate row as True except the last one. The resulting DataFrame df2 therefore contains all duplicate rows except the last occurrence of each, based on all columns.
# Get duplicate last rows based on all columns
df2 = df[df.duplicated(keep = 'last')]
print("Duplicate last rows based on all columns:\n",df2)
Yields below output.
# Output:
# Duplicate last rows based on all columns:
   Courses    Fee  Duration  Discount
0    Spark  20000    30days      1000
3   pandas  30000    50days      2000
4. Get List of Duplicate Rows Using a Single Column
If you want to select duplicate rows based on a single column, pass the column name as an argument. For instance, create a DataFrame (df2) containing all rows where the ‘Courses’ column has duplicate values. The duplicated() method without specifying keep defaults to keep='first', so the first occurrence is left unmarked and every subsequent occurrence is marked as a duplicate.
# Get list of duplicate rows using a single column
df2 = df[df['Courses'].duplicated() == True]
print(df2)
# Get list of duplicate rows based on 'Courses' column
df2 = df[df.duplicated('Courses')]
print(df2)
Yields below output.
# Output:
   Courses    Fee  Duration  Discount
4   Python  22000    40days      2300
5    Spark  20000    30days      1000
6   pandas  30000    50days      2000
5. Get List of Duplicate Rows Using Multiple Columns
If you want to get a list of duplicate rows based on multiple columns in a Pandas DataFrame, use the duplicated() method with the subset parameter and specify the list of columns you want to consider for identifying duplicates.
In the below example, df.duplicated(subset=['Courses','Fee','Duration']) identifies duplicate rows based on the ‘Courses’, ‘Fee’, and ‘Duration’ columns. The resulting DataFrame df2 will contain all rows that have duplicates across these three columns.
# Get list of duplicate rows based on list of column names
df2 = df[df.duplicated(subset=['Courses','Fee','Duration'])]
print(df2)
# Get list of duplicate rows using multiple columns
df2 = df[df[['Courses', 'Fee','Duration']].duplicated() == True]
print(df2)
Yields below output.
# Output:
   Courses    Fee  Duration  Discount
5    Spark  20000    30days      1000
6   pandas  30000    50days      2000
6. Get List of Duplicate Rows Using Sort Values
Let’s see how to sort the result of the duplicated() method. You can sort a pandas DataFrame by one or more columns using the sort_values() method.
# Get list Of duplicate rows using sort values
df2 = df[df.duplicated(['Discount'])==True].sort_values('Discount')
print(df2)
Yields below output.
# Output:
   Courses    Fee  Duration  Discount
5    Spark  20000    30days      1000
6   pandas  30000    50days      2000
4   Python  22000    40days      2300
You can also pass keep=False to keep every occurrence of the duplicates and then sort the filtered result with sort_values("Discount").
# Using sort values
df2 = df[df.Discount.duplicated(keep=False)].sort_values("Discount")
print(df2)
Yields below output.
# Output:
   Courses    Fee  Duration  Discount
0    Spark  20000    30days      1000
5    Spark  20000    30days      1000
3   pandas  30000    50days      2000
6   pandas  30000    50days      2000
1  PySpark  25000    40days      2300
4   Python  22000    40days      2300
Complete Example of Getting a List of All Duplicate Rows
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
'Fee' :[20000,25000,22000,30000,22000,20000,30000],
'Duration':['30days','40days','35days','50days','40days','30days','50days'],
'Discount':[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)
# Select duplicate rows
# Except first occurrence based on all columns
df2 = df[df.duplicated()]
# Select duplicate row based on all columns
df2 = df[df.duplicated(keep=False)]
print(df2)
# Get duplicate last rows based on all columns
df2 = df[df.duplicated(keep = 'last')]
print(df2)
# Get list of duplicate rows
# Using a single column
df2 = df[df['Courses'].duplicated() == True]
print(df2)
# Get list of duplicate rows based on 'Courses' column
df2 = df[df.duplicated('Courses')]
print(df2)
# Get list of duplicate rows
# Using multiple columns
df2 = df[df[['Courses', 'Fee','Duration']].duplicated() == True]
print(df2)
# Get list of duplicate rows based on list of column names
df2 = df[df.duplicated(['Courses','Fee','Duration'])]
print(df2)
# Get list of duplicate rows
# Using sort values
df2 = df[df.duplicated(['Discount'])==True].sort_values('Discount')
print(df2)
# Using sort values
df2 = df[df.Discount.duplicated(keep=False)].sort_values("Discount")
print(df2)
Frequently Asked Questions on Getting a List of All Duplicate Rows in Pandas
How do I find duplicate rows based on a specific column?
To find duplicate rows in a Pandas DataFrame based on a specific column, use the duplicated() method with the subset parameter. For example, duplicated(subset=['ID']) identifies duplicate rows based on the values in the ‘ID’ column. The resulting DataFrame duplicate_rows will contain all rows that have duplicates in the ‘ID’ column.
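A minimal sketch of this, assuming a hypothetical DataFrame with an ‘ID’ column (the column name, the values, and the duplicate_rows variable are only illustrative):

# Hypothetical data, only to illustrate subset-based duplicate detection
import pandas as pd

data = pd.DataFrame({'ID': [101, 102, 101, 103],
                     'Name': ['A', 'B', 'C', 'D']})

# Rows whose 'ID' repeats an earlier 'ID' (default keep='first')
duplicate_rows = data[data.duplicated(subset=['ID'])]
print(duplicate_rows)
#     ID  Name
# 2  101     C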
How can I identify duplicate rows based on multiple columns?
You can identify duplicate rows based on multiple columns in a Pandas DataFrame using the duplicated() method with the subset parameter. Simply provide a list of column names to the subset parameter.
How do I get a list of all duplicate rows based on all columns?
To get a list of all duplicate rows based on all columns in a Pandas DataFrame, use the duplicated() method without specifying the subset parameter.
Can I find duplicate rows based on the entire DataFrame rather than specific columns?
Yes. You can find duplicate rows based on the entire DataFrame, not just specific columns, by using the duplicated() method without specifying the subset parameter. This approach considers the entire row when identifying duplicates.
How do I mark duplicates relative to their last occurrence?
If you want to treat the last occurrence of each duplicate row as the one to keep, use the duplicated() method with the keep='last' parameter. This marks all occurrences of duplicates as True except for the last one.
Can I sort the DataFrame before identifying duplicate rows?
Yes, it’s possible to sort the DataFrame before identifying duplicate rows. You can use the sort_values() method to sort the DataFrame based on one or more columns. Sorting the DataFrame can be helpful if you want to group duplicate rows together or identify duplicates in a specific order.
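As a rough sketch of that idea, reusing the technologies data from this article (the sort columns chosen here are just an example):

import pandas as pd

# Recreate the technologies DataFrame used throughout this article
df = pd.DataFrame({
    'Courses':["Spark","PySpark","Python","pandas","Python","Spark","pandas"],
    'Fee':[20000,25000,22000,30000,22000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
})

# Sort first, then flag duplicates with keep=False so every occurrence
# is returned and identical rows end up next to each other
df_sorted = df.sort_values(['Courses', 'Fee'])
dups_grouped = df_sorted[df_sorted.duplicated(keep=False)]
print(dups_grouped)
# The two Spark rows and the two pandas rows now appear as adjacent pairs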
Conclusion
In this article, you have learned how to get/select a list of all duplicate rows (based on all columns or a subset of columns) using the pandas DataFrame duplicated() method, with examples.
Happy Learning !!
Related Articles
- Set Order of Columns in Pandas DataFrame
- Pandas Add Constant Column to DataFrame
- Rename Index Values of Pandas DataFrame
- Pandas Rename Index of DataFrame
- Pandas Difference Between Two DataFrames
- How to Transpose() DataFrame in Pandas?
- pandas.DataFrame.drop_duplicates() – Examples
- Pandas.Index.drop_duplicates() Explained
- Pandas Select Rows Based on List Index
- How to Read Excel Multiple Sheets in Pandas
- How to Drop Duplicate Columns in pandas DataFrame
- Select Rows From List of Values in Pandas DataFrame