Last modified: April 10, 2024

To drop rows in a Pandas DataFrame based on a specific condition, you can use boolean indexing or the drop() method. The pandas.DataFrame.drop() method deletes/removes rows that match one or more conditions. In this article, I will explain how to drop rows with conditions using the drop() method, which removes rows from a DataFrame based on specified criteria and is a core part of data cleaning and manipulation.

Key Points –

  • Pandas provides the drop() method to remove rows from a DataFrame based on specified conditions.
  • Conditions can be defined using logical expressions involving column values or by applying functions to DataFrame columns.
  • The drop() method accepts parameters such as axis to specify whether to drop rows (axis=0) or columns (axis=1), and inplace to modify the DataFrame in place if set to True.
  • Conditions can be constructed using comparison operators such as ==, !=, <, >, <=, and >=, as well as logical operators like & (and), | (or), and ~ (not).
  • The drop() method in Pandas offers flexibility to remove rows satisfying logical conditions, enhancing data filtering capabilities for analysis and processing.

Quick Examples of Drop Rows With Condition

If you are in a hurry, below are some quick examples of dropping rows with conditions.


# Quick examples of drop rows with condition
# Using DataFrame.drop() 
# To Drop Rows with Condition
df.drop(df[df['Fee'] >= 24000].index, inplace = True)

# Remove rows
df = pd.DataFrame(technologies)
df2 = df[df.Fee >= 24000]

# Reset index after deleting rows
df2 = df[df.Fee >= 24000].reset_index()

# If a column name contains a space,
# surround it with quotes inside brackets
df2 = df[df['column name'] >= 24000]

# Using loc
df2 = df.loc[df["Fee"] >= 24000 ]

# Delete rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]

# Drop rows with None/NaN
df2 = df[df.Discount.notnull()]

To run some examples of dropping rows with a condition, let's create a DataFrame with a few rows and columns, then execute the examples provided to validate the results.


# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[22000,25000,np.nan,24000],
    'Duration':['30days',None,'55days',np.nan],
    'Discount':[1000,2300,1000,np.nan]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.


Using DataFrame.drop() to Drop Rows with Condition

Using DataFrame.drop() to drop rows based on a condition is a common operation in Pandas. This method allows you to remove rows from a DataFrame that meet specific criteria.


# Using DataFrame.drop() to Drop Rows with Condition
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print("Drop rows with condition:\n", df)

In the above example, rows with a Fee value greater than or equal to 24000 are dropped from the DataFrame. The df[df['Fee'] >= 24000] selects rows where the Fee column satisfies the condition. Then, the .index attribute is used to get the index labels of these rows, which are passed to df.drop() to remove them from the original DataFrame. Finally, inplace=True ensures that the changes are made to the original DataFrame in place. This example yields the below output.


Resetting the row index after removing rows is often recommended to maintain a clean and consecutive index.
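As a minimal sketch using the article's sample data, this is how reset_index(drop=True) renumbers the index left behind after a drop:

```python
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Discount': [1000, 2300, 1000, np.nan]
}
df = pd.DataFrame(technologies)

# Drop rows where Fee >= 24000; the surviving index has gaps (0 and 2)
df.drop(df[df['Fee'] >= 24000].index, inplace=True)

# Renumber the index 0..n-1; drop=True discards the old index
# instead of keeping it as a new column
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0, 1]
```

Without drop=True, the old gappy index would be preserved as a new "index" column.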

Using loc[] to Drop Rows by Condition

Alternatively, another commonly used approach is boolean indexing, applied either through .loc[] or directly on the DataFrame with df[]. Note that boolean indexing keeps only the rows that satisfy the condition, effectively dropping the rest.


# Remove row
df2 = df[df.Fee >= 24000]
print("Drop rows with condition:\n", df2)

# Using loc[]
df2 = df.loc[df["Fee"] >= 24000 ]
print("Drop rows with condition:\n", df2)

Yields below output.


# Output:
# Drop rows with condition:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
3   Python  24000.0      NaN       NaN

Drop Rows Based on Multiple Conditions

Dropping rows based on multiple conditions can be achieved by combining boolean indexing with logical operators such as & (and), | (or), and ~ (not). You can just extend the usage of the above examples to do so.


# Delete rows based on multiple column values
df = pd.DataFrame(technologies)
df = df[ (df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print("Drop rows with condition:\n", df)

# Resetting the row index
df.reset_index(drop=True, inplace=True)
print("\nDataFrame after resetting the index:\n", df)

The above example selects rows based on multiple conditions in a Pandas DataFrame. Specifically, it keeps rows where the Fee column is greater than or equal to 22000 and the Discount column is equal to 2300, dropping all others. This example yields the below output.


# Output:
# Drop rows with condition:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0

This code creates a boolean mask by applying the specified conditions to the DataFrame and then uses the mask to select only the rows that satisfy both conditions.
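The same logic can be written with the mask stored in a separate variable, which keeps longer conditions readable (a minimal sketch using the article's sample data):

```python
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Discount': [1000, 2300, 1000, np.nan]
}
df = pd.DataFrame(technologies)

# Build the boolean mask first; each comparison must be parenthesized
# before combining with & (element-wise AND)
mask = (df['Fee'] >= 22000) & (df['Discount'] == 2300)

df2 = df[mask]  # keep rows where the mask is True
print(df2['Courses'].tolist())  # ['PySpark']
```

Comparisons against NaN evaluate to False, so rows with missing Fee or Discount are dropped by the mask as well.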

Other Ways to Delete Rows from Pandas DataFrame

Similarly, the query() method in Pandas provides a convenient way to filter DataFrame rows based on a query expression. However, please note that the query() method is used for filtering rows based on a condition, not for deleting rows directly. Instead, you can use the drop() method to remove rows based on the condition.



# Delete rows using DataFrame.query()
df2=df.query("Courses == 'Spark'")

# Using variable
value='Spark'
df2=df.query("Courses == @value")

# In place (modifies df directly)
df.query("Courses == 'Spark'", inplace=True)

# Not equals, in & multiple conditions
df.query("Courses != 'Spark'")
df.query("Courses in ('Spark','PySpark')")

# Backticks quote column names that contain spaces
df.query("`Courses Fee` >= 23000")
df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")

# Other ways to delete rows
values = ['Spark', 'PySpark']
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]

df[df["Courses"] == 'Spark'] 
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]

# Using a lambda with apply() to build a boolean mask row by row
df[df.apply(lambda row: row['Courses'] in ['Spark', 'PySpark'], axis=1)]

# Drop rows containing any NaN/None values
df.dropna()

Delete Rows Based on Inverse of Condition

Finally, the negation (~) operation can be used to drop all rows except those that meet a certain condition. This is a very handy feature in Pandas for quickly filtering rows based on the inverse of a condition.


# Delete rows based on inverse of column values
df1 = df[~(df['Courses'] == "PySpark")].index 
df.drop(df1, inplace = True)
print("Drop rows with condition:\n", df)

# Dropping rows where the 'Courses' column is not equal to 'PySpark' 
# Using negation (~)
df.drop(df[~(df['Courses'] == "PySpark")].index, inplace=True)
print("Drop rows with condition:\n", df)

In the above program, df[~(df['Courses'] == "PySpark")].index generates the index labels of rows where the Courses column is not equal to PySpark. Then, df.drop() deletes those rows based on the generated index labels, leaving only rows where the Courses column equals PySpark. This example yields the below output.


# Output:
# Drop rows with condition:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0

Complete Example

Below is a complete example of how to remove/delete/drop rows with conditions in Pandas DataFrame.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[22000,25000,np.nan,24000],
    'Duration':['30days',None,'55days',np.nan],
    'Discount':[1000,2300,1000,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

# Using drop() to remove rows
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print(df)

# Remove rows
df = pd.DataFrame(technologies)
df2 = df[df.Fee >= 24000]
print(df2)

# Reset index after deleting rows
df2 = df[df.Fee >= 24000].reset_index()
print(df2)

# If a column name contains a space,
# surround it with quotes inside brackets
df2 = df[df['column name'] >= 24000]

# Using loc
df2 = df.loc[df["Fee"] >= 24000 ]
print(df2)

# Delete rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print(df2)

# Drop rows with None/NaN
df2 = df[df.Discount.notnull()]
print(df2)

FAQ on Drop Pandas Rows with Condition

How can I drop rows in a Pandas DataFrame based on a specific condition?

To drop rows in a Pandas DataFrame based on a specific condition, you can use boolean indexing.
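A minimal sketch (the Fee column is from this article's sample data): negating the condition keeps everything except the rows you want dropped.

```python
import pandas as pd

df = pd.DataFrame({'Fee': [22000, 25000, 24000]})

# Keep only the rows that do NOT meet the drop condition
df2 = df[~(df['Fee'] >= 24000)]
print(df2['Fee'].tolist())  # [22000]
```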

What if I want to drop rows where a specific column has missing values?

To drop rows where a specific column has missing values (NaN), you can use the dropna method.
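For example, the subset parameter of dropna() restricts the missing-value check to the listed column(s), a sketch using the article's columns:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Discount': [1000, np.nan, 1000]
})

# Drop only rows where 'Discount' is NaN; NaN in other columns is ignored
df2 = df.dropna(subset=['Discount'])
print(df2['Courses'].tolist())  # ['Spark', 'Hadoop']
```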

Can I drop rows based on multiple conditions?

You can drop rows based on multiple conditions in a Pandas DataFrame using logical operators like & (AND) or | (OR) to combine the conditions.
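As a sketch with illustrative thresholds, negating a combined mask drops rows matching either condition:

```python
import pandas as pd

df = pd.DataFrame({'Fee': [22000, 25000, 24000],
                   'Discount': [1000, 2300, 1500]})

# Drop rows matching EITHER condition by negating the combined mask
df2 = df[~((df['Fee'] >= 25000) | (df['Discount'] < 1500))]
print(df2['Fee'].tolist())  # [24000]
```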

How can I drop rows based on conditions and reset the index?

If you want to drop rows based on conditions and reset the index of the DataFrame, you can use the reset_index method.
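A minimal sketch of chaining the filter with reset_index (column and threshold are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Fee': [22000, 25000, 24000]})

# Filter, then renumber the surviving rows from 0
df2 = df[df['Fee'] < 25000].reset_index(drop=True)
print(df2.index.tolist())  # [0, 1]
```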

Is there a way to drop rows based on conditions and update the DataFrame in-place?

You can drop rows based on conditions and update the DataFrame in-place by using the inplace=True parameter with the drop method.
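For instance, with inplace=True the drop() call mutates the DataFrame itself and returns None (sketch with an illustrative threshold):

```python
import pandas as pd

df = pd.DataFrame({'Fee': [22000, 25000, 24000]})

# drop() with inplace=True modifies df directly instead of
# returning a new DataFrame
df.drop(df[df['Fee'] >= 24000].index, inplace=True)
print(df['Fee'].tolist())  # [22000]
```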

Conclusion

In this article, I have explained how to drop/delete/remove Pandas DataFrame rows with single and multiple conditions by using examples.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium.