• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:17 mins read
You are currently viewing Pandas Drop Rows With Condition

To drop rows in a Pandas DataFrame based on a specific condition, you can use boolean indexing or the drop method. Use pandas.DataFrame.drop() method to delete/remove rows with condition(s). In my earlier article, I have covered how to drop rows by index from DataFrame, and in this article, I will cover several examples of dropping rows with conditions, for example, string matching on a column value.

Alternatively, you can also achieve dropping rows by filtering rows and assigning them to another DataFrame.

1. Quick Examples of Drop Rows With Condition in Pandas

If you are in a hurry, below are some quick examples of pandas dropping/removing/deleting rows with condition(s).


# Quick Examples of dropping rows with condition

# Example 1: Using drop() to delete rows based on column value
df.drop(df[df['Fee'] >= 24000].index, inplace = True)

# Example 2: Remove rows
df2 = df[df.Fee >= 24000]

# Example 3: If you have space in column name
# Specify column name with in single quotes
df2 = df[df['column name']]

# Example 4: Using loc[]
df2 = df.loc[df["Fee"] >= 24000 ]

# Example 5: Delete rows based on multiple column value
df2 = df[ (df['Fee'] >= 22000) & (df['Discount'] == 2300)]

# Example 6: Drop rows with None/NaN
df2 = df[df.Discount.notnull()]

Let’s create a DataFrame with a few rows and columns and execute some examples to learn how to drop the DataFrame rows. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[22000,25000,np.nan,24000],
    'Duration':['30days',None,'55days',np.nan],
    'Discount':[1000,2300,1000,np.nan]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

pandas drop rows condition

2. Using DataFrame.drop() to Drop Rows with Condition

drop() method takes several parameters that help you to delete rows from DataFrame by checking condition. When the condition expression satisfies it returns True which actually removes the rows.

In the below example, drops rows where the ‘Fee’ column has values greater than or equal to 24000. The inplace=True parameter modifies the original DataFrame without the need to reassign the result.


# Using DataFrame.drop() to Drop Rows with Condition
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print("Drop rows with condition:\n", df)

Yields below output.

pandas drop rows condition

After removing rows, it is always recommended to reset the row index.

2. Using loc[] to Drop Rows by Condition

Alternatively, you can also try another most used approach to drop rows by condition using loc[] and df[].

Note that these methods actually filter the data, by negating this you will get the desired output.


# Remove row
df2 = df[df.Fee >= 24000]
print("Drop rows with condition:\n", df2)

# Using loc[]
df2 = df.loc[df["Fee"] >= 24000 ]
print("Drop rows with condition:\n", df2)

Yields below output.


# Output:
# Drop rows with condition:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
3   Python  24000.0      NaN       NaN

3. Drop Rows Based on Multiple Conditions

Sometimes it may require you to drop the rows based on multiple conditions. You can just extend the usage of the above examples to do so.

In the below example, to select rows based on multiple conditions in a Pandas DataFrame. Specifically, it selects rows where the ‘Fee’ column is greater than or equal to 22000 and the ‘Discount’ column is equal to 2300.


# Delect rows based on multiple column value
df = pd.DataFrame(technologies)
df = df[ (df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print("Drop rows with condition:\n", df)

Yields below output.


# Output:
# Drop rows with condition:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0

This code creates a boolean mask by applying the specified conditions to the DataFrame and then uses the mask to select only the rows that satisfy both conditions.

4. Other Ways to Delete Rows from Pandas DataFrame

You can also delete rows by using query() method. Note that these methods actually filter the rows from pandas DataFrame, by negating this you can drop the rows.



# Delete rows using DataFrame.query()
df2=df.query("Courses == 'Spark'")

# Using variable
value='Spark'
df2=df.query("Courses == @value")

# Inpace
df.query("Courses == 'Spark'",inplace=True)

# Not equals, in & multiple conditions
df.query("Courses != 'Spark'")
df.query("Courses in ('Spark','PySpark')")
df.query("`Courses Fee` >= 23000")
df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")

# Other ways to Delete Rows
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]

df[df["Courses"] == 'Spark'] 
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]

# Using lambda
df.apply(lambda row: row[df['Courses'].isin(['Spark','PySpark'])])
df.dropna()

5. Delete Rows Based on Inverse of Condition

If you need to drop() all rows which are not equal to a value given for a column. pandas offer a negation (~) operation to perform this feature. For E.x: df.drop(df1,inplace=True).


# Delect rows based on inverse of column values
df1 = df[~(df['Courses'] == "PySpark")].index 
df.drop(df1, inplace = True)
print("Drop rows with condition:\n", df)

Yields below output.


# Output:
# Drop rows with condition:
   Courses    Fee Duration  Discount
b  PySpark  25000   50days      2300
f  PySpark  25000   50days      2000

6. Complete Example

Below is a complete example of how to remove/delete/drop rows with conditions in Pandas DataFrame.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[22000,25000,np.nan,24000],
    'Duration':['30day',None,'55days',np.nan],
    'Discount':[1000,2300,1000,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

# Using drop() to remove rows
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print(df)

# Remove rows
df = pd.DataFrame(technologies)
df2 = df[df.Fee >= 24000]
print(df2)

# Reset index after deleting rows
df2 = df[df.Fee >= 24000].reset_index()
print(df2)

# If you have space in column name.
# Surround the column name with single quote
df2 = df[df['column name']]

# Using loc
df2 = df.loc[df["Fee"] >= 24000 ]
print(df2)

# Delect rows based on multiple column value
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print(df2)

# Drop rows with None/NaN
df2 = df[df.Discount.notnull()]
print(df2)

Frequently Asked Questions on Drop Pandas Rows with Condition

How do I drop rows in a Pandas DataFrame based on a condition?

You can use the drop() method along with boolean indexing to drop rows based on a condition.

How can I drop rows in a Pandas DataFrame based on a specific condition?

To drop rows in a Pandas DataFrame based on a specific condition, you can use boolean indexing.

What if I want to drop rows where a specific column has missing values?

To drop rows where a specific column has missing values (NaN), you can use the dropna method.

Can I drop rows based on multiple conditions?

You can drop rows based on multiple conditions in a Pandas DataFrame using logical operators like & (AND) or | (OR) to combine the conditions.

What if I want to drop rows based on conditions for multiple columns using an OR condition?

To drop rows based on conditions for multiple columns using an OR condition, you can use the | (OR) logical operator to combine the conditions.

How can I drop rows based on conditions and reset the index?

If you want to drop rows based on conditions and reset the index of the DataFrame, you can use the reset_index method.

Is there a way to drop rows based on conditions and update the DataFrame in-place?

You can drop rows based on conditions and update the DataFrame in-place by using the inplace=True parameter with the drop method.

7. Conclusion

In this article, you have learned how to drop/delete/remove Pandas DataFrame rows with single and multiple conditions by using examples.

Happy Learning !!

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium