Use the pandas.DataFrame.drop() method to delete rows that match one or more conditions. In my earlier article, I covered how to drop rows by index from a DataFrame. In this article, I will cover several examples of dropping rows with conditions, for example, string matching on a column value.
Alternatively, you can achieve the same result by filtering for the rows you want to keep and assigning them to another DataFrame.
1. Quick Examples of Drop Rows With Condition in Pandas
If you are in a hurry, below are some quick examples of pandas dropping/removing/deleting rows with condition(s).
# Quick Examples of drop rows with condition
# Using drop() to delete rows based on column value
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
# Filter instead of drop: keep only rows where Fee >= 24000
df2 = df[df.Fee >= 24000]
# If the column name contains a space,
# use bracket notation with the quoted name
# ('column name' is a placeholder, not a column in the sample DataFrame)
df2 = df[df['column name'] >= 24000]
# Using loc
df2 = df.loc[df["Fee"] >= 24000]
# Filter rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]
# Drop rows with None/NaN
df2 = df[df.Discount.notnull()]
Let’s create a DataFrame with a few rows and columns and run some examples to learn how to drop DataFrame rows. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python"],
'Fee' :[22000,25000,np.nan,24000],
'Duration':['30day',None,'55days',np.nan],
'Discount':[1000,2300,1000,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0     None    2300.0
2   Hadoop      NaN   55days    1000.0
3   Python  24000.0      NaN       NaN
2. Using DataFrame.drop() to Drop Rows with Condition
The drop() method takes several parameters that help you delete rows from a DataFrame. To drop rows by condition, write a boolean expression that evaluates to True for the rows you want to remove, select those rows, and pass their index labels to drop().
# Using DataFrame.drop() to Drop Rows with Condition
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print(df)
Yields below output.
# Output:
  Courses      Fee Duration  Discount
0   Spark  22000.0    30day    1000.0
2  Hadoop      NaN   55days    1000.0
After removing rows, it is often a good idea to reset the row index so that it is sequential again.
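Resetting the index after a drop could look like the sketch below. It rebuilds the same sample DataFrame and uses drop() without inplace=True so the result can be chained; the variable name remaining is illustrative and not part of the original examples.

```python
import numpy as np
import pandas as pd

# Same sample data as the article's DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Duration': ['30day', None, '55days', np.nan],
    'Discount': [1000, 2300, 1000, np.nan],
}
df = pd.DataFrame(technologies)

# Drop the matching rows (returns a new DataFrame), then renumber the index.
# drop=True discards the old index instead of keeping it as a new column.
remaining = df.drop(df[df['Fee'] >= 24000].index).reset_index(drop=True)
print(remaining)
```

Without drop=True, reset_index() keeps the old labels in an extra column named index, which may or may not be what you want.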
2. Using loc[] to Drop Rows by Condition
Alternatively, you can use another widely used approach: selecting rows by condition with loc[] or df[].
Note that these expressions filter the data, that is, they keep the rows that match the condition. To drop the matching rows instead, negate the condition.
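# Recreate the DataFrame, since the earlier drop() modified df in place
df = pd.DataFrame(technologies)
# Keep only rows where Fee >= 24000
df2 = df[df.Fee >= 24000]
print(df2)
# Using loc[]
df2 = df.loc[df["Fee"] >= 24000]
print(df2)
Both statements yield the output below.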
# Output:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
3   Python  24000.0      NaN       NaN
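A filter like this keeps the rows that match the condition. To actually drop those rows, the condition can be negated with ~. The following is a minimal sketch on the same sample data; the variable names dropped_by_negation and dropped_by_comparison are illustrative. It also shows that negating the mask and reversing the comparison are not equivalent when the column contains NaN, because any comparison with NaN evaluates to False.

```python
import numpy as np
import pandas as pd

# Same sample data as the article's DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Duration': ['30day', None, '55days', np.nan],
    'Discount': [1000, 2300, 1000, np.nan],
}
df = pd.DataFrame(technologies)

# Negate the mask: removes rows where Fee >= 24000 and keeps all others,
# including the row whose Fee is NaN
dropped_by_negation = df[~(df['Fee'] >= 24000)]
print(dropped_by_negation)

# Reverse the comparison: the NaN row is excluded too, since NaN < 24000 is False
dropped_by_comparison = df[df['Fee'] < 24000]
print(dropped_by_comparison)
```

Which form is appropriate depends on whether rows with a missing Fee should survive the drop.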
3. Drop Rows Based on Multiple Conditions
Sometimes you need to drop rows based on multiple conditions. You can extend the examples above by combining conditions with & (and) or | (or), wrapping each condition in parentheses.
# Filter rows based on multiple column values
df = pd.DataFrame(technologies)
df = df[ (df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print(df)
Yields below output.
# Output:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
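The example above keeps the rows that satisfy both conditions. To drop those rows instead, one option is to negate the combined mask; another is to pass the matching index labels to drop(). A minimal sketch on the same sample data follows (the names mask, kept, and kept_via_drop are illustrative):

```python
import numpy as np
import pandas as pd

# Same sample data as the article's DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Duration': ['30day', None, '55days', np.nan],
    'Discount': [1000, 2300, 1000, np.nan],
}
df = pd.DataFrame(technologies)

# True only for rows that satisfy both conditions
mask = (df['Fee'] >= 22000) & (df['Discount'] == 2300)

# Option 1: keep every row that does NOT match the combined condition
kept = df[~mask]
print(kept)

# Option 2: drop the matching rows by their index labels
kept_via_drop = df.drop(df[mask].index)
print(kept_via_drop)
```

Both options return a new DataFrame and leave df unchanged.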
4. Other Ways to Delete Rows from pandas DataFrame
You can also select rows by using the query() method. Note that query() filters the rows of a pandas DataFrame, that is, it keeps the rows that match the expression. Negate the expression to drop those rows instead.
# Delete rows using DataFrame.query()
df2=df.query("Courses == 'Spark'")
# Using variable
value='Spark'
df2=df.query("Courses == @value")
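# Inplace (modifies df itself)
df.query("Courses == 'Spark'",inplace=True)
# Not equals, in & multiple conditions
df.query("Courses != 'Spark'")
df.query("Courses in ('Spark','PySpark')")
# Backticks quote column names that contain spaces
# ('Courses Fee' is an example name, not a column in the sample DataFrame)
df.query("`Courses Fee` >= 23000")
df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")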
# Other ways to Delete Rows
values = ['Spark', 'PySpark']
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]
df[df["Courses"] == 'Spark']
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]
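# Using lambda (apply() passes each column, which is then indexed by the boolean mask)
df.apply(lambda col: col[df['Courses'].isin(['Spark','PySpark'])])
# Drop rows that contain any None/NaN value
df.dropna()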
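Since query() keeps the rows that match its expression, dropping rows with it means writing the expression for the rows to keep. The sketch below shows one way to do this on the same sample data, using not in and !=; the names drop_values, remaining, and without_spark are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as the article's DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Duration': ['30day', None, '55days', np.nan],
    'Discount': [1000, 2300, 1000, np.nan],
}
df = pd.DataFrame(technologies)

# Drop every row whose Courses value appears in drop_values
drop_values = ['Spark', 'PySpark']
remaining = df.query("Courses not in @drop_values")
print(remaining)

# Drop rows with a single value by keeping everything that is not equal to it
without_spark = df.query("Courses != 'Spark'")
print(without_spark)
```

Both calls return new DataFrames; df is only modified when inplace=True is passed.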
5. Delete Rows Based on Inverse of Condition
To drop all rows whose column value is not equal to a given value, use the negation operator (~) to invert the condition. Collect the index labels of the non-matching rows and pass them to drop(), for example: df.drop(df1, inplace=True).
# Delete rows based on inverse of column values
# Recreate the DataFrame, since earlier examples modified df
df = pd.DataFrame(technologies)
df1 = df[~(df['Courses'] == "PySpark")].index
df.drop(df1, inplace = True)
print(df)
Yields below output.
# Output:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
6. Complete Example
Below is a complete example of how to remove/delete/drop rows with conditions in pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python"],
'Fee' :[22000,25000,np.nan,24000],
'Duration':['30day',None,'55days',np.nan],
'Discount':[1000,2300,1000,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
# Using drop() to remove rows
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print(df)
# Filter rows: keep only rows where Fee >= 24000
df = pd.DataFrame(technologies)
df2 = df[df.Fee >= 24000]
print(df2)
# Reset index after deleting rows (pass drop=True to discard the old index)
df2 = df[df.Fee >= 24000].reset_index()
print(df2)
# If the column name contains a space,
# use bracket notation with the quoted name.
# ('column name' is a placeholder, so this line is left commented out)
# df2 = df[df['column name'] >= 24000]
# Using loc
df2 = df.loc[df["Fee"] >= 24000 ]
print(df2)
# Filter rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print(df2)
# Drop rows with None/NaN
df2 = df[df.Discount.notnull()]
print(df2)
7. Conclusion
In this article, you have learned how to drop/delete/remove pandas DataFrame rows with single and multiple conditions by using examples.
Happy Learning !!