
Use the drop() method to delete rows based on column value in a pandas DataFrame. As part of data cleansing, you may need to drop rows from the DataFrame when a column value matches a static value or the value of another column.

In my earlier article, I covered how to drop rows by index label from a DataFrame; in this article, I will cover several examples of dropping rows based on column value.

Alternatively, you can achieve the same result by filtering out the rows you want to drop and assigning the remaining rows to another DataFrame.

1. Quick Examples of Delete Pandas Rows Based on Column Value

If you are in a hurry, below are some quick examples of deleting pandas rows based on column value.


# Quick examples of dropping rows based on column value:

# Example 1: Using drop() to delete rows based on column value
df.drop(df[df['Fee'] >= 24000].index, inplace=True)

# Example 2: Remove rows by filtering
df2 = df[df.Fee >= 24000]

# Example 3: If the column name contains spaces,
# specify it in single quotes with bracket notation
df2 = df[df['column name'] >= 24000]

# Example 4: Using loc[]
df2 = df.loc[df["Fee"] >= 24000]

# Example 5: Delete rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]

# Example 6: Drop rows with None/NaN
df2 = df[df.Discount.notnull()]

Let’s create a DataFrame with a few rows and columns and execute some examples to learn how to drop DataFrame rows. Our DataFrame contains the column names Courses, Fee, Duration, and Discount.


# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Duration': ['30day', None, '55days', np.nan],
    'Discount': [1000, 2300, 1000, np.nan]
}
df = pd.DataFrame(technologies)
print("DataFrame:\n", df)

Yields below output.


# Output:
# DataFrame:
   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0     None    2300.0
2   Hadoop      NaN   55days    1000.0
3   Python  24000.0      NaN       NaN

2. Delete Rows Based on Column Values

The drop() method deletes rows by index label. To delete rows by column value, build a boolean expression over the column, select the index of the rows where it evaluates to True, and pass that index to drop(); those rows are removed.


# Delete rows using drop()
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print("Drop rows based on column value:\n", df)

Yields below output.


# Output:
# Drop rows based on column value:
   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
2   Hadoop      NaN   55days    1000.0

After removing rows, it is always recommended to reset the row index.
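
A minimal sketch of resetting the index (drop=True discards the old index instead of keeping it as a new column):


# Reset the index after dropping rows
df = df.reset_index(drop=True)
print(df)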

3. Using loc[]

Alternatively, you can use another popular approach to drop rows based on column values: boolean indexing with loc[] or df[].

Note that these methods actually filter the data: they select the rows that match the condition, so negate the condition to drop those rows and keep the rest.


# Remove rows by filtering
df = pd.DataFrame(technologies)
df2 = df[df.Fee >= 24000]
print("Drop rows based on column value:\n", df2)

# Using loc[]
df2 = df.loc[df["Fee"] >= 24000]
print("Drop rows based on column value:\n", df2)

Both statements yield the same output, shown below.


# Output:
# Drop rows based on column value:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
3   Python  24000.0      NaN       NaN
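
As noted above, negating the condition with ~ removes the matching rows instead of selecting them. A minimal sketch:


# Negate the condition to drop rows with Fee >= 24000
df2 = df[~(df.Fee >= 24000)]
print(df2)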

4. Delete Rows Based on Multiple Column Values

Sometimes you may need to delete rows based on matching values across multiple columns.


# Delete rows based on multiple column values
df = pd.DataFrame(technologies)
df = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print("Drop rows based on multiple column values:\n", df)

Yields below output.


# Output:
# Drop rows based on multiple column values:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0
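
You can combine conditions with | (or) in the same way, keeping rows where either condition holds. A minimal sketch on the same sample DataFrame:


# Keep rows matching either condition
df = pd.DataFrame(technologies)
df2 = df[(df['Fee'] >= 25000) | (df['Discount'] == 1000)]
print(df2)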

5. Delete Rows Based on None or NaN Column Values

When columns contain None or NaN values, you may need to remove those rows before applying calculations. You can do this using the notnull() function.

Note: You cannot use the == or != operators to test for None or NaN values, because NaN never compares equal to anything; use notnull() or isnull() instead.


# Drop rows with None/NaN values
df = pd.DataFrame(technologies)
df2 = df[df.Discount.notnull()]
print("Drop rows based on column value:\n", df2)

Yields below output.


# Output:
# Drop rows based on column value:
   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0     None    2300.0
2   Hadoop      NaN   55days    1000.0
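
Equivalently, you can drop rows with NaN in specific columns using dropna() with the subset parameter. A minimal sketch:


# Drop rows where Discount is NaN
df2 = df.dropna(subset=['Discount'])
print(df2)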

6. Using query()

You can also remove rows by using the query() method. Like the approaches above, query() filters the rows from the pandas DataFrame; negate the condition to drop the matching rows.


# Delete rows using DataFrame.query()
df2 = df.query("Courses == 'Spark'")

# Using a variable in the expression
value = 'Spark'
df2 = df.query("Courses == @value")

# Inplace
df.query("Courses == 'Spark'", inplace=True)

# Not equals, in & multiple conditions
df.query("Courses != 'Spark'")
df.query("Courses in ('Spark','PySpark')")

# Use backticks for column names containing spaces
df.query("`Courses Fee` >= 23000")
df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")

# Other ways to delete rows
values = ['Spark','PySpark']
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]

df[df["Courses"] == 'Spark']
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]

# Using apply() with a lambda (selects the same rows as isin())
df.apply(lambda row: row[df['Courses'].isin(['Spark','PySpark'])])

# Drop rows with any NaN values
df.dropna()

7. Delete Rows Based on the Inverse of Column Values

Sometimes you need to drop all rows that are not equal to a given value in a column. Pandas offers the negation (~) operator for this: select the index of the non-matching rows, then pass it to drop(), for example df.drop(df1, inplace=True).


# Delete rows based on inverse of column values
df = pd.DataFrame(technologies)
df1 = df[~(df['Courses'] == "PySpark")].index
df.drop(df1, inplace=True)
print("Drop rows based on column value:\n", df)

Yields below output.


# Output:
# Drop rows based on column value:
   Courses      Fee Duration  Discount
1  PySpark  25000.0     None    2300.0

8. Complete Example


import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, 25000, np.nan, 24000],
    'Duration': ['30day', None, '55days', np.nan],
    'Discount': [1000, 2300, 1000, np.nan]
}
df = pd.DataFrame(technologies)
print(df)

# Using drop() to remove rows
df.drop(df[df['Fee'] >= 24000].index, inplace = True)
print(df)

# Remove rows
df = pd.DataFrame(technologies)
df2 = df[df.Fee >= 24000]
print(df2)

# Reset index after deleting rows
df2 = df[df.Fee >= 24000].reset_index()
print(df2)

# If the column name contains spaces,
# surround the column name with single quotes
# (illustrative: assumes a column literally named 'column name')
df2 = df[df['column name'] >= 24000]

# Using loc
df2 = df.loc[df["Fee"] >= 24000 ]
print(df2)

# Delete rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]
print(df2)

# Drop rows with None/NaN
df2 = df[df.Discount.notnull()]
print(df2)

Frequently Asked Questions on Drop Rows

How do I drop rows based on a specific column value in Pandas?

You can use the DataFrame.drop() method along with boolean indexing to drop rows based on a specific column value. For example, df.drop(df[df['Specified_column'] >= 24000].index, inplace = True)

How can I drop rows based on multiple conditions?

You can use logical operators such as & (and) and | (or) to combine multiple conditions when dropping rows. For example, df = df[(df['specified_column1'] != 1000) & (df['specified_column2'] < 1000)]

What if I want to drop rows with missing (NaN) values in a specific column?

You can use the DataFrame.dropna() method to drop rows with missing values in a specific column. For example, df = df.dropna(subset=['specified_column'])

Conclusion

In this article, you have learned how to delete DataFrame rows based on column value using several approaches in Pandas, including drop(), boolean filtering, loc[], and query().

Happy Learning !!

