Pandas Drop Rows From DataFrame Examples

By using the pandas.DataFrame.drop() method you can drop/remove/delete rows from a DataFrame. The axis param specifies which axis you would like to remove; by default axis=0, meaning remove rows. Use axis=1 or the columns param to remove columns. By default, pandas returns a copy of the DataFrame after deleting rows; use inplace=True to remove rows from the existing (referring) DataFrame.


Related: Drop DataFrame Rows by Checking Conditions

In this article, I will cover how to remove rows by labels, indexes, and ranges, how to drop rows in place, and how to drop rows with None, NaN & Null values, with examples. If you have duplicate rows, use drop_duplicates() to drop them from the pandas DataFrame.

1. Pandas.DataFrame.drop() Syntax – Drop Rows & Columns


# Pandas DataFrame drop() Syntax
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
  • labels – Single label or list-like. It’s used with axis param.
  • axis – Defaults to 0. Use 1 to drop columns and 0 to drop rows.
  • index – Use to specify rows. Accepts single label or list-like.
  • columns – Use to specify columns. Accepts single label or list-like.
  • level – int or level name, optional, use for Multiindex.
  • inplace – Default False, returns a copy of the DataFrame. When set to True, it drops the rows/columns from the current DataFrame in place and returns None.
  • errors – {‘ignore’, ‘raise’}, default ‘raise’. With ‘ignore’, missing labels are silently skipped; see the short sketch after this list.
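
For example, the errors param controls whether drop() raises a KeyError when a label is not found. Below is a minimal sketch; the DataFrame and the label 'x' are just for illustration. With errors='ignore', missing labels are skipped and only the existing ones are dropped.


# Drop with the errors param (illustrative DataFrame and label)
import pandas as pd

df_example = pd.DataFrame({'A': [1, 2, 3]}, index=['r1', 'r2', 'r3'])

# errors='ignore' skips the missing label 'x' and drops only 'r1'
df1 = df_example.drop(['r1', 'x'], errors='ignore')

# errors='raise' (the default) would raise a KeyError because 'x' is not in the index
# df1 = df_example.drop(['r1', 'x'])
print(df1)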

Let’s create a DataFrame, run some examples, and explore the output. Note that our DataFrame contains custom index labels for rows, which I will use to demonstrate removing rows by label.


# Create a DataFrame
import pandas as pd
import numpy as np

technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[20000,25000,26000,22000],
    'Duration':['30day','40days',np.nan, None],
    'Discount':[1000,2300,1500,1200]
               }

indexes=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=indexes)
print(df)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000    30day      1000
r2  PySpark  25000   40days      2300
r3   Hadoop  26000      NaN      1500
r4   Python  22000     None      1200

2. pandas Drop Rows From DataFrame Examples

By default, the drop() method removes rows (axis=0) from the DataFrame. Let’s see several examples of how to remove rows from a DataFrame.

2.1 Drop rows by Index Labels or Names

One of Pandas’ advantages is that you can assign labels/names to rows, similar to column names. If your DataFrame has row labels (index labels), you can specify which rows you want to remove by their label names.


# Drop rows by Index Label
df = pd.DataFrame(technologies,index=indexes)
df1 = df.drop(['r1','r2'])
print("Drop rows from DataFrame:\n", df1)

Yields below output.


# Output:
Drop rows from DataFrame:
    Courses    Fee Duration  Discount
r3   Hadoop  26000      NaN      1500
r4   Python  22000     None      1200

Alternatively, you can write the same statement by using the index param.


# Delete Rows by Index Labels
df1 = df.drop(index=['r1','r2'])

Or by using the labels and axis params, as shown below.


# Delete Rows by Index Labels & axis
df1 = df.drop(labels=['r1','r2'])
df1 = df.drop(labels=['r1','r2'],axis=0)


2.2 Drop Rows by Index Number (Row Number)

Similarly, by using the drop() method you can also remove rows by index position from a pandas DataFrame. drop() doesn’t take a positional index as a param, hence we need to get the row labels from the index and pass those to the drop() method. We will use df.index to get the row labels for the positions we want to delete.

  • df.index.values returns all row labels as an array.
  • df.index[[1,3]] gets you the row labels for the 2nd and 4th rows; passing these to the drop() method removes those rows. Note that in Python, the index starts from zero.

# Delete Rows by Index numbers
df = pd.DataFrame(technologies,index=indexes)
df1=df.drop(df.index[[1,3]])
print(df1)

This removes the rows at positions 1 and 3, which are labels r2 and r4. In order to drop the first row, you can use df.drop(df.index[0]), and to drop the last row use df.drop(df.index[-1]).


# Removes First Row
df=df.drop(df.index[0])

# Removes Last Row
df=df.drop(df.index[-1])
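
Similarly, to remove the first or last N rows you can slice df.index and pass the resulting labels to drop(). A short sketch, using N=2 for illustration:


# Removes First N Rows (here N=2)
df = pd.DataFrame(technologies,index=indexes)
df1 = df.drop(df.index[:2])

# Removes Last N Rows (here N=2)
df2 = df.drop(df.index[-2:])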

2.3 Delete Rows by Index Range

You can also remove rows by specifying an index range. The below example removes all rows starting from the 3rd row.


# Delete Rows by Index Range
df = pd.DataFrame(technologies,index=indexes)
df1=df.drop(df.index[2:])
print(df1)

Yields below output.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000    30day      1000
r2  PySpark  25000   40days      2300

2.4 Delete Rows when you have Default Index

By default, pandas assigns a sequence number to every row, also called the index; the row index starts from zero and increments by 1 for every row. If you are not using custom index labels, the pandas DataFrame uses these sequence numbers as the index. To remove rows with the default index, you can try the examples below.


# Remove rows when you have default index.
df = pd.DataFrame(technologies)
df1 = df.drop(0)            # Drops the first row (index 0)
df3 = df.drop([0, 3])       # Drops the rows at index 0 and 3
df4 = df.drop(range(0,2))   # Drops the rows at index 0 and 1

Note that df.drop(-1) doesn’t remove the last row, as the -1 label is not present in the DataFrame. You can still use df.drop(df.index[-1]) to remove the last row.
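
Also note that dropping rows leaves gaps in the default index. If you prefer a continuous sequence afterwards, you can chain reset_index(drop=True); a short sketch:


# Drop rows and reset the default index
df = pd.DataFrame(technologies)
df1 = df.drop([0, 3]).reset_index(drop=True)
print(df1)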

2.5 Remove DataFrame Rows Inplace

All the examples you have seen above return a copy of the DataFrame after removing rows. If you want to remove rows in place from the referring DataFrame, use inplace=True. By default, the inplace param is set to False.


# Delete Rows inplace
df = pd.DataFrame(technologies,index=indexes)
df.drop(['r1','r2'],inplace=True)
print(df)
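
Keep in mind that with inplace=True the drop() method returns None, so don’t assign its result back to your variable. A short sketch of this common pitfall:


# With inplace=True, drop() returns None and modifies df directly
df = pd.DataFrame(technologies,index=indexes)
result = df.drop(['r1','r2'], inplace=True)
print(result)   # Prints None
print(df)       # df itself no longer contains r1 and r2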

2.6 Drop Rows by Checking Conditions

Most of the time we also need to remove DataFrame rows based on some condition (a column value). You can do this by selecting the rows you want to keep with loc[] or boolean indexing, which effectively drops the rows that don’t meet the condition. The example below keeps only the rows where Discount is at least 1500.


# Delete Rows by Checking Conditions
df = pd.DataFrame(technologies)
df1 = df.loc[df["Discount"] >=1500 ]
print(df1)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
1  PySpark  25000   40days      2300
2   Hadoop  26000      NaN      1500
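
If you prefer to use drop() itself for condition-based removal, you can select the rows that match the condition and pass their index labels to drop(). The sketch below removes rows with a Discount below 1500 and yields the same result as the loc[] example above:


# Delete rows by condition using drop()
df = pd.DataFrame(technologies)
df2 = df.drop(df[df["Discount"] < 1500].index)
print(df2)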

2.7 Drop Rows that Have NaN/None/Null Values

While working with analytics, you are often required to clean up data that has None, Null & np.NaN values. By using df.dropna() you can remove rows with NaN values from the DataFrame.


# Delete rows with Nan, None & Null Values
df = pd.DataFrame(technologies,index=indexes)
df2=df.dropna()
print(df2)

This removes all rows that have None, Null or NaN values in any column.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000    30day      1000
r2  PySpark  25000   40days      2300
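
dropna() also accepts params such as subset, how, and thresh if you need finer control over which rows are removed; a short sketch:


# Drop rows with NaN/None only in specific columns
df = pd.DataFrame(technologies,index=indexes)
df2 = df.dropna(subset=['Duration'])

# Drop rows only when all values are NaN/None
df3 = df.dropna(how='all')

# Keep only rows that have at least 3 non-null values
df4 = df.dropna(thresh=3)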

2.8 Remove Rows by Slicing DataFrame

You can also drop a range of DataFrame rows by slicing. Slicing selects the rows to keep, which effectively drops the rest. Remember that the index starts from zero.


# Remove Rows by Slicing DataFrame
df2=df[4:]     # Keeps rows from position 4 onward (drops the first 4 rows)
df2=df[1:-1]   # Drops the first and last rows
df2=df[2:4]    # Keeps rows at positions 2 and 3 (drops the rest)

Related: You can also remove the first N rows from a pandas DataFrame and remove the last N rows from a pandas DataFrame.

Frequently Asked Questions

1. How can I drop rows with missing values from a DataFrame?

A. You can use the dropna() method to remove rows containing missing values (NaN).

2. How can I drop specific rows by index in a DataFrame?

A. You can use the drop() method with the index labels you want to remove.

3. How can I drop rows based on a condition in a DataFrame?

A. You can use boolean indexing to filter rows based on a condition and create a new DataFrame without the rows that don’t meet the condition.

4. How can I drop duplicate rows from a DataFrame?

A. You can use the drop_duplicates() method to remove duplicate rows based on the values in one or more columns.
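
For example, the sketch below uses a small illustrative DataFrame with a repeated row; by default drop_duplicates() keeps the first occurrence, and the subset param restricts the comparison to specific columns.


# Drop duplicate rows (illustrative data)
df = pd.DataFrame({'Courses':["Spark","PySpark","Spark"],
                   'Fee':[20000,25000,20000]})

# Keeps the first occurrence of each duplicate row
df1 = df.drop_duplicates()

# Considers only the 'Courses' column when detecting duplicates
df2 = df.drop_duplicates(subset=['Courses'])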

5. How can I drop rows based on a custom condition or function?

A. You can use the drop() method together with a custom condition or function to drop rows based on your specific criteria, as shown below.
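
One approach is to build a boolean mask with a custom function via apply() and pass the matching index labels to drop(). A minimal sketch, where the condition itself is just an illustration:


# Drop rows using a custom function (the condition is illustrative)
df = pd.DataFrame(technologies)

# Build a boolean mask: True for rows we want to drop
mask = df['Courses'].apply(lambda course: len(course) > 6)

# Pass the matching index labels to drop()
df1 = df.drop(df[mask].index)
print(df1)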

Conclusion

In this pandas drop rows article, you have learned how to drop/remove pandas DataFrame rows using the drop() method. By default, drop() deletes rows (axis=0); if you want to delete columns, you have to use either axis=1 or the columns param.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
