Pandas – Drop Rows From DataFrame Examples

Use the pandas.DataFrame.drop() method to drop (remove or delete) rows and columns from a DataFrame. The axis param specifies which axis to remove from. By default axis=0, which removes rows; use axis=1 or the columns param to remove columns. drop() returns a copy of the DataFrame with the rows removed; use inplace=True to modify the existing DataFrame instead.

In this article, I will cover how to remove rows by label, by index position, and by range, how to drop rows inplace, and how to drop rows with None, NaN & Null values, with examples.

1. Pandas.DataFrame.drop() Syntax – Drop Rows & Columns


# pandas DataFrame drop() Syntax
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
  • labels – Single label or list-like. Used together with the axis param.
  • axis – Defaults to 0. Use 0 to drop rows and 1 to drop columns.
  • index – Use to specify rows. Accepts single label or list-like.
  • columns – Use to specify columns. Accepts single label or list-like.
  • level – int or level name, optional. Use with a MultiIndex.
  • inplace – Default False, which returns a copy of the DataFrame. When True, it drops from the current DataFrame inplace and returns None.
  • errors – {‘ignore’, ‘raise’}, default ‘raise’. With ‘ignore’, labels that do not exist are skipped instead of raising an error.
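To make the index, columns and errors params more concrete, here is a minimal sketch. The small DataFrame and the variable names (trimmed, unchanged) are made up for this illustration and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical two-row DataFrame used only for this illustration
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

# index= and columns= can be combined in a single call
trimmed = df.drop(index=["r1"], columns=["Fee"])
print(trimmed)

# With the default errors='raise', a label that does not exist raises KeyError
try:
    df.drop(index=["r9"])
except KeyError as exc:
    print("KeyError:", exc)

# errors='ignore' skips labels that do not exist and drops nothing here
unchanged = df.drop(index=["r9"], errors="ignore")
print(unchanged.equals(df))
```

Using index= and columns= is usually easier to read than labels= plus axis=, because the call itself says what is being removed.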

Let’s create a DataFrame, run some examples and explore the output. Note that our DataFrame contains index labels for rows which I am going to use to demonstrate removing rows by labels.


import pandas as pd
import numpy as np

technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[20000,25000,26000,22000],
    'Duration':['30day','40days',np.nan, None],
    'Discount':[1000,2300,1500,1200]
               }

indexes=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=indexes)
print(df)

Yields below output


    Courses    Fee Duration  Discount
r1    Spark  20000    30day      1000
r2  PySpark  25000   40days      2300
r3   Hadoop  26000      NaN      1500
r4   Python  22000     None      1200

2. pandas Drop Rows From DataFrame Examples

By default drop() method removes rows (axis=0) from DataFrame. Let’s see several examples of how to remove rows from DataFrame.

2.1 Drop rows by Index Labels or Names

One of the advantages of pandas is that rows can have labels, similar to column names. If your DataFrame has row labels (index labels), you can specify which rows to remove by label.


# Drop rows by Index Label
df = pd.DataFrame(technologies,index=indexes)
df1 = df.drop(['r1','r2'])
print(df1)

Yields below output.


   Courses    Fee Duration  Discount
r3  Hadoop  26000      NaN      1500
r4  Python  22000     None      1200

Alternatively, you can write the same statement by using the index param.


# Delete Rows by Index Labels
df1 = df.drop(index=['r1','r2'])

And by using labels and axis as below.


# Delete Rows by Index Labels & axis
df1 = df.drop(labels=['r1','r2'])
df1 = df.drop(labels=['r1','r2'],axis=0)


2.2 Drop Rows by Index Number (Row Number)

Similarly, you can use the drop() method to remove rows by index position. drop() does not take positions as a param, so we need to get the row labels from the index and pass them to drop(). We will use df.index to get the row labels for the positions we want to delete.

  • df.index.values returns all row labels as a NumPy array.
  • df.index[[1,3]] returns the row labels for the 2nd and 4th rows; passing these to drop() removes those rows. Note that positions in Python start from zero.

# Delete Rows by Index numbers
df = pd.DataFrame(technologies,index=indexes)
df1=df.drop(df.index[[1,3]])
print(df1)

This removes rows r2 and r4, leaving r1 and r3. To remove the first row, use df.drop(df.index[0]); to remove the last row, use df.drop(df.index[-1]).


# Removes First Row
df=df.drop(df.index[0])

# Removes Last Row
df=df.drop(df.index[-1])
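To see what drop() actually receives when you go through df.index, the short sketch below prints the labels selected by position before dropping them. The single-column DataFrame is a simplified stand-in for the one used in this article.

```python
import pandas as pd

# Simplified stand-in for the article's DataFrame
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000]},
    index=["r1", "r2", "r3", "r4"],
)

# Positions 1 and 3 are translated into row labels first
labels = df.index[[1, 3]]
print(list(labels))            # ['r2', 'r4']

# drop() then works purely on those labels
remaining = df.drop(labels)
print(list(remaining.index))   # ['r1', 'r3']
```

Because drop() works on labels, a label that appears more than once in the index removes every row carrying it, not only the row at the position you picked.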

2.3 Delete Rows by Index Range

You can also remove rows by specifying an index range. The below example removes all rows starting from the 3rd row.


# Delete Rows by Index Range
df = pd.DataFrame(technologies,index=indexes)
df1=df.drop(df.index[2:])
print(df1)

Yields below output.


    Courses    Fee Duration  Discount
r1    Spark  20000    30day      1000
r2  PySpark  25000   40days      2300

2.4 Delete Rows when you have Default Index

If you do not set custom index labels, pandas assigns a sequence number to every row as its index. This default index starts from zero and increments by 1 for each row. To remove rows with the default index, you can try below.


# Remove rows when you have default index.
df = pd.DataFrame(technologies)
df1 = df.drop(0)
df3 = df.drop([0, 3])
df4 = df.drop(range(0,2))

Note that df.drop(-1) doesn’t remove the last row. Because no row has the label -1, it raises a KeyError. Use df.drop(df.index[-1]) to remove the last row.
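The difference between a label and a position with the default index can be checked with a small sketch like the one below. The one-column DataFrame and the raised flag are assumptions made only for this demonstration.

```python
import pandas as pd

# Default index: the row labels are 0, 1, 2, 3
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Python"]})

# -1 is treated as a label, and there is no such label in the index
try:
    df.drop(-1)
    raised = False
except KeyError:
    raised = True
print(raised)

# Look up the last label by position, then drop that label
last_removed = df.drop(df.index[-1])
print(list(last_removed.index))   # [0, 1, 2]
```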

2.5 Remove DataFrame Rows inplace

All examples above return a copy of the DataFrame after removing rows. To remove rows from the existing DataFrame instead, use inplace=True. By default the inplace param is False.


# Delete Rows inplace
df = pd.DataFrame(technologies,index=indexes)
df.drop(['r1','r2'],inplace=True)
print(df)
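One thing that is easy to miss with inplace=True is the return value. The sketch below, which uses a cut-down version of the DataFrame and a hypothetical variable named result, shows that the call returns None while df itself is changed.

```python
import pandas as pd

# Cut-down version of the article's DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python"]},
    index=["r1", "r2", "r3", "r4"],
)

# With inplace=True the call modifies df and returns None
result = df.drop(["r1", "r2"], inplace=True)
print(result)           # None
print(list(df.index))   # ['r3', 'r4']
```

For this reason, do not assign the result of an inplace call back to a variable you still need.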

2.6 Drop Rows by Checking Conditions

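Often you need to remove rows based on a condition. You can do this by selecting the rows you want to keep with a boolean condition inside loc[]. The example below keeps rows with a Discount of at least 1500, which drops all the others.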


# Delete Rows by Checking Conditions
df = pd.DataFrame(technologies)
df1 = df.loc[df["Discount"] >=1500 ]
print(df1)

Yields below output.


   Courses    Fee Duration  Discount
1  PySpark  25000   40days      2300
2   Hadoop  26000      NaN      1500
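If you prefer to express the condition as "rows to delete" and still use drop(), one possible approach is to pass the index of the matching rows. This is a sketch using only the Courses and Discount columns; the names to_drop and kept are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
    "Discount": [1000, 2300, 1500, 1200],
})

# Labels of the rows that match the "delete" condition
to_drop = df[df["Discount"] < 1500].index

# Drop those labels; the remaining rows are the ones with Discount >= 1500
df1 = df.drop(to_drop)
print(df1)

# Same result as keeping rows with the opposite condition
kept = df.loc[df["Discount"] >= 1500]
print(df1.equals(kept))
```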

2.7 Drop Rows that has NaN/None/Null Values

While working with analytics you often need to clean up data that has None, Null & np.nan values. By using df.dropna() you can remove all rows that contain a None, Null, or NaN value in any column.


# Delete rows with Nan, None & Null Values
df = pd.DataFrame(technologies,index=indexes)
df2=df.dropna()
print(df2)

This removes all rows that have a None, Null, or NaN value in any column. Here rows r3 and r4 are removed because their Duration is missing.


    Courses    Fee Duration  Discount
r1    Spark  20000    30day      1000
r2  PySpark  25000   40days      2300
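dropna() also takes params that narrow down what counts as a row to remove. The sketch below shows subset, how and axis on a three-column version of the DataFrame; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
        "Fee": [20000, 25000, 26000, 22000],
        "Duration": ["30day", "40days", np.nan, None],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Only look at the Fee column; no Fee is missing, so all rows stay
by_subset = df.dropna(subset=["Fee"])

# Drop a row only when every column in it is missing; none qualifies here
all_missing = df.dropna(how="all")

# axis=1 drops columns that contain missing values instead of rows
complete_cols = df.dropna(axis=1)

print(len(by_subset), len(all_missing), list(complete_cols.columns))
```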

2.8 Remove Rows by Slicing DataFrame

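You can also remove DataFrame rows by slicing. Remember that positions start from zero and the end position of a slice is not included.


df2=df[4:]     # Returns rows from the 5th row onward (position 4)
df2=df[1:-1]   # Removes first and last row
df2=df[2:4]    # Returns the 3rd and 4th rows (positions 2 and 3)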
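The same slices can be written with iloc[], which makes it explicit that the numbers are positions and not labels. This is a minimal sketch on a one-column stand-in for the article's DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python"]},
    index=["r1", "r2", "r3", "r4"],
)

# Keep everything except the first and last row
middle = df.iloc[1:-1]
print(list(middle.index))      # ['r2', 'r3']

# Keep only the first two rows, which drops the rest
first_two = df.iloc[:2]
print(list(first_two.index))   # ['r1', 'r2']

# Position 4 is past the end of a 4-row DataFrame, so this is empty
nothing = df.iloc[4:]
print(len(nothing))            # 0
```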

3. pandas remove Columns from DataFrame Examples

I have a separate article dedicated to explaining how to drop() columns from pandas DataFrame. Below I have just covered some examples as it also uses the drop() method.

3.1 Delete Column By Name

This example removes the Fee column from a DataFrame. When removing columns you have to specify either axis=1 or the columns param.


# Delete Column by Name
df = pd.DataFrame(technologies,index=indexes)
df2=df.drop(["Fee"], axis = 1)
print(df2)

Yields below output.


    Courses Duration  Discount
r1    Spark    30day      1000
r2  PySpark   40days      2300
r3   Hadoop      NaN      1500
r4   Python     None      1200

Alternatively, you can use the labels param with axis=1, or the columns param.


# Drop by using labels & axis
df2=df.drop(labels=["Fee"], axis = 1)
print(df2)

# Drop by using columns
df2=df.drop(columns=["Fee"])
print(df2)

3.2 Delete Column By Index

To remove a column by position, first get the DataFrame column names with df.columns and then pick the column by its position. Note that positions start from 0 in Python. In the example below, df.columns[[1]] selects the second column of the DataFrame, which is Fee.


# Drop column by index.
df = pd.DataFrame(technologies,index=indexes)
df2=df.drop(df.columns[[1]], axis = 1)
print(df2)

This yields the same output as above.
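The same idea extends to several columns at once. The sketch below drops the 2nd and 4th columns by position and passes the selected names through the columns param instead of axis=1; it assumes the same column order as the DataFrame in this article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
        "Fee": [20000, 25000, 26000, 22000],
        "Duration": ["30day", "40days", np.nan, None],
        "Discount": [1000, 2300, 1500, 1200],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Positions 1 and 3 map to the column names Fee and Discount
names = df.columns[[1, 3]]
print(list(names))          # ['Fee', 'Discount']

# Pass the selected names through the columns param
df2 = df.drop(columns=names)
print(list(df2.columns))    # ['Courses', 'Duration']
```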

3.3 Remove Multiple Columns from List

Sometimes you may need to remove multiple columns listed in a Python list. You can easily do this as below.


# Remove columns from List
lisCol = ["Courses","Fee"]
df2=df.drop(lisCol, axis = 1)
print(df2)

3.4 Remove Column inplace

To remove columns inplace use inplace=True.


# Remove columns in place
# inplace=True modifies df and returns None, so don't assign the result
lisCol = ["Courses","Fee"]
df.drop(lisCol, axis = 1,inplace=True)
print(df)

Happy Learning !!

Conclusion

In this article you have learned how to drop/remove pandas DataFrame rows & columns using the drop() method. By default drop() deletes rows (axis=0); to delete columns, use either axis=1 or the columns param.


NNK
