  • Post category:Pandas
  • Post last modified:June 3, 2024
Pandas Drop Rows with NaN Values in DataFrame

To drop rows with NaN (null) values in a Pandas DataFrame, use the dropna() function. Python has no Null keyword, so missing data is represented as None or NaN. NaN stands for Not a Number and is one of the most common markers for missing values in datasets. Missing values are a frequent source of problems in data analysis, so before processing the data you typically either drop the rows that contain NaN or replace NaN with an empty string in string columns and with zero in numeric columns.


Key Points –

  • Use the dropna() function in Pandas to remove rows containing NaN/None values from a DataFrame.
  • numpy.nan is Not a Number (NaN); its type is Python's built-in float (floating point).
  • None is of NoneType and it is an object in Python.
  • The axis parameter defaults to 0, which drops rows; pass axis=1 to drop columns instead.
  • The dropna() function returns a new DataFrame with NaN-containing rows removed.
  • Use the subset parameter to choose which columns are checked for NaN, and the how parameter to control the criteria for dropping rows.
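
The type-related key points above can be checked directly. The following is a minimal sketch; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# np.nan is a float; None is Python's NoneType singleton.
nan_type = type(np.nan)
none_type = type(None)
print(nan_type.__name__, none_type.__name__)

# NaN never compares equal to itself, so use isna() rather than == to find it.
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)

# In a numeric Series, pandas stores None as NaN and flags both as missing.
s = pd.Series([1.0, None, np.nan])
missing_flags = s.isna().tolist()
print(missing_flags)  # [False, True, True]
```

Because both markers are reported by isna(), dropna() treats None and NaN the same way.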

Quick Examples of Dropping Rows with NaN Values

Below are quick examples of dropping rows with NaN values.


# Quick examples of dropping rows with NaN values

# Example 1: Drop all rows with NaN values
df2=df.dropna()
df2=df.dropna(axis=0)  # Same result: axis=0 (rows) is the default

# Example 2: Reset index after drop
df2=df.dropna().reset_index(drop=True)

# Example 3: Drop rows in which all values are NaN
df2=df.dropna(how='all')

# Example 4: Drop rows that have NaN values in selected columns
df2=df.dropna(subset=['Courses','Fee'])

# Example 5: With threshold,
# keep only the rows with at least 2 non-NA values.
df2=df.dropna(thresh=2)

# Example 6: Drop rows with NaN values in place
df.dropna(inplace=True)
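
Example 5 uses thresh, which is not revisited in the sections that follow. As a minimal sketch, assuming the same sample data that the rest of this article uses, thresh=N keeps a row only when it has at least N non-missing values. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# Count the non-missing values in each row.
non_na_counts = df.notna().sum(axis=1).tolist()
print(non_na_counts)  # [4, 1, 4, 3, 0]

# Keep rows with at least 2 non-NA values: rows 1 and 4 are dropped.
kept_thresh_2 = df.dropna(thresh=2).index.tolist()
print(kept_thresh_2)  # [0, 2, 3]

# Requiring all 4 values gives the same rows as the default dropna().
kept_thresh_4 = df.dropna(thresh=4).index.tolist()
print(kept_thresh_4)  # [0, 2]
```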

To run these examples of dropping rows with NaN values, let’s first create a Pandas DataFrame from a dictionary.


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark",'Java',"Hadoop",'Python',np.nan],
     'Fee' :[20000,np.nan,26000,24000,np.nan],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[1000,np.nan,2500,None,np.nan]
               })
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

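# Output:
# Create DataFrame:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
4     NaN      NaN      NaN       NaN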
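
Before dropping anything, it can help to see where the missing values are. This is a small sketch using isna() on the same sample data; it is an optional inspection step, not part of dropna() itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# True for every row that contains at least one missing value;
# these are the rows that a plain df.dropna() would remove.
rows_with_nan = df.isna().any(axis=1)
print(rows_with_nan.tolist())  # [False, True, False, True, True]
```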

Drop Rows with NaN Values

You can use the dropna() method to remove rows with NaN (Not a Number) and None values from a Pandas DataFrame. By default, it removes any row containing at least one NaN value and returns a copy of the DataFrame without those rows. To remove the rows from the existing DataFrame instead, use inplace=True.

The following example drops all rows with NaN values in a Pandas DataFrame.


# Drop all rows that have NaN/None values
df2 = df.dropna()
print("After dropping the rows with NaN Values:\n", df2)

Yields below output.

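# Output:
# After dropping the rows with NaN Values:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
2  Hadoop  26000.0   35days    2500.0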
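
To confirm that dropna() returns a copy and leaves the original untouched, compare the shapes before and after. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

df2 = df.dropna()

# The original still has all 5 rows; only the returned copy is filtered.
print(df.shape)   # (5, 4)
print(df2.shape)  # (2, 4)

# The surviving rows keep their original index labels.
print(df2.index.tolist())  # [0, 2]
```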

Related: You can use dropna(axis=1) to drop all columns that contain NaN values from a DataFrame.
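
As a short sketch of the column-wise variant: in the sample data every column has a NaN in the last row, so axis=1 removes all of them. Restricting the input to the first four rows with head(4), which is done here only for illustration, leaves Courses as the single complete column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# Every column contains at least one NaN, so all columns are dropped.
all_rows_result = df.dropna(axis=1)
print(all_rows_result.shape)  # (5, 0)

# Among the first four rows, only 'Courses' has no missing value.
first_four_result = df.head(4).dropna(axis=1)
print(first_four_result.columns.tolist())  # ['Courses']
```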

After dropping rows with NaN, the index keeps gaps where rows were removed. If you need a continuous index, reset it with the DataFrame.reset_index() method.


# Reset index after drop
df2 = df.dropna().reset_index(drop=True)
print("Reset the index after dropping:\n", df2)

Yields below output.


# Output:
# Reset the index after dropping
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1  Hadoop  26000.0   35days    2500.0
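
The drop=True argument matters here. As a minimal sketch, the default reset_index() keeps the old labels in a new column named index, while drop=True discards them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# Default: the old index labels (0 and 2) are preserved as an 'index' column.
kept_old = df.dropna().reset_index()
print(kept_old.columns.tolist())
print(kept_old['index'].tolist())  # [0, 2]

# drop=True: the old labels are discarded and rows are renumbered from 0.
renumbered = df.dropna().reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```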

Drop Rows Where All Columns Are NaN

Similarly, you can use the how parameter of the dropna() function to control which rows are dropped. The default, how='any', removes every row that has a NaN/None value in at least one column.

Use how='all' to remove only the rows in which every value is NaN/None (that is, data is missing for all elements in the row).


# Drop rows in which all values are NaN
df2 = df.dropna(how='all')
print("After dropping the rows which have all NaN values:\n", df2)

Yields below output.


# Output:
# After dropping the rows which have all NaN values:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
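
To make the difference between the two settings concrete, this small sketch compares which index labels survive under how='any' and how='all' on the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# how='any' (the default): a single NaN is enough to drop the row.
kept_any = df.dropna(how='any').index.tolist()
print(kept_any)  # [0, 2]

# how='all': only row 4, where every value is missing, is dropped.
kept_all = df.dropna(how='all').index.tolist()
print(kept_all)  # [0, 1, 2, 3]
```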

Drop NaN Values on Selected Columns from List

Sometimes you need to drop rows only when specific columns contain NaN/None values. You can achieve this with the subset parameter, which takes a list of column labels.


# Drop rows that have NaN values in selected columns
df2=df.dropna(subset=['Courses','Fee'])
print("After dropping rows based on specified columns:\n", df2)

Yields below output.


# Output:
# After dropping rows based on specified columns:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
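
The subset parameter can also be combined with how. As a sketch (the column choice is only illustrative), how='all' with a subset drops a row only when every listed column is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# Default how='any': drop a row if Fee OR Discount is missing.
kept_either = df.dropna(subset=['Fee', 'Discount']).index.tolist()
print(kept_either)  # [0, 2]

# how='all': drop a row only if Fee AND Discount are both missing.
# Row 3 (Python) has a Fee but no Discount, so it is kept.
kept_both = df.dropna(subset=['Fee', 'Discount'], how='all').index.tolist()
print(kept_both)  # [0, 2, 3]
```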

Drop Rows with NaN Values inplace

As the examples above show, dropna() does not drop rows from the original DataFrame by default; instead, it returns a copy of the DataFrame. To modify the existing DataFrame directly, set inplace=True. In that case the method returns None, so do not assign its result to a variable.


# Drop Rows with NaN Values inplace
df.dropna(inplace=True)
print("After dropping the rows with NaN values:\n", df)

# Output:
# After dropping the rows with NaN values:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 2  Hadoop  26000.0   35days    2500.0
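
A common mistake is to assign the result of an in-place call back to a variable. The sketch below shows why that fails: according to the pandas documentation for dropna(), the method returns None when inplace=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# With inplace=True the DataFrame is modified and nothing is returned.
result = df.dropna(inplace=True)
print(result)              # None
print(df.index.tolist())   # [0, 2]

# So avoid writing: df = df.dropna(inplace=True)
# That would replace df with None.
```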

Complete Example of Drop Rows with NaN Values


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark",'Java',"Hadoop",'Python',np.nan],
     'Fee' :[20000,np.nan,26000,24000,np.nan],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[1000,np.nan,2500,None,np.nan]
               })
df = pd.DataFrame(technologies)
print(df)

# Drop all rows with NaN values
df2=df.dropna()
print(df2)
df2=df.dropna(axis=0)

# Reset index after drop
df2=df.dropna().reset_index(drop=True)
print(df2)

# Drop rows in which all values are NaN
df2=df.dropna(how='all')
print(df2)

# Drop rows that have NaN in selected columns
df2=df.dropna(subset=['Courses','Fee'])
print(df2)

# Drop Rows with NaN Values inplace
df.dropna(inplace=True)
print(df)

FAQ on Drop Rows with NaN Values

How do I drop rows with NaN values in a Pandas DataFrame?

You can use the dropna() method to remove rows with NaN values in a Pandas DataFrame. By default, it removes every row that has at least one NaN value. For instance, df.dropna().

What is the syntax for using the dropna() function to remove rows with NaN values?

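The basic call is df.dropna(), which returns a new DataFrame containing only the rows that have no NaN values. The optional axis, how, thresh, subset, and inplace parameters adjust this behavior.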

How can I drop rows with NaN values in a specific column?

You can use the subset parameter of the dropna() method to specify a subset of columns to consider for NaN removal. For example, df.dropna(subset=['specified_column'])
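
For a single column, the same rows can also be selected with a boolean mask built from notna(). This is an equivalent alternative shown as a sketch, not something dropna() requires:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan],
})

# Drop rows where 'Fee' is missing, using subset.
via_dropna = df.dropna(subset=['Fee'])

# Select the same rows by keeping those where 'Fee' is present.
via_mask = df[df['Fee'].notna()]

print(via_dropna.index.tolist())    # [0, 2, 3]
print(via_dropna.equals(via_mask))  # True
```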

How can I drop rows based on multiple columns with NaN values?

You can specify multiple columns in the subset parameter. For example, df.dropna(subset=['specified_column1', 'specified_column2'])

How can I drop rows if all values in a row are NaN?

You can use the how parameter with the value 'all' to drop only the rows where all values are NaN. For example, df.dropna(how='all')

Conclusion

In this article, I have explained how to drop rows with NaN/None values from a pandas DataFrame using DataFrame.dropna(). You also learned how to remove rows only when all values are NaN/None, how to remove rows only when selected columns have NaN values, and how to drop rows in place using the inplace parameter.

Happy Learning !!
