To drop rows with NaN (null) values in a Pandas DataFrame, you can use the dropna() function. Python has no null value of its own (it uses None instead), so in pandas missing data is represented as None or NaN. NaN stands for Not a Number and is one of the most common ways to mark missing values in a dataset. None/NaN values are one of the major problems in data analysis. Before processing the data, you usually need to either drop the rows that have NaN values, or replace NaN with an empty string in string columns and with zero in numeric columns.
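If you decide to replace missing values instead of dropping rows, a minimal sketch uses Series.fillna() per column. The small DataFrame below is hypothetical and only illustrates the string-versus-numeric replacement described above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one string column and one numeric column with gaps
df = pd.DataFrame({
    "Courses": ["Spark", None, "Hadoop"],
    "Fee": [20000, np.nan, 26000],
})

# Replace missing strings with an empty string and missing numbers with zero
df["Courses"] = df["Courses"].fillna("")
df["Fee"] = df["Fee"].fillna(0)
print(df)
```

Filling column by column keeps the replacement value appropriate to each column's type, so a string column never ends up with a numeric 0.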
Key Points –
- Use the dropna() function in Pandas to remove rows containing NaN/None values from a DataFrame.
- numpy.nan is Not a Number (NaN), which is of Python's built-in numeric type float (floating point). None is of NoneType and is an object in Python.
- Specify the axis parameter as 0 (the default) to drop rows with NaN values.
- The dropna() function returns a new DataFrame with the NaN-containing rows removed.
- Use additional parameters like subset to specify the columns to consider for NaN removal, and how to control the criteria for dropping rows.
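The difference between numpy.nan and None described in the key points can be checked directly. This short sketch also shows that pandas treats both as missing, and that None placed in a numeric column is stored as NaN:

```python
import numpy as np
import pandas as pd

# np.nan is a float; None is the single instance of NoneType
print(type(np.nan))  # <class 'float'>
print(type(None))    # <class 'NoneType'>

# In a numeric Series, None is converted to NaN and the dtype stays float64
s = pd.Series([1.0, np.nan, None])
print(s.dtype)              # float64
print(s.isna().tolist())    # [False, True, True]

# In a string column, None is also detected as a missing value
names = pd.Series(["Spark", None])
print(names.isna().tolist())  # [False, True]
```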
Quick Examples of Dropping Rows with NaN Values
Below are quick examples of dropping rows with NaN values.
# Quick examples of dropping rows with NaN values
# Example 1: Drop all rows with NaN values
df2 = df.dropna()
df2 = df.dropna(axis=0)  # same result: axis=0 (rows) is the default
# Example 2: Reset index after drop
df2 = df.dropna().reset_index(drop=True)
# Example 3: Drop rows where all values are NaN
df2 = df.dropna(how='all')
# Example 4: Drop rows that have NaN values in selected columns
df2 = df.dropna(subset=['Courses','Fee'])
# Example 5: With threshold,
# keep only the rows with at least 2 non-NA values
df2 = df.dropna(thresh=2)
# Example 6: Drop rows with NaN values in place
df.dropna(inplace=True)
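Example 5 uses thresh without further explanation. As a minimal sketch using the same sample data this article builds later, thresh compares each row's count of non-NA values against the threshold and keeps the rows that reach it:

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created in this article
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Number of non-NA values in each row; thresh is compared against this count
print(df.notna().sum(axis=1).tolist())  # [4, 1, 4, 3, 0]

# Keep only rows with at least 2 non-NA values
df2 = df.dropna(thresh=2)
print(df2.index.tolist())  # [0, 2, 3]
```

Row 1 (Java) has only one non-NA value and row 4 has none, so both fall below the threshold of 2.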
To run some examples of dropping rows with NaN values, let's create a Pandas DataFrame from a dictionary.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark",'Java',"Hadoop",'Python',np.nan],
    'Fee' :[20000,np.nan,26000,24000,np.nan],
    'Duration':['30days',np.nan,'35days','40days',np.nan],
    'Discount':[1000,np.nan,2500,None,np.nan]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
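Yields below output.
# Output:
# Create DataFrame:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      NaN      NaN       NaN
# 2  Hadoop  26000.0   35days    2500.0
# 3  Python  24000.0   40days       NaN
# 4     NaN      NaN      NaN       NaN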
Drop Rows with NaN Values
You can use the dropna() method to remove rows with NaN (Not a Number) and None values from a Pandas DataFrame. By default, it removes any row that contains at least one NaN value and returns a copy of the DataFrame with those rows removed. If you want to remove the rows from the existing DataFrame instead, use inplace=True.
# Drop all rows that have NaN/None values
df2 = df.dropna()
print("After dropping the rows with NaN Values:\n", df2)
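Yields below output.
# Output:
# After dropping the rows with NaN Values:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 2  Hadoop  26000.0   35days    2500.0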
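To make the default how='any' behavior concrete, this sketch rebuilds the same result with a boolean mask: a row is kept only when every value in it is non-NA. The mask approach is shown purely as an illustration of what dropna() does, not as its internal implementation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# True for rows where all four columns hold a value
mask = df.notna().all(axis=1)
manual = df[mask]

print(mask.tolist())               # [True, False, True, False, False]
print(manual.equals(df.dropna()))  # True

# dropna() returned a new DataFrame, so the original still has all 5 rows
print(len(df))  # 5
```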
Related: You can use dropna(axis=1) to drop every column that contains at least one NaN value from a DataFrame.
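Every column in this article's sample DataFrame contains at least one NaN, so dropna(axis=1) on it would remove all of the columns. A minimal sketch on a hypothetical three-column DataFrame shows the column-wise behavior more clearly, including how axis=1 combines with how='all':

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Notes" is entirely empty, "Discount" is partly empty
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Discount": [1000, np.nan, 2500],
    "Notes": [np.nan, np.nan, np.nan],
})

# axis=1 drops every column that has at least one NaN
print(df.dropna(axis=1).columns.tolist())  # ['Courses']

# With how='all', only columns that are entirely NaN are dropped
print(df.dropna(axis=1, how="all").columns.tolist())  # ['Courses', 'Discount']
```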
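After dropping rows with NaN values, the remaining rows keep their original index labels, so you may need to reset the index. You can do so using the DataFrame.reset_index() method; pass drop=True to discard the old index instead of adding it as a new column.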
# Reset index after drop
df2 = df.dropna().reset_index(drop=True)
print("Reset the index after dropping:\n", df2)
Yields below output.
# Output:
# Reset the index after dropping:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1  Hadoop  26000.0   35days    2500.0
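The effect of reset_index() is easiest to see by comparing the index labels before and after. This sketch also shows why drop=True matters: without it, the old labels are kept as a regular column named "index":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# dropna() keeps the original labels, leaving a gap where row 1 was removed
print(df.dropna().index.tolist())  # [0, 2]

# drop=True renumbers the rows from 0 and discards the old labels
df2 = df.dropna().reset_index(drop=True)
print(df2.index.tolist())    # [0, 1]
print(df2.columns.tolist())  # ['Courses', 'Fee', 'Duration', 'Discount']

# Without drop=True, the old labels become an extra "index" column
print(df.dropna().reset_index().columns.tolist()[0])  # index
```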
Drop Rows Where All Values Are NaN
You can also use the how parameter of the dropna() function to specify which rows to drop based on NaN values. By default, how='any' removes every row that has a NaN/None value in any element.
You can use how='all' to remove only the rows in which all values are NaN/None (data is missing for every element in the row).
# Drop rows where all values are NaN
df2 = df.dropna(how='all')
print("After dropping the rows which have all NaN values:\n", df2)
Yields below output.
# Output:
# After dropping the rows which have all NaN values:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      NaN      NaN       NaN
# 2  Hadoop  26000.0   35days    2500.0
# 3  Python  24000.0   40days       NaN
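Putting how='any' and how='all' side by side on the same data makes the difference clear. The mask comparison at the end is only an illustration: how='all' keeps the rows in which at least one value is present:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

any_rows = df.dropna(how="any")  # the default: drop rows with any NaN
all_rows = df.dropna(how="all")  # drop only rows that are entirely NaN

print(any_rows.index.tolist())  # [0, 2]
print(all_rows.index.tolist())  # [0, 1, 2, 3]

# how='all' keeps rows that have at least one non-NA value
print(all_rows.equals(df[df.notna().any(axis=1)]))  # True
```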
Drop Rows with NaN Values on Selected Columns
Sometimes you may need to drop rows only when selected columns have NaN/None values. You can achieve this with the subset parameter, which takes a list of column labels; NaN values in the other columns are ignored when deciding which rows to drop.
# Drop rows that have NaN values in selected columns
df2 = df.dropna(subset=['Courses','Fee'])
print("After dropping rows based on specified columns:\n", df2)
Yields below output.
# Output:
# After dropping rows based on specified columns:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 2  Hadoop  26000.0   35days    2500.0
# 3  Python  24000.0   40days       NaN
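The subset parameter can be combined with how. In this sketch on the article's sample data, how='any' drops a row when either Fee or Discount is missing, while how='all' drops it only when both are missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Drop rows where Fee OR Discount is missing
either_missing = df.dropna(subset=["Fee", "Discount"], how="any")
print(either_missing.index.tolist())  # [0, 2]

# Drop rows only where Fee AND Discount are both missing
both_missing = df.dropna(subset=["Fee", "Discount"], how="all")
print(both_missing.index.tolist())  # [0, 2, 3]
```

Row 3 (Python) has a Fee but no Discount, so it survives the how='all' version.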
Drop Rows with NaN Values inplace
As the earlier examples show, by default the dropna() method doesn't drop rows from the original DataFrame; instead, it returns a new DataFrame. If you want to modify the existing DataFrame directly, set inplace=True. In that case, dropna() changes the DataFrame itself and returns None.
# Drop Rows with NaN Values inplace
df.dropna(inplace=True)
print("After dropping the rows with NaN values:\n", df)
# Output:
# After dropping the rows with NaN values:
# Courses Fee Duration Discount
# 0 Spark 20000.0 30days 1000.0
# 2 Hadoop 26000.0 35days 2500.0
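Because an in-place call returns None, assigning its result is a common mistake. This sketch works on a copy of the sample DataFrame so the original stays intact, and confirms what inplace=True returns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Work on a copy so the original DataFrame is not modified
df_copy = df.copy()
result = df_copy.dropna(inplace=True)

print(result)                   # None
print(df_copy.index.tolist())   # [0, 2]
print(len(df))                  # 5

# Pitfall: df = df.dropna(inplace=True) would set df to None.
# Use either df.dropna(inplace=True) or df = df.dropna(), not both.
```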
Complete Example of Drop Rows with NaN Values
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark",'Java',"Hadoop",'Python',np.nan],
    'Fee' :[20000,np.nan,26000,24000,np.nan],
    'Duration':['30days',np.nan,'35days','40days',np.nan],
    'Discount':[1000,np.nan,2500,None,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
# Drop all rows with NaN values
df2 = df.dropna()
print(df2)
# Same result: axis=0 (rows) is the default
df2 = df.dropna(axis=0)
# Reset index after drop
df2 = df.dropna().reset_index(drop=True)
print(df2)
# Drop rows where all values are NaN
df2 = df.dropna(how='all')
print(df2)
# Drop rows that have NaN values in selected columns
df2 = df.dropna(subset=['Courses','Fee'])
print(df2)
# Drop rows with NaN values in place
df.dropna(inplace=True)
print(df)
FAQ on Drop Rows with NaN Values
You can use the dropna() method to remove rows with NaN values in a Pandas DataFrame. By default, it removes every row that has at least one NaN value. For instance, df.dropna() returns a new DataFrame containing only the rows that don't have NaN values.
You can use the subset parameter of the dropna() method to specify a subset of columns to consider for NaN removal. For example, df.dropna(subset=['specified_column'])
You can specify multiple columns in the subset parameter. For example, df.dropna(subset=['specified_column1', 'specified_column2'])
You can use the how parameter with the value 'all' to drop rows where all values are NaN. For example, df.dropna(how='all')
Conclusion
In this article, I have explained how to drop rows with NaN/None values in a pandas DataFrame using DataFrame.dropna(). You also learned how to remove rows only when all values are NaN/None, how to remove rows only when selected columns have NaN values, and how to drop rows in place using the inplace parameter.
Happy Learning !!