To filter rows with NaN values from a specific column in a Pandas DataFrame, whether the column contains strings, floats, datetimes, etc., by using the dropna()
and notnull()
methods. Python doesn’t support Null, so any missing data is represented as None or NaN. NaN stands for Not A Number
and is a common way to represent missing values in data.
In this article, I will explain how to filter out rows with NAN values from the Pandas DataFrame column with some examples.
Quick Examples Filter out Rows NaN Value from Column
If you are in a hurry, below are some quick examples of how to ignore rows with NAN from Pandas DataFrame.
# Quick examples filter out rows NaN value from column
# Example 1: Using DataFrame.dropna() method
# To drop all rows that have NAN/none
df2=df.dropna()
# Example 2: Filter out NAN data selection column
# By DataFrame.dropna()
df2 = df.dropna(thresh=2)
# Example 3: Pandas find columns with nan to update
df2=df[df.Duration.notnull()]
# Example 4: Drop rows that has all NaN values
df2=df.dropna(how='all')
# Example 5: Using reset_index() method
df2=df.dropna().reset_index(drop=True)
# Example 6: Two columns by using subset parameter
df2=df.dropna(subset=['Courses','Fee'])
# Example 7: Filter NAN Data selection column of strings by not operator
df2 = df[~pd.isnull(df['Courses'])]
To run some examples of filter rows with NAN value from pandas DataFrame column, let’s create Pandas DataFrame using data from a dictionary.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark","Java"],
'Fee' :[22000,25000,np.nan,np.nan,np.nan,np.nan],
'Duration':['30days',np.nan,'30days','N/A', np.nan,np.nan]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Yields below output.
Filter Out NAN Rows Using DataFrame.Dropna()
To filter rows with NaN values from a DataFrame using the DataFrame.dropna()
method. By default, this method returns a copy of the DataFrame with the rows removed. To modify the existing DataFrame directly, use the inplace=True
parameter.
# Using DataFrame.dropna() method drop all rows that have NAN/none.
df2=df.dropna()
print("After filtering the rows with NAN values:\n", df2)
Yields below output.
You can use axis=0
as a parameter to remove rows with NaN values. For instance, df.dropna(axis=0)
will remove rows with NaN values along the rows axis (axis=0).
Filter Out NAN Rows Using DataFrame.dropna()
Filter out NAN rows (Data selection) by using DataFrame.dropna()
method. The dropna()
function is also possible to drop rows with NaN values df.dropna(thresh=2)
it will drop all rows where there are at least two non-NaN.
# Filter out NAN data selection column
# By DataFrame.dropna()
df2 = df.dropna(thresh=2)
print(df2)
Yields below output.
# Output:
Courses Fee Duration
0 Spark 22000.0 30days
1 PySpark 25000.0 NaN
2 Spark NaN 30days
3 Python NaN N/A
Using DataFrame.notnull() Method
The DataFrame.notnull()
method is used to detect non-missing values for an array-like object. This method takes a scalar or array-like object and indicates whether values are valid.
# Pandas find columns with nan to update.
df2=df[df.Duration.notnull()]
print(df2)
Yields below output.
# Output:
Courses Fee Duration
0 Spark 22000.0 30days
2 Spark NaN 30days
3 Python NaN N/A
Filter out Rows with NAN by using reset_index() Method
reset_index()
method is used to generate a new DataFrame or Series with the index reset. This is useful when the index needs to be treated as a column or it can be reset to the default before another operation.
# Using reset_index() Method.
df2=df.dropna().reset_index(drop=True)
print(df2)
# OutPut:
# Courses Fee Duration
# 0 Spark 22000.0 30days
Filter Rows with NaN Values on Selected Columns from List
In this section, let’s see how to drop rows only when selected columns have NaN/None values in DataFrame, you can achieve this by using subset
parameter. The subset parameter is simply selecting particular rows and columns of data from a DataFrame (or Series).
# Two columns by using subset parameter.
df2=df.dropna(subset=['Courses','Fee'])
print(df2)
Yields below output.
# Output:
Courses Fee Duration
0 Spark 22000.0 30days
1 PySpark 25000.0 NaN
Alternatively, Use how='all'
to remove rows that have all NaN/None values in a row(data is missing for all elements in a row)
# Drop rows that has all NaN values.
df2=df.dropna(how='all')
print(df2)
Yields below output.
# Output:
Courses Fee Duration
0 Spark 22000.0 30days
1 PySpark 25000.0 NaN
2 Spark NaN 30days
3 Python NaN N/A
4 PySpark NaN NaN
5 Java NaN NaN
Filter NAN Data Selection Column of Strings by not operator
Filter NAN Data Selection column of strings by not(~) operator is used to to negate the statement.
# Filter NAN Data selection column of strings by not operator.
df2 = df[~pd.isnull(df['Courses'])]
print(df2)
Yields output same as above.
Complete Examples to Filter Out Rows with NAN Value
Below is a complete example to filter out rows with NAN values from the DataFrame.
# Complete examples to filter out rows with NAN value
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark","Java"],
'Fee' :[22000,25000,np.nan,np.nan,np.nan,np.nan],
'Duration':['30days',np.nan,'30days','N/A', np.nan,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
# Using DataFrame.dropna() method drop all rows that have NAN/none.
df2=df.dropna()
print(df2)
# Filter out NAN data selection column by DataFrame.dropna().
df2 = df.dropna(thresh=2)
print(df2)
# Pandas find columns with nan to update.
df2=df[df.Duration.notnull()]
print(df2)
# Drop rows that has all NaN values.
df2=df.dropna(how='all')
print(df2)
# Using reset_index() Method.
df2=df.dropna().reset_index(drop=True)
print(df2)
# Two columns by using subset parameter.
df2=df.dropna(subset=['Courses','Fee'])
print(df2)
# Filter NAN Data selection column of strings by not operator.
df2 = df[~pd.isnull(df['Courses'])]
print(df2)
Conclusion
In this article, I have explained filter nan rows from pandas DataFrame by using DataFrame.dropna()
, DataFrame.notnull()
methods. Also learned how to filter rows under different conditions: exclusively when all values are NaN/None, only when specific columns contain NaN values, all while applying the inplace parameter.
Related Articles
- Pandas Series filter() Function
- How to Use NOT IN Filter in Pandas
- Pandas Filter DataFrame Rows on Dates
- Pandas Filter Rows Using IN Like SQL
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Pandas Filter DataFrame by Substring criteria
- Apply Multiple Filters to Pandas DataFrame or Series
- Pandas Operator Chaining to Filter DataFrame Rows
- Pandas Drop Columns with NaN or None Values
- Pandas Drop Rows with NaN Values in DataFram
- Pandas Replace Values based on Condition
- Pandas Replace Column value in DataFrame
- Remove NaN From Pandas Series
- Pandas Replace Blank/Empty String with NaN values
- Pandas – Replace NaN Values with Zero in a Column
- Count NaN Values in Pandas DataFrame
- Pandas DataFrame.fillna() function explained
Pandas Series.fillna() function explained