Pandas Filter Rows with NAN Value from DataFrame Column

  • Post category: Pandas
  • Post last modified: November 22, 2024

To filter out rows with NaN values in a specific column of a Pandas DataFrame, whether the column contains strings, floats, datetimes, etc., use the dropna() and notnull() methods. Python has no Null keyword, so missing data is represented as None or NaN. NaN stands for Not a Number and is a common way to represent missing values in data.

In this article, I will explain how to filter out rows with NaN values in a Pandas DataFrame column, with some examples.

Key Points –

  • Use the isna() or isnull() functions to identify NaN values in a DataFrame column.
  • To filter out rows with NaN values, negate the isna() mask with the ~ (negation) operator inside boolean indexing, or use dropna().
  • The dropna() method can remove rows containing NaN values, either across all columns or specific columns, using the subset parameter.
  • By default, dropna() removes rows where any column contains NaN; use how='all' to only drop rows where all specified columns are NaN.
  • Decide whether to modify the DataFrame in place using the inplace=True parameter or return a new DataFrame with the NaN-filtered rows.

Quick Examples of Filtering out Rows with NaN Values from a Column

If you are in a hurry, below are some quick examples of how to filter out rows with NaN values from a Pandas DataFrame.


# Quick examples of filtering out rows with NaN values

# Example 1: Drop all rows that contain NaN/None in any column
df2 = df.dropna()

# Example 2: Keep only rows with at least 2 non-NaN values
df2 = df.dropna(thresh=2)

# Example 3: Keep rows where the Duration column is not NaN
df2 = df[df.Duration.notnull()]

# Example 4: Drop rows where all values are NaN
df2 = df.dropna(how='all')

# Example 5: Drop NaN rows and reset the index
df2 = df.dropna().reset_index(drop=True)

# Example 6: Drop rows with NaN in either of two columns
# using the subset parameter
df2 = df.dropna(subset=['Courses','Fee'])

# Example 7: Keep rows where the Courses column is not NaN
# using the ~ (not) operator
df2 = df[~pd.isnull(df['Courses'])]

To run the examples that filter rows with NaN values from a Pandas DataFrame column, let’s create a Pandas DataFrame using data from a dictionary.


# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark","Java"],
    'Fee' :[22000,25000,np.nan,np.nan,np.nan,np.nan],
    'Duration':['30days',np.nan,'30days','N/A', np.nan,np.nan]
          }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

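Yields below output.


# Output:
Create DataFrame:
    Courses      Fee Duration
0    Spark  22000.0   30days
1  PySpark  25000.0      NaN
2    Spark      NaN   30days
3   Python      NaN      N/A
4  PySpark      NaN      NaN
5     Java      NaN      NaN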

Filter Out NAN Rows Using DataFrame.dropna()

To filter out rows with NaN values from a DataFrame, use the DataFrame.dropna() method. By default, this method returns a copy of the DataFrame with the rows removed. To modify the existing DataFrame directly, use the inplace=True parameter.


# Using DataFrame.dropna() method 
# Drop all rows that have NAN/none
df2=df.dropna()
print("After filtering the rows with NAN values:\n", df2)

Yields below output.


# Output:
After filtering the rows with NAN values:
   Courses      Fee Duration
0   Spark  22000.0   30days

The axis parameter controls whether rows or columns are dropped. axis=0 is the default and removes rows, so df.dropna(axis=0) behaves the same as df.dropna(); axis=1 removes the columns that contain NaN values instead.
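As a minimal sketch of the two parameters mentioned in this section (the df_copy and same_result names are only for illustration), the following checks that axis=0 matches the default behavior and shows that inplace=True changes the DataFrame itself and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# axis=0 is the default, so both calls drop the same rows
same_result = df.dropna(axis=0).equals(df.dropna())
print(same_result)  # True

# inplace=True modifies the object it is called on and returns None.
# A copy is used here so the original df stays available.
df_copy = df.copy()
result = df_copy.dropna(inplace=True)
print(result)        # None
print(len(df_copy))  # 1
print(len(df))       # 6
```

Because inplace=True returns None, assigning its result back to a variable (for example df = df.dropna(inplace=True)) would lose the DataFrame.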

Filter Out NAN Rows Using the thresh Parameter of DataFrame.dropna()

Alternatively, use the thresh parameter of DataFrame.dropna() to keep only the rows that have a minimum number of non-NaN values. For example, df.dropna(thresh=2) keeps every row with at least two non-NaN values and drops the rest. In the sample DataFrame, rows 4 and 5 have a value only in Courses, so they are removed.


# Keep rows with at least 2 non-NaN values
# using the thresh parameter of DataFrame.dropna()
df2 = df.dropna(thresh=2)
print(df2)

Yields below output.


# Output:
   Courses      Fee Duration
0    Spark  22000.0   30days
1  PySpark  25000.0      NaN
2    Spark      NaN   30days
3   Python      NaN      N/A
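One way to see why only rows 0 to 3 remain is to count the non-NaN values in each row. This is a sketch of the condition that thresh=2 checks, not a description of how pandas implements it; the counts and kept names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# Number of non-NaN values in each row
counts = df.notna().sum(axis=1)
print(counts.tolist())  # [3, 2, 2, 2, 1, 1]

# Keeping rows with at least 2 non-NaN values matches dropna(thresh=2)
kept = df[counts >= 2]
print(kept.equals(df.dropna(thresh=2)))  # True
```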

Using DataFrame.notnull() Method

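The notnull() method detects non-missing values. Called on a column (a Series), it takes no arguments and returns a boolean Series that is True where a value is present and False where it is NaN or None. Passing that boolean Series to df[] keeps only the rows that have a value in the column. Note that the string 'N/A' is an ordinary value, not NaN, so row 3 is kept.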


# Keep rows where the Duration column is not NaN
df2 = df[df.Duration.notnull()]
print(df2)

Yields below output.


# Output:
  Courses      Fee Duration
0   Spark  22000.0   30days
2   Spark      NaN   30days
3  Python      NaN      N/A
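Row 3 stays in the result because the string 'N/A' is a regular value, not NaN. If such placeholder strings should count as missing in your data (an assumption about the data, not something pandas infers on its own), one possible approach is to convert them to NaN before filtering. The cleaned name below is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# Assumption: the placeholder string 'N/A' means "missing" in this data
cleaned = df.assign(Duration=df['Duration'].replace('N/A', np.nan))

# Now notnull() also filters out the former 'N/A' row
df2 = cleaned[cleaned['Duration'].notnull()]
print(df2.index.tolist())  # [0, 2]
```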

Filter out Rows with NAN by using reset_index() Method

After dropna(), the remaining rows keep their original index labels, which can leave gaps in the index. The reset_index() method returns a new DataFrame or Series with the default integer index. By default the old index is added as a column; pass drop=True to discard it instead.


# Using reset_index() method
df2=df.dropna().reset_index(drop=True)
print(df2)

# Output:
#   Courses      Fee Duration
# 0   Spark  22000.0   30days
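To see what reset_index() changes, it helps to compare the index before and after. This sketch filters on Duration only (a choice made here so that more than one row remains); the filtered and renumbered names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# The original labels survive the filter, leaving a gap at 1
filtered = df.dropna(subset=['Duration'])
print(filtered.index.tolist())  # [0, 2, 3]

# drop=True discards the old labels and renumbers from 0
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# Without drop=True the old labels are kept as a new 'index' column
print(filtered.reset_index().columns.tolist())
# ['index', 'Courses', 'Fee', 'Duration']
```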

Filter Rows with NaN Values on Selected Columns from List

In this section, let’s see how to drop rows only when selected columns have NaN/None values. You can achieve this with the subset parameter, which takes a list of column labels; dropna() then checks only those columns for missing values and ignores NaN values in the other columns.


# Two columns by using subset parameter
df2=df.dropna(subset=['Courses','Fee'])
print(df2)

Yields below output.


# Output:
   Courses      Fee Duration
0    Spark  22000.0   30days
1  PySpark  25000.0      NaN

Use how='all' to remove only the rows where every value is NaN or None (i.e., when all elements in the row are missing). In the sample DataFrame every row has a value in Courses, so no rows are dropped.


# Drop rows that have all NaN values.
df2=df.dropna(how='all')
print(df2)

Yields below output.


# Output:
   Courses      Fee Duration
0    Spark  22000.0   30days
1  PySpark  25000.0      NaN
2    Spark      NaN   30days
3   Python      NaN      N/A
4  PySpark      NaN      NaN
5     Java      NaN      NaN
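how='all' can also be combined with subset, so that a row is dropped only when all of the listed columns are NaN. A short sketch on the same data, using Fee and Duration as the columns to check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# Drop a row only when both Fee and Duration are NaN
df2 = df.dropna(how='all', subset=['Fee', 'Duration'])
print(df2.index.tolist())  # [0, 1, 2, 3]
```

Rows 4 and 5 are removed because both checked columns are NaN there, while rows 1, 2, and 3 are kept because at least one of the two columns has a value.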

Filter NAN Rows in a Column of Strings Using the not (~) Operator

The ~ (bitwise NOT) operator negates a boolean mask. Applying it to pd.isnull(df['Courses']) selects the rows where the Courses column is not NaN.


# Filter NAN Data selection column of strings by not operator.
df2 = df[~pd.isnull(df['Courses'])]
print(df2)

Yields the same output as above, because the Courses column has no NaN values.

Complete Examples to Filter Out Rows with NAN Value

Below is a complete example to filter out rows with NAN values from the DataFrame.


# Complete examples to filter out rows with NAN value
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Spark","Python","PySpark","Java"],
    'Fee' :[22000,25000,np.nan,np.nan,np.nan,np.nan],
    'Duration':['30days',np.nan,'30days','N/A', np.nan,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)

# Drop all rows that contain NaN/None in any column
df2 = df.dropna()
print(df2)

# Keep only rows with at least 2 non-NaN values
df2 = df.dropna(thresh=2)
print(df2)

# Keep rows where the Duration column is not NaN
df2 = df[df.Duration.notnull()]
print(df2)

# Drop rows where all values are NaN
df2 = df.dropna(how='all')
print(df2)

# Drop NaN rows and reset the index
df2 = df.dropna().reset_index(drop=True)
print(df2)

# Drop rows with NaN in either of two columns using the subset parameter
df2 = df.dropna(subset=['Courses','Fee'])
print(df2)

# Keep rows where the Courses column is not NaN using the ~ (not) operator
df2 = df[~pd.isnull(df['Courses'])]
print(df2)

FAQ on Pandas Filter Rows with NAN Value from DataFrame Column

How do I filter rows with NaN values in a specific column?

To filter rows where a specific column contains NaN values, you can use isna() combined with boolean indexing.
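A minimal sketch of this, using the Fee column of the sample data from this article (the nan_rows name is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# Boolean mask: True where Fee is NaN
nan_rows = df[df['Fee'].isna()]
print(nan_rows.index.tolist())  # [2, 3, 4, 5]
```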

How can I filter rows with NaN values in multiple columns?

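To filter rows based on NaN values in several columns, call isna() on those columns and reduce the result with any(axis=1) (NaN in at least one of the columns) or all(axis=1) (NaN in every one of the columns).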
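"Multiple columns" can be read two ways: NaN in any of the columns, or NaN in all of them. A minimal sketch of both readings, with Fee and Duration chosen here as the columns to check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

cols = ['Fee', 'Duration']

# Rows where at least one of the columns is NaN
any_nan = df[df[cols].isna().any(axis=1)]
print(any_nan.index.tolist())  # [1, 2, 3, 4, 5]

# Rows where every one of the columns is NaN
all_nan = df[df[cols].isna().all(axis=1)]
print(all_nan.index.tolist())  # [4, 5]
```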

How do I filter rows where NaN values are not present in a column?

To filter rows where a column does not contain NaN values, you can use the notna() method.
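For example, on the sample data from this article, notna() gives the same rows as negating isna() with the ~ operator (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# Keep rows where Fee has a value
with_fee = df[df['Fee'].notna()]
print(with_fee.index.tolist())  # [0, 1]

# Negating the isna() mask selects the same rows
same = with_fee.equals(df[~df['Fee'].isna()])
print(same)  # True
```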

Can I filter rows based on NaN values in the entire DataFrame?

You can call isna() on the whole DataFrame and reduce it with any(axis=1) to select the rows that have a NaN value in any column.
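A short sketch on the sample data from this article; the has_nan name is illustrative. Negating the same mask gives the rows without any NaN, which matches dropna().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# True for rows that contain NaN in at least one column
has_nan = df.isna().any(axis=1)
print(df[has_nan].index.tolist())  # [1, 2, 3, 4, 5]

# The complement is the set of complete rows, as returned by dropna()
print(df[~has_nan].equals(df.dropna()))  # True
```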

How do I filter rows where NaN values are in a specific column and replace them?

To filter rows where NaN values are present in a specific column and replace them, you can use the isna() function to identify the rows with NaN values and then use the fillna() method to replace those NaN values.
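A minimal sketch, assuming 0 is an acceptable replacement for a missing Fee (the replacement value and the mask and filled names are choices made for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark", "Java"],
    'Fee': [22000, 25000, np.nan, np.nan, np.nan, np.nan],
    'Duration': ['30days', np.nan, '30days', 'N/A', np.nan, np.nan],
})

# Identify the rows where Fee is NaN
mask = df['Fee'].isna()
print(df[mask].index.tolist())  # [2, 3, 4, 5]

# Replace NaN in the Fee column only; other columns are left untouched
filled = df.fillna({'Fee': 0})
print(filled['Fee'].tolist())
# [22000.0, 25000.0, 0.0, 0.0, 0.0, 0.0]
```

fillna() returns a new DataFrame here, so the original df still contains the NaN values.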

Conclusion

In this article, I have explained how to filter NaN rows from a Pandas DataFrame by using the DataFrame.dropna() and DataFrame.notnull() methods. You also learned how to filter rows under different conditions: only when all values are NaN/None, or only when specific columns contain NaN values, and how the inplace parameter controls whether the original DataFrame is modified.
