  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Drop Rows with NaN Values in DataFrame

Use the dropna() function to drop rows with NaN/None values from a Pandas DataFrame. Python has no null keyword, so missing data is represented as None or NaN. NaN stands for Not a Number and is one of the most common ways to represent a missing value in data. Missing values are a common problem in data analysis, so before processing the data you typically either drop the rows that contain NaN values or replace NaN with an empty string in string columns and with zero in numeric columns.
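As a rough sketch of the two options mentioned above (dropping versus replacing), the snippet below uses a small hypothetical DataFrame named sample; the column names and values are assumptions made only for this illustration. It relies on DataFrame.fillna() with a per-column dictionary for the replacement case.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, used only for this illustration
sample = pd.DataFrame({
    "Courses": ["Spark", np.nan, "Hadoop"],
    "Fee": [20000, np.nan, 26000],
})

# Option 1: drop every row that has a missing value
dropped = sample.dropna()

# Option 2: keep all rows and fill the gaps column by column
# (empty string for the string column, zero for the numeric column)
filled = sample.fillna({"Courses": "", "Fee": 0})

print(dropped)
print(filled)
```

Dropping loses the whole row, while filling keeps the row but replaces the missing entries, so the right choice depends on how the data is used afterwards.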

Pandas Drop Rows with NaN Key Points

  • dropna() is used to drop rows with NaN/None values from DataFrame.
  • numpy.nan is Not a Number (NaN); it is of Python’s built-in numeric type float (floating point).
  • None is of NoneType and it is an object in Python.
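The key points above can be checked with a few lines. This is only a small sketch; the Series named s is a made-up example and is not part of the article's data.

```python
import numpy as np
import pandas as pd

print(type(np.nan))  # <class 'float'>
print(type(None))    # <class 'NoneType'>

# NaN never compares equal to itself, so == cannot be used to find it
print(np.nan == np.nan)  # False

# Pandas treats both NaN and None as missing; isna() detects either one
s = pd.Series([1.0, np.nan, None])
missing_mask = s.isna()
print(missing_mask.tolist())  # [False, True, True]
```

Because both values are treated as missing, dropna() removes rows containing either of them.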

1. Quick Examples of Drop Rows with NaN Values

If you are in a hurry, below are some quick examples of how to drop rows with NaN values from a DataFrame.


# Below are the quick examples
  
# Example 1: Drop all rows with NaN values
df2=df.dropna()
df2=df.dropna(axis=0)

# Example 2: Reset index after drop
df2=df.dropna().reset_index(drop=True)

# Example 3: Drop rows where all values are NaN
df2=df.dropna(how='all')

# Example 4: Drop rows that have NaN values in selected columns
df2=df.dropna(subset=['Courses','Fee'])

# Example 5: With a threshold,
# keep only the rows with at least 2 non-NA values.
df2=df.dropna(thresh=2)

# Example 6: Drop Rows with NaN Values inplace
df.dropna(inplace=True)
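The thresh parameter in Example 5 counts the non-missing values in each row. The sketch below, which uses a hypothetical three-column frame named demo, shows how that count relates to the rows that are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 values, row 1 has 1, row 2 has none
demo = pd.DataFrame({
    "A": [1, np.nan, np.nan],
    "B": [2, 5, np.nan],
    "C": [3, np.nan, np.nan],
})

# Number of non-missing values in each row
non_na_counts = demo.notna().sum(axis=1)
print(non_na_counts.tolist())  # [3, 1, 0]

# thresh=2 keeps only rows with at least 2 non-missing values
kept = demo.dropna(thresh=2)
print(kept.index.tolist())  # [0]
```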

Now, let’s create a DataFrame with a few rows and columns and run some examples of dropping rows with NaN values. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark",'Java',"Hadoop",'Python',np.nan],
     'Fee' :[20000,np.nan,26000,24000,np.nan],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[1000,np.nan,2500,None,np.nan]
               })
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

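Yields below output.


# Output:
# Create DataFrame:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
4     NaN      NaN      NaN       NaN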

2. Drop Rows with NaN Values

You can use the dropna() method to remove rows with NaN (Not a Number) and None values from a Pandas DataFrame. By default, it removes any row containing at least one NaN value and returns a new DataFrame with those rows removed, leaving the original unchanged. If you want to modify the existing DataFrame instead, use inplace=True.


# Drop all rows that have NaN/None values
df2 = df.dropna()
print("After dropping the rows with NaN Values:\n", df2)

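Yields below output.


# Output:
# After dropping the rows with NaN Values:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
2  Hadoop  26000.0   35days    2500.0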

Related: You can use dropna(axis=1) to drop all columns that contain NaN values from the DataFrame.
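A minimal sketch of the column-wise variant, using a hypothetical frame named demo in which only the Fee column has a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the Fee column contains a NaN
demo = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
    "Duration": ["30days", "40days", "35days"],
})

# axis=1 (or axis="columns") drops columns instead of rows
cols_dropped = demo.dropna(axis=1)
print(cols_dropped.columns.tolist())  # ['Courses', 'Duration']
```

All three rows survive here; only the column containing the NaN is removed.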

After dropping rows with NaN, the remaining rows keep their original index labels. If you need a consecutive index, reset it with the DataFrame.reset_index() method.


# Reset index after drop
df2 = df.dropna().reset_index(drop=True)
print("Reset the index after dropping:\n", df2)

Yields below output.


# Output:
# Reset the index after dropping
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1  Hadoop  26000.0   35days    2500.0
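To see why drop=True is passed to reset_index(), the sketch below compares it with the default behavior on a hypothetical one-column frame named demo. Without drop=True, the old index is kept as a new column.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with one missing value
demo = pd.DataFrame({"Fee": [20000, np.nan, 26000]})

cleaned = demo.dropna()
print(cleaned.index.tolist())  # [0, 2]

# Default: the old index is preserved as a column named "index"
with_old = cleaned.reset_index()
print(with_old.columns.tolist())  # ['index', 'Fee']

# drop=True discards the old index and renumbers from 0
without_old = cleaned.reset_index(drop=True)
print(without_old.index.tolist())  # [0, 1]
```

pandas 2.0 and later also accept dropna(ignore_index=True), which should give the same renumbered result in a single call.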

3. Drop Rows Where All Values Are NaN

Similarly, you can use the how parameter of the dropna() function to control which rows are dropped. By default, how='any', which removes every row that has a NaN/None value in any column.

Use how='all' to remove only the rows in which every value is NaN/None (that is, data is missing for all elements in the row).


# Drop rows where all values are NaN
df2 = df.dropna(how='all')
print("After dropping the rows which have all NaN values:\n", df2)

Yields below output.


# Output:
# After dropping the rows which have all NaN values:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
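The difference between how='any' and how='all' can be sketched side by side on the article's DataFrame. The boolean-mask versions at the end are offered as an illustration of what each setting means, not as a description of how pandas implements dropna() internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# how='any' (the default): drop a row if at least one value is missing
any_kept = df.dropna(how="any").index.tolist()
print(any_kept)  # [0, 2]

# how='all': drop a row only if every value is missing
all_kept = df.dropna(how="all").index.tolist()
print(all_kept)  # [0, 1, 2, 3]

# The same selections written as boolean masks
mask_any = df[df.notna().all(axis=1)]  # keep rows with no missing values
mask_all = df[df.notna().any(axis=1)]  # keep rows with at least one value
```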

4. Drop NaN Values on Selected Columns from List

Sometimes you need to drop rows only when specific columns have NaN/None values. You can achieve this with the subset parameter, which takes a list of column labels.


# Drop rows that have NaN values in selected columns
df2=df.dropna(subset=['Courses','Fee'])
print("After dropping rows based on specified columns:\n", df2)

Yields below output.


# Output:
# After dropping rows based on specified columns:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
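The subset parameter can also be combined with how. As a sketch on the article's DataFrame, the snippet below drops rows based on the Fee and Discount columns only; the label NoSuchColumn is a deliberately invalid, hypothetical name used to show what happens when a label is not a column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Default how='any': drop a row if Fee OR Discount is missing
either_missing = df.dropna(subset=["Fee", "Discount"])
print(either_missing.index.tolist())  # [0, 2]

# how='all': drop a row only if Fee AND Discount are both missing
both_missing = df.dropna(subset=["Fee", "Discount"], how="all")
print(both_missing.index.tolist())  # [0, 2, 3]

# A label that is not a column raises KeyError
try:
    df.dropna(subset=["NoSuchColumn"])
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```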

5. Drop Rows with NaN Values inplace

As you have seen, by default the dropna() method doesn’t drop rows from the existing DataFrame; instead, it returns a new DataFrame. If you want to drop rows from the existing DataFrame, use inplace=True.


# Drop Rows with NaN Values inplace
df.dropna(inplace=True)
print("After dropping the rows with NaN values:\n", df)

# Output:
# After dropping the rows with NaN values:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 2  Hadoop  26000.0   35days    2500.0
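One common pitfall with inplace=True is assigning the result back to a variable. The sketch below uses a hypothetical frame named demo and assumes the usual pandas behavior in which dropna(inplace=True) returns None, so this is worth confirming against the pandas version you use.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only row 0 has no missing values
demo = pd.DataFrame({
    "Courses": ["Spark", np.nan, "Hadoop"],
    "Fee": [20000, 22000, np.nan],
})

# Without inplace, the original is untouched and a new frame is returned
reassigned = demo.dropna()
print(len(demo), len(reassigned))  # 3 1

# With inplace=True, demo itself is modified and the call returns None,
# so writing `demo = demo.dropna(inplace=True)` would lose the data
result = demo.dropna(inplace=True)
print(result)     # None
print(len(demo))  # 1
```

Reassigning with demo = demo.dropna() achieves the same end state and avoids the None pitfall.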

6. Complete Example of Drop Rows with NaN Values

Below is a complete example of how to remove rows with NaN values from DataFrame.


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark",'Java',"Hadoop",'Python',np.nan],
     'Fee' :[20000,np.nan,26000,24000,np.nan],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[1000,np.nan,2500,None,np.nan]
               })
df = pd.DataFrame(technologies)
print(df)

# Drop all rows with NaN values
df2=df.dropna()
print(df2)
df2=df.dropna(axis=0)

# Reset index after drop
df2=df.dropna().reset_index(drop=True)
print(df2)

# Drop rows where all values are NaN
df2=df.dropna(how='all')
print(df2)

# Drop rows that have NaN values in selected columns
df2=df.dropna(subset=['Courses','Fee'])
print(df2)

# Drop Rows with NaN Values inplace
df.dropna(inplace=True)
print(df)

Frequently Asked Questions on Drop Rows with NaN Values

How do I drop rows with NaN values in a Pandas DataFrame?

You can use the dropna() method to remove rows with NaN values from a Pandas DataFrame. By default, it removes every row that has at least one NaN value. For example, df.dropna().

What is the syntax for using the dropna() function to remove rows with NaN values?

The basic syntax is df.dropna(), which returns a new DataFrame containing only the rows that have no NaN values.

How can I drop rows with NaN values in a specific column?

You can use the subset parameter of the dropna() method to specify a subset of columns to consider for NaN removal. For example, df.dropna(subset=['specified_column'])

How can I drop rows based on multiple columns with NaN values?

You can specify multiple columns in the subset parameter. For example, df.dropna(subset=['specified_column1', 'specified_column2'])

How can I drop rows if all values in a row are NaN?

You can use the how parameter with the value 'all' to drop rows where all values are NaN. For example, df.dropna(how='all')

How can I drop rows in place without creating a new DataFrame?

You can use the inplace=True parameter to modify the DataFrame in place. For example, df.dropna(inplace = True)

Conclusion

In this article, you have learned how to drop rows with NaN/None values from a pandas DataFrame using DataFrame.dropna(). You have also learned how to remove rows only when all values are NaN/None, how to remove rows only when selected columns have NaN values, and how to remove rows in place using the inplace param.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.