Pandas Drop Rows with NaN Values in DataFrame

Use the dropna() function to drop rows with NaN/None values from a pandas DataFrame. Python has no null keyword, so missing data is represented as None or NaN. NaN stands for Not a Number and is one of the most common ways to represent a missing value. Missing values are a major source of problems in data analysis, so before processing the data you typically either drop the rows that contain NaN or replace NaN with an empty string in string columns and with zero in numeric columns.
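The two options mentioned above, dropping versus replacing, can be sketched side by side. The small frame and its values are made up for illustration, and fillna() with a per-column dict is only one of several ways to do the replacement.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df = pd.DataFrame({
    "Courses": ["Spark", np.nan, "Hadoop"],
    "Fee": [20000, np.nan, 26000],
})

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: keep all rows and fill the gaps per column instead:
# empty string for the string column, zero for the numeric column
filled = df.fillna({"Courses": "", "Fee": 0})

print(dropped)
print(filled)
```

Dropping loses the whole row, while filling keeps it with a placeholder value, so the right choice depends on how the data is used afterwards.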

pandas Drop Rows with NaN Key Points

  • dropna() is used to drop rows with NaN/None values from a DataFrame.
  • numpy.nan is Not a Number (NaN); it is of Python's built-in numeric type float (floating point).
  • None is of type NoneType and is an object in Python.
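The points above can be checked directly in an interpreter. This is a minimal sketch; the Series `s` is a made-up example, not part of the article's data.

```python
import numpy as np
import pandas as pd

print(type(np.nan))   # <class 'float'>
print(type(None))     # <class 'NoneType'>

# NaN never compares equal to itself, so test with pd.isna() instead of ==
print(np.nan == np.nan)   # False
print(pd.isna(np.nan))    # True
print(pd.isna(None))      # True

# pandas treats both markers as missing inside a Series
s = pd.Series([1.0, np.nan, None])
print(s.isna().tolist())  # [False, True, True]
```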

1. Quick Examples of Drop Rows with NaN Values

If you are in a hurry, below are some quick examples of how to drop rows with NaN values from a DataFrame.


# Quick examples of dropping rows with NaN values

# Drop all rows with NaN values (axis=0 is the default)
df2 = df.dropna()
df2 = df.dropna(axis=0)

# Reset index after drop
df2 = df.dropna().reset_index(drop=True)

# Drop rows in which all values are NaN
df2 = df.dropna(how='all')

# Drop rows that have NaN values in selected columns
df2 = df.dropna(subset=['Courses', 'Fee'])

# With a threshold:
# keep only the rows with at least 2 non-NA values
df2 = df.dropna(thresh=2)

# Drop rows with NaN values in place
df.dropna(inplace=True)

Now, let’s create a DataFrame with a few rows and columns and run some examples to see how to drop rows with NaN values. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan]
}
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
4     NaN      NaN      NaN       NaN

2. Drop Rows with NaN Values

Using the dropna() method, you can drop rows with NaN (Not a Number) and None values from a pandas DataFrame. Note that by default it returns a copy of the DataFrame with the rows removed. To remove the rows from the existing DataFrame instead, use inplace=True.


# Drop all rows that have NaN/None values
df2 = df.dropna()
print(df2)

Yields below output.


# Output:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
2  Hadoop  26000.0   35days    2500.0

Alternatively, you can pass axis=0 explicitly to remove rows with NaN, for example df.dropna(axis=0); this is the default behavior. Use dropna(axis=1) to drop columns that contain NaN values instead of rows.
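In the article's DataFrame every column contains at least one NaN, so dropna(axis=1) would remove all of them. The sketch below therefore uses a smaller, made-up frame to show the column-wise behavior more clearly.

```python
import numpy as np
import pandas as pd

# Illustrative frame: one complete column, one partly missing, one empty
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
    "Discount": [np.nan, np.nan, np.nan],
})

# Drop columns that contain at least one NaN: only Courses survives
any_cols = df.dropna(axis=1)
print(list(any_cols.columns))   # ['Courses']

# Drop only columns that are entirely NaN: Discount is removed
all_cols = df.dropna(axis=1, how="all")
print(list(all_cols.columns))   # ['Courses', 'Fee']
```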

After dropping rows with NaN, you may want to reset the index; you can do so with the DataFrame.reset_index() method.


# Reset index after drop
df2 = df.dropna().reset_index(drop=True)
print(df2)

Yields below output.


# Output:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1  Hadoop  26000.0   35days    2500.0
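The drop=True argument matters here. As a small sketch on a made-up frame, compare it with the default reset_index(), which keeps the old labels in a new column named index:

```python
import numpy as np
import pandas as pd

# Illustrative frame (values are made up for this sketch)
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
})

clean = df.dropna()
print(clean.index.tolist())        # [0, 2]  (gap left by the dropped row)

# drop=True discards the old labels and renumbers from 0
renumbered = clean.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1]

# Default: old labels are preserved in a new "index" column
kept = clean.reset_index()
print(kept.columns.tolist())       # ['index', 'Courses', 'Fee']
```

Keeping the old labels can be useful when you need to trace rows back to their position in the original data.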

3. Drop Rows Where All Values Are NaN

Use the how param to specify when a row is removed. The default, how='any', removes a row when NaN/None is present in any of its elements.

Use how='all' to remove only the rows in which every value is NaN/None (data is missing for all elements in the row).


# Drop rows in which all values are NaN
df2 = df.dropna(how='all')
print(df2)

Yields below output.


# Output:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
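Between 'any' and 'all', the thresh parameter from the quick examples gives finer control: a row is kept only if it has at least that many non-missing values. The sketch below rebuilds the article's DataFrame so that it runs on its own. Note that recent pandas versions do not allow passing how and thresh in the same call.

```python
import numpy as np
import pandas as pd

# Same data as the article's example DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Keep rows with at least 2 non-missing values:
# row 1 (only Courses set) and row 4 (all missing) are dropped
at_least_two = df.dropna(thresh=2)
print(at_least_two.index.tolist())   # [0, 2, 3]

# Requiring all 4 values to be present matches the default how='any'
all_four = df.dropna(thresh=4)
print(all_four.index.tolist())       # [0, 2]
```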

4. Drop Rows with NaN Values in Selected Columns

Sometimes you need to drop rows only when selected columns have NaN/None values. You can achieve this with the subset param, which takes a list of column labels.


# Drop rows that have NaN values in selected columns
df2 = df.dropna(subset=['Courses', 'Fee'])
print(df2)

Yields below output.


# Output:
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days    1000.0
2  Hadoop  26000.0   35days    2500.0
3  Python  24000.0   40days       NaN
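The subset param can also be combined with how. As a sketch on the article's data (rebuilt here so that the block runs on its own), how='all' with a subset drops a row only when every listed column is missing:

```python
import numpy as np
import pandas as pd

# Same data as the article's example DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Drop a row only if BOTH Fee and Discount are missing (rows 1 and 4)
both_missing = df.dropna(subset=["Fee", "Discount"], how="all")
print(both_missing.index.tolist())   # [0, 2, 3]

# With the default how='any' and a single column,
# every row with a missing Discount is dropped (rows 1, 3 and 4)
has_discount = df.dropna(subset=["Discount"])
print(has_discount.index.tolist())   # [0, 2]
```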

5. Drop Rows with NaN Values inplace

As you have seen, by default the dropna() method doesn’t drop rows from the existing DataFrame; instead, it returns a copy. To drop rows from the existing DataFrame, use inplace=True.


# Drop rows with NaN values in place
df.dropna(inplace=True)
print(df)
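A common slip with inplace=True is assigning the call's result back to df. dropna(inplace=True) has historically returned None rather than the DataFrame, so the sketch below (again on the article's data) calls it as a plain statement and then inspects df itself.

```python
import numpy as np
import pandas as pd

# Same data as the article's example DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", np.nan],
    "Fee": [20000, np.nan, 26000, 24000, np.nan],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [1000, np.nan, 2500, None, np.nan],
})

# Modify df itself; do not rely on the return value of this call
df.dropna(inplace=True)

# Only the two fully populated rows remain
print(df.index.tolist())             # [0, 2]
print(df["Courses"].tolist())        # ['Spark', 'Hadoop']
```

If you prefer to avoid the question entirely, df = df.dropna() gives the same rows without inplace.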

6. Complete Example of Drop Rows with NaN Values

Below is a complete example of how to remove rows with NaN values from DataFrame.


import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "Java", "Hadoop", "Python", np.nan],
    'Fee': [20000, np.nan, 26000, 24000, np.nan],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [1000, np.nan, 2500, None, np.nan]
}
df = pd.DataFrame(technologies)
print(df)

# Drop all rows with NaN values (axis=0 is the default)
df2 = df.dropna()
print(df2)
df2 = df.dropna(axis=0)

# Reset index after drop
df2 = df.dropna().reset_index(drop=True)
print(df2)

# Drop rows in which all values are NaN
df2 = df.dropna(how='all')
print(df2)

# Drop rows that have NaN values in selected columns
df2 = df.dropna(subset=['Courses', 'Fee'])
print(df2)

# Drop rows with NaN values in place
df.dropna(inplace=True)
print(df)

Conclusion

In this article, you have learned how to drop rows with NaN/None values from a pandas DataFrame using DataFrame.dropna(). You have also learned how to remove rows only when all values are NaN/None, how to remove rows only when selected columns have NaN values, and how to remove rows in place using the inplace param.

Happy Learning !!

Naveen (NNK)
