  • Post category:Pandas
  • Post last modified:April 30, 2024
Pandas Drop Columns with NaN or None Values

To drop columns in a Pandas DataFrame that contain NaN or None values, you can use the dropna() method along with the axis=1 argument to specify that you’re operating on columns. In this article, I will explain how to drop columns with NaN or None values in a pandas DataFrame, with examples.


Key Points –

  • The dropna() method in Pandas DataFrame is used to eliminate rows or columns with missing values (NaN or None).
  • To drop columns specifically, you use the axis=1 argument within the dropna() method. This signifies that you’re operating on columns rather than rows.
  • With axis=1, dropna() by default drops any column containing at least one NaN or None value. You can change this behavior using additional parameters: how='all' drops a column only when every value in it is missing, and thresh specifies the minimum number of non-null values required to keep a column.
  • If you want to modify the original DataFrame in place, you can use the inplace=True parameter. Otherwise, a new DataFrame with dropped columns will be returned.
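
The thresh behavior described in the key points can be sketched as follows. This is a minimal illustration on a small made-up DataFrame (the column names A, B, and C are hypothetical), not the data used in the rest of the article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "A": [1, 2, 3],            # 3 non-null values
    "B": [1, np.nan, np.nan],  # 1 non-null value
    "C": [np.nan, 5, 6],       # 2 non-null values
})

# Keep only the columns that have at least 2 non-null values
kept = df.dropna(axis=1, thresh=2)

print(list(kept.columns))  # ['A', 'C']
print(list(df.columns))    # ['A', 'B', 'C'], the original is unchanged
```

Column B is dropped because it has only one non-null value, while C survives even though it contains a NaN.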

Quick Examples of Drop Columns with NaN Values

If you are in a hurry, below are some quick examples of how to drop columns with NaN or None values in Pandas DataFrame.


# Quick examples of drop columns with NaN values

# Example 1: Drop all columns with NaN values
df2=df.dropna(axis=1)

# Example 2: Drop columns that have all NaN values
df2=df.dropna(axis=1,how='all')

# Example 3: With threshold
df2=df.dropna(axis=1,thresh=2)

# Example 4: Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)

To run the examples in this article, let’s create a pandas DataFrame using data from a dictionary.


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark",'Java',"Hadoop",'Python','PHP'],
     'Fee' :[20000,np.nan,26000,24000,25000],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[np.nan,np.nan,None,None,np.nan]
               })
df = pd.DataFrame(technologies)
print(df)

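This creates a DataFrame with 5 rows and 4 columns. Courses has no missing values, Fee has one NaN, Duration has two, and Discount is missing in every row.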

Using DataFrame.dropna() to Drop Columns with NaN Values

The DataFrame.dropna() method in pandas is used to remove missing (NaN) values from a DataFrame. When applied with axis=1, it drops columns containing NaN values.

Note that by default, the DataFrame.dropna() method returns a new DataFrame with the specified columns (or rows) removed, leaving the original DataFrame unchanged. However, if you want to modify the existing DataFrame directly, you should set the inplace parameter to True.


# Drop all columns with NaN values
df2=df.dropna(axis=1)
print("After dropping columns with NaN Values:\n", df2)

Yields below output.


After dropping columns with NaN Values:
   Courses
0   Spark
1    Java
2  Hadoop
3  Python
4     PHP

Alternatively, you can pass axis='columns' instead of axis=1 to remove columns with NaN, for example df.dropna(axis='columns'). Use dropna(axis=0), which is the default, to drop rows with NaN values from a pandas DataFrame.
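
A minimal sketch of how the axis argument behaves, assuming a small made-up DataFrame: the string alias 'columns' is accepted in place of 1, axis=0 switches to dropping rows, and in every case the original DataFrame is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
})

by_number = df.dropna(axis=1)        # drop columns that contain NaN
by_name = df.dropna(axis="columns")  # same operation, string alias
rows = df.dropna(axis=0)             # drop rows that contain NaN instead

print(by_number.equals(by_name))  # True
print(list(by_number.columns))    # ['Courses']
print(list(rows.index))           # [0, 2]
print(df.shape)                   # (3, 2), the original is unchanged
```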

Drop Columns with all NaN values in the DataFrame

Use the how parameter to specify when a column should be removed. The default is how='any', which removes a column when NaN/None is present in any of its elements.

You can use the how='all' parameter in the dropna() method to drop columns with all NaN values, meaning that data is missing for all elements in a column.


# Drop columns that have all NaN values
df2=df.dropna(axis=1,how='all')
print(df2)

Yields below output.


# Output:
  Courses      Fee Duration
0   Spark  20000.0   30days
1    Java      NaN      NaN
2  Hadoop  26000.0   35days
3  Python  24000.0   40days
4     PHP  25000.0      NaN
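
To see what how='all' is checking, the same result can be reproduced with a boolean mask. This is only a sketch on a small made-up DataFrame; in practice dropna(axis=1, how='all') is the more direct call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
    "Discount": [np.nan, np.nan, np.nan],
})

# True only for columns in which every value is missing
all_missing = df.isna().all()
print(all_missing[all_missing].index.tolist())  # ['Discount']

# Selecting the remaining columns matches dropna(axis=1, how='all')
via_mask = df.loc[:, ~all_missing]
via_dropna = df.dropna(axis=1, how="all")
print(via_mask.equals(via_dropna))  # True
```

Fee is kept here because it still holds some data, even though one of its values is NaN.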

Drop Columns with NaN Values In Place

As you have seen, by default the dropna() method doesn’t drop columns from the existing DataFrame; instead, it returns a copy of the DataFrame. If you want to drop columns from the existing DataFrame, use inplace=True.


# Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)
print("After dropping columns with NaN Values:\n", df)

Yields below output.


After dropping columns with NaN Values:  
   Courses
0   Spark
1    Java
2  Hadoop
3  Python
4     PHP
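
One pitfall worth illustrating: with inplace=True the change is applied to the DataFrame itself, so the return value should not be assigned to a new variable. The sketch below uses a small made-up DataFrame and reflects the behavior of pandas 1.x and 2.x.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Courses": ["Spark", "Java"],
    "Fee": [20000, np.nan],
})

# The call modifies df directly; its return value is not a DataFrame
result = df.dropna(axis=1, inplace=True)

print(result)            # None in pandas 1.x and 2.x
print(list(df.columns))  # ['Courses']
```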

Complete Example of Drop Columns with NaN Values


import pandas as pd
import numpy as np
technologies = ({
     'Courses':["Spark",'Java',"Hadoop",'Python','PHP'],
     'Fee' :[20000,np.nan,26000,24000,25000],
     'Duration':['30days',np.nan,'35days','40days',np.nan],
     'Discount':[np.nan,np.nan,None,None,np.nan]
               })
df = pd.DataFrame(technologies)
print(df)

# Drop all columns with NaN values
df2=df.dropna(axis=1)
print(df2)

# Drop columns that have all NaN values
df2=df.dropna(axis=1,how='all')
print(df2)

# Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)
print(df)

FAQs on Drop Columns with NaN Values

Why should I drop columns with NaN values?

Dropping columns with NaN values is often done to clean and simplify the dataset. NaN values can introduce inconsistencies and errors in data analysis and modeling processes.

How do I check for NaN values in my dataset?

You can use isnull() or its alias isna() to identify NaN values. For example, df.isnull().any() returns a boolean Series with one entry per column, which is True for each column that contains at least one NaN value.
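
As a short illustration on a made-up DataFrame, the check can be turned into a list of affected column names, and isna().sum() gives the number of missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
    "Discount": [np.nan, np.nan, np.nan],
})

has_nan = df.isna().any()     # boolean Series, one entry per column
nan_counts = df.isna().sum()  # number of missing values per column

print(has_nan[has_nan].index.tolist())  # ['Fee', 'Discount']
print(nan_counts.tolist())              # [0, 1, 3]
```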

How can I drop columns with NaN values using pandas?

You can use the dropna() method to drop the columns with NaN values. For example, df.dropna(axis=1) will drop columns with any NaN values.

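How can I drop only specific columns with NaN values?

The subset parameter of dropna() does not select which columns to drop: it lists the labels along the other axis to check. With the default axis=0, df.dropna(subset=['column1', 'column2']) drops the rows that have NaN in column1 or column2; with axis=1, subset takes row labels instead. To drop only certain columns when they contain NaN, check those columns with isna().any() and pass the ones that qualify to df.drop(columns=...).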
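
One possible way to drop a column only if it is on a chosen list and actually contains NaN is sketched below. The candidates list and the sample data are hypothetical, and this is just one approach rather than a dedicated pandas feature.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
    "Duration": ["30days", np.nan, "35days"],
})

# Hypothetical list: only these columns are allowed to be dropped
candidates = ["Courses", "Fee"]

# Of the candidates, pick those that contain at least one missing value
to_drop = [col for col in candidates if df[col].isna().any()]
df2 = df.drop(columns=to_drop)

print(to_drop)            # ['Fee']
print(list(df2.columns))  # ['Courses', 'Duration']
```

Duration also contains a NaN, but it is kept because it was not among the candidates.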

Does dropping columns with NaN values affect the original dataset?

By default, the dropna() method does not modify the original DataFrame. If you want to modify the original DataFrame in place, you can use the inplace=True parameter.

Conclusion

In this article, you have learned that the DataFrame.dropna() method in pandas is a powerful tool for handling missing values in a DataFrame. By default, it returns a copy of the DataFrame after removing the rows that contain any NaN values; pass axis=1 to remove columns instead. You can further customize its behavior with parameters such as how, thresh, and inplace.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
