To drop columns in a Pandas DataFrame that contain NaN or None values, use the dropna() method with the axis=1 argument, which tells pandas to operate on columns. In this article, I will explain how to drop columns with NaN or None values in a Pandas DataFrame.
Key Points –
- The dropna() method in Pandas DataFrame is used to eliminate rows or columns with missing values (NaN or None).
- To drop columns specifically, use the axis=1 argument within the dropna() method. This signifies that you’re operating on columns rather than rows.
- With axis=1, dropna() by default drops any column containing at least one NaN or None value. You can change this behavior with additional parameters such as how, or thresh, which specifies the minimum number of non-null values required to keep a column.
- To modify the original DataFrame directly, use the inplace=True parameter; otherwise, a new DataFrame with the columns dropped is returned.
Quick Examples of Drop Columns with NaN Values
The following are quick examples of dropping columns with NaN or None values in a Pandas DataFrame.
# Quick examples of drop columns with NaN values
# Example 1: Drop all columns with NaN values
df2=df.dropna(axis=1)
# Example 2: Drop columns that have all NaN values
df2=df.dropna(axis=1,how='all')
# Example 3: With threshold
df2=df.dropna(axis=1,thresh=2)
# Example 4: Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",'Java',"Hadoop",'Python','PHP'],
'Fee' :[20000,np.nan,26000,24000,25000],
'Duration':['30days',np.nan,'35days','40days',np.nan],
'Discount':[np.nan,np.nan,None,None,np.nan]
})
df = pd.DataFrame(technologies)
print(df)
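This creates a DataFrame with five rows in which Fee and Duration have some missing values and Discount is missing in every row.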
Drop Columns with NaN Values Using DataFrame.dropna()
The DataFrame.dropna() method in pandas removes missing (NaN) values from a DataFrame. When applied with axis=1, it drops columns containing NaN values.
Note that by default, DataFrame.dropna() creates and returns a new DataFrame after removing the specified columns or rows, leaving the original DataFrame unchanged. To modify the existing DataFrame directly, set the inplace parameter to True.
# Drop all columns with NaN values
df2=df.dropna(axis=1)
print("After dropping columns with NaN Values:", df2)
Only the Courses column remains, because it is the only column without a missing value. Conversely, dropna(axis=0), which is the default, eliminates rows with NaN values from a Pandas DataFrame.
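To make the contrast between the two axes concrete, here is a minimal sketch that reuses the article's sample data. The variable names cols_dropped and rows_dropped are illustrative and not part of the original examples. Because Discount is missing in every row, dropping rows removes all of them, while dropping columns keeps only Courses.

```python
import pandas as pd
import numpy as np

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan],
})

# axis=1: drop every column that contains at least one missing value
cols_dropped = df.dropna(axis=1)

# axis=0 (the default): drop every row that contains at least one missing value
rows_dropped = df.dropna(axis=0)

print(list(cols_dropped.columns))  # only 'Courses' has no missing values
print(len(rows_dropped))           # every row has a missing Discount, so no rows remain
print(df.shape)                    # the original DataFrame is unchanged
```

Checking the shape of the result before and after is a cheap way to notice when the wrong axis was used.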
Drop Columns with all NaN values in the DataFrame
Use the how parameter to specify how columns are removed. By default, how='any', which removes a column when NaN/None is present in any of its elements.
To drop only the columns in which every value is NaN, use the dropna() method with axis=1 and how='all'.
# Drop columns that have all NaN values
df2=df.dropna(axis=1,how='all')
print(df2)
Yields below output.
# Output:
  Courses      Fee Duration
0   Spark  20000.0   30days
1    Java      NaN      NaN
2  Hadoop  26000.0   35days
3  Python  24000.0   40days
4     PHP  25000.0      NaN
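The quick examples also list a thresh variant that is not walked through elsewhere. As a hedged sketch on the same sample data, thresh keeps a column only when it has at least that many non-missing values. The threshold of 4 and the name df_thresh are chosen here purely for illustration.

```python
import pandas as pd
import numpy as np

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan],
})

# Non-missing values per column: Courses 5, Fee 4, Duration 3, Discount 0
print(df.notna().sum())

# Keep only the columns that have at least 4 non-missing values
df_thresh = df.dropna(axis=1, thresh=4)
print(list(df_thresh.columns))  # Courses and Fee meet the threshold
```

Recent pandas versions do not accept how and thresh in the same call, so pick one of the two.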
Drop Columns with NaN Values In Place
As you have seen, by default the dropna() method doesn’t drop columns from the existing DataFrame; instead, it returns a copy of the DataFrame. To drop columns from the existing DataFrame, use inplace=True.
# Drop columns with NaN values in place
df.dropna(axis=1, inplace=True)
# Print df itself: with inplace=True the change is made on df, not on df2
print("After dropping columns with NaN Values:", df, sep="\n")
Yields below output.
After dropping columns with NaN Values:
  Courses
0   Spark
1    Java
2  Hadoop
3  Python
4     PHP
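If you prefer not to mutate a DataFrame, reassigning the result of dropna() gives the same outcome. The sketch below, with the illustrative names reassigned and in_place, compares both approaches on the sample data.

```python
import pandas as pd
import numpy as np

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan],
})

# Option 1: keep df untouched and store the result under a new name
reassigned = df.dropna(axis=1)

# Option 2: modify a copy in place
in_place = df.copy()
in_place.dropna(axis=1, inplace=True)

# Both approaches leave the same columns and values
print(reassigned.equals(in_place))
```

Reassignment also chains more naturally with other DataFrame methods, which is one reason many codebases favor it over inplace=True.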
Complete Example of Drop Columns with NaN Values
import pandas as pd
import numpy as np
technologies = ({
'Courses':["Spark",'Java',"Hadoop",'Python','PHP'],
'Fee' :[20000,np.nan,26000,24000,25000],
'Duration':['30days',np.nan,'35days','40days',np.nan],
'Discount':[np.nan,np.nan,None,None,np.nan]
})
df = pd.DataFrame(technologies)
print(df)
# Drop all columns with NaN values
df2=df.dropna(axis=1)
print(df2)
# Drop columns that have all NaN values
df2=df.dropna(axis=1,how='all')
print(df2)
# Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)
print(df)
FAQs on Drop Columns with NaN Values
Dropping columns with NaN values is often done to clean and simplify the dataset. NaN values can introduce inconsistencies and errors in data analysis and modeling processes.
You can use isnull() or isna() to identify NaN values. For example, df.isnull().any() returns a boolean Series that is True for each column containing at least one NaN value.
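As a small illustration on the article's sample data, the boolean result of isna().any() can be used to list the affected column names, and isna().sum() counts the missing values per column. The names has_nan, nan_columns, and nan_counts are illustrative.

```python
import pandas as pd
import numpy as np

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan],
})

# Boolean Series indexed by column name: True where the column has any NaN
has_nan = df.isna().any()

# Names of the columns that contain at least one missing value
nan_columns = df.columns[has_nan].tolist()
print(nan_columns)  # Fee, Duration and Discount

# Number of missing values in each column
nan_counts = df.isna().sum()
print(nan_counts)
```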
You can use the dropna() method to drop the columns with NaN values. For example, df.dropna(axis=1) will drop columns with any NaN values.
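The subset parameter of dropna() limits the check to specific labels on the other axis. For example, df.dropna(subset=['column1', 'column2']) drops rows that have NaN in column1 or column2; with axis=1, subset takes row labels instead, and a column is dropped when it has NaN in any of those rows.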
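Note that subset names the labels to inspect on the other axis, so its meaning depends on axis. The following sketch shows both directions on the sample data; the chosen labels and the names rows_kept and cols_kept are only for illustration.

```python
import pandas as pd
import numpy as np

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan],
})

# Default axis=0: drop rows that have NaN in the Fee or Duration columns
rows_kept = df.dropna(subset=['Fee', 'Duration'])
print(rows_kept.index.tolist())  # rows 0, 2 and 3 have both values

# axis=1: subset is a list of row labels; drop columns with NaN in rows 0 or 2
cols_kept = df.dropna(axis=1, subset=[0, 2])
print(list(cols_kept.columns))   # Discount is dropped, the other columns stay
```

To drop specific columns by name regardless of their contents, df.drop(columns=[...]) is the more direct tool.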
By default, the dropna() method does not modify the original DataFrame. If you want to modify the original DataFrame in place, you can use the inplace=True parameter.
Conclusion
In this article, I have explained how the DataFrame.dropna() method in pandas handles missing values in a DataFrame. By default, it returns a new DataFrame after removing rows with any NaN values; with axis=1, it removes columns instead. You can customize its behavior further with parameters such as axis, how, and inplace.
Happy Learning !!