pandas.DataFrame.dropna() removes rows or columns that contain NaN/None values; passing axis=1 drops columns. Python has no null keyword, so missing data is represented as None or NaN. NaN stands for Not a Number and is one of the most common ways to represent missing values in data. Missing values are a common problem in data analysis, so before processing you typically either remove the columns that contain NaN values or replace NaN with an empty string for string columns and with zero for numeric columns.
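The two options mentioned above, dropping columns or replacing the missing values, can be compared side by side. This is a minimal sketch on a small made-up DataFrame; the per-column fill values (0 and an empty string) are only illustrative choices.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (not the article's full data set)
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
    "Duration": ["30days", np.nan, "35days"],
})

# Option 1: drop every column that contains a missing value
dropped = df.dropna(axis=1)

# Option 2: keep the columns and fill the gaps instead,
# using zero for the numeric column and an empty string for the text column
filled = df.fillna({"Fee": 0, "Duration": ""})

print(dropped.columns.tolist())
print(filled)
```

Dropping is simpler but discards whole columns; filling keeps the shape of the data at the cost of introducing placeholder values.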
Take Away:
- pandas.DataFrame.dropna() with axis=1 is used to drop columns with NaN/None values from a DataFrame.
- numpy.nan is Not a Number (NaN), which is of the Python built-in numeric type float (floating point).
- None is of NoneType and is an object in Python.
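The difference between the two missing-value markers can be checked directly. The snippet below is a small sketch; the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# np.nan is a float; None is the single instance of NoneType
nan_type = type(np.nan).__name__
none_type = type(None).__name__
print(nan_type, none_type)

# NaN never compares equal to itself, so == cannot be used to detect it
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)

# pandas treats both markers as missing values
s = pd.Series([1.0, np.nan, None])
missing_mask = s.isna().tolist()
print(missing_mask)
```

Because pandas reports both as missing, dropna() handles NaN and None the same way.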
1. Quick Examples of Drop Columns with NaN Values
If you are in a hurry, below are some quick examples of how to drop columns with nan values in pandas DataFrame.
# Quick examples of dropping columns with NaN values
# Drop all columns that contain any NaN values
df2 = df.dropna(axis=1)
# Drop columns in which all values are NaN
df2 = df.dropna(axis=1, how='all')
# Keep only columns with at least 2 non-NaN values
df2 = df.dropna(axis=1, thresh=2)
# Drop columns with NaN values in place
df.dropna(axis=1, inplace=True)
Now, let’s create a DataFrame with a few rows and columns and run some examples of dropping columns with NaN values. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
import numpy as np
technologies = ({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan]
})
df = pd.DataFrame(technologies)
print(df)
This yields the output below.
  Courses      Fee Duration  Discount
0   Spark  20000.0   30days       NaN
1    Java      NaN      NaN       NaN
2  Hadoop  26000.0   35days       NaN
3  Python  24000.0   40days       NaN
4     PHP  25000.0      NaN       NaN
2. Using DataFrame.dropna() to Drop Columns with NaN Values
By using the pandas.DataFrame.dropna() method with axis=1, you can drop columns with NaN (Not a Number) or None values from a DataFrame. Note that by default it returns a copy of the DataFrame with those columns removed. If you want to remove them from the existing DataFrame, use inplace=True.
# Drop all columns with NaN values
df2=df.dropna(axis=1)
print(df2)
This yields the output below.
  Courses
0   Spark
1    Java
2  Hadoop
3  Python
4     PHP
Here, axis=1 tells dropna() to remove columns with NaN values; axis='columns' is equivalent. Use dropna(axis=0), which is the default, to drop rows with NaN values from a pandas DataFrame.
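To make the effect of the axis param concrete, here is a minimal sketch on a smaller, made-up DataFrame that compares dropping rows with dropping columns.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame with one missing Fee value
df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop"],
    "Fee": [20000, np.nan, 26000],
})

# axis=0 (the default) drops rows that contain a missing value
rows_dropped = df.dropna(axis=0)

# axis=1 drops columns that contain a missing value
cols_dropped = df.dropna(axis=1)

# axis='columns' is an alias for axis=1
same_result = df.dropna(axis="columns").equals(cols_dropped)

print(rows_dropped.shape)
print(cols_dropped.shape)
print(same_result)
```

With axis=0 the Java row is removed and both columns survive; with axis=1 all three rows survive but only the Courses column is kept.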
3. Drop Columns with all NaN values in DataFrame
Use the how param to specify when a column is removed. By default how='any', which removes a column when NaN/None is present in any element. Use how='all' to remove only columns in which every value is NaN/None (data is missing for all elements in the column).
# Drop columns in which all values are NaN
df2=df.dropna(axis=1,how='all')
print(df2)
This yields the output below.
  Courses      Fee Duration
0   Spark  20000.0   30days
1    Java      NaN      NaN
2  Hadoop  26000.0   35days
3  Python  24000.0   40days
4     PHP  25000.0      NaN
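Between how='any' and how='all' sits the thresh param from the quick examples, which keeps a column only if it has at least that many non-NaN values. The sketch below rebuilds the sample DataFrame and tries two threshold values; the value 4 is chosen only to show a different cut-off.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Java", "Hadoop", "Python", "PHP"],
    "Fee": [20000, np.nan, 26000, 24000, 25000],
    "Duration": ["30days", np.nan, "35days", "40days", np.nan],
    "Discount": [np.nan, np.nan, None, None, np.nan],
})

# Number of non-missing values in each column
non_null_counts = df.count().to_dict()
print(non_null_counts)

# Keep columns with at least 2 non-NaN values
kept_2 = df.dropna(axis=1, thresh=2).columns.tolist()
print(kept_2)

# Keep columns with at least 4 non-NaN values
kept_4 = df.dropna(axis=1, thresh=4).columns.tolist()
print(kept_4)
```

Note that thresh is an alternative to how; pandas 2.x raises an error if both are passed in the same call.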
4. Drop Columns with NaN Values In Place in DataFrame
As you have seen, by default the dropna() method doesn’t drop columns from the existing DataFrame; instead, it returns a copy of the DataFrame. If you want to drop them from the existing DataFrame, use inplace=True.
# Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)
print(df)
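A common slip with inplace=True is assigning the result back to a variable. The sketch below contrasts the two modes on a small made-up DataFrame; make_df is a hypothetical helper used only to rebuild the same data twice.

```python
import numpy as np
import pandas as pd

def make_df():
    # Hypothetical helper: returns a fresh copy of the same sample data
    return pd.DataFrame({
        "Courses": ["Spark", "Java", "Hadoop"],
        "Fee": [20000, np.nan, 26000],
    })

# Default: a new DataFrame is returned and the original keeps all columns
df = make_df()
df2 = df.dropna(axis=1)
cols_after_copy = df.columns.tolist()

# inplace=True: the original is modified and the call returns None
df = make_df()
returned = df.dropna(axis=1, inplace=True)
cols_after_inplace = df.columns.tolist()

print(cols_after_copy)
print(returned)
print(cols_after_inplace)
```

Because the in-place call returns None, writing df = df.dropna(axis=1, inplace=True) would replace the DataFrame with None.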
5. Complete Example of Drop Columns with NaN Values
import pandas as pd
import numpy as np
technologies = ({
    'Courses': ["Spark", "Java", "Hadoop", "Python", "PHP"],
    'Fee': [20000, np.nan, 26000, 24000, 25000],
    'Duration': ['30days', np.nan, '35days', '40days', np.nan],
    'Discount': [np.nan, np.nan, None, None, np.nan]
})
df = pd.DataFrame(technologies)
print(df)
# Drop all columns with NaN values
df2=df.dropna(axis=1)
print(df2)
# Drop columns in which all values are NaN
df2=df.dropna(axis=1,how='all')
print(df2)
# Drop columns with NaN Values inplace
df.dropna(axis=1,inplace=True)
print(df)
Conclusion
In this article, you have learned how to drop columns with NaN/None values in a pandas DataFrame using DataFrame.dropna() with axis=1. You have also learned how to remove columns only when all values are NaN/None, how to keep only columns that meet a thresh of non-NaN values, and how to remove columns using the inplace param.
Happy Learning !!