We can count the NaN values in a Pandas DataFrame using the isna() function together with the sum() function. NaN stands for Not a Number and is one of the common ways to represent a missing value in data. In pandas, handling missing data is an important step before you process it.
None/NaN values are one of the major problems in data analysis. Before processing, you typically need to remove the columns that contain NaN values, replace NaN with an empty string in string columns, or replace NaN with zero in numeric columns, depending on your needs. In this article, I will explain how to count the NaN values of a specific column or row of a DataFrame, or of the whole DataFrame, using the isna() function with the sum() function.
1. Quick Examples of Count NaN Values in Pandas DataFrame
# Below are the quick examples
# Example 1: Count the NaN values in single column
nan_count = df['Fee'].isna().sum()
# Example 2: Count NaN values in multiple columns of DataFrame
nan_count = df.isna().sum()
# Example 3: Count NaN values of whole DataFrame
nan_count = df.isna().sum().sum()
# Example 4: Count the NaN values in single row
nan_count = df.loc[['r1']].isna().sum().sum()
# Example 5: Count the NaN values in multiple rows
nan_count = df.isna().sum(axis = 1)
Now, let's create a DataFrame with a few rows and columns using a Python dictionary. Our DataFrame contains the columns Courses, Fee, Duration, and Discount, and has some NaN values in both string and numeric columns.
# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
'Fee' :[np.nan, 20000, np.nan, 25000, np.nan],
'Duration':[np.nan,'40days','35days', np.nan, np.nan],
'Discount':[np.nan, 1000, np.nan, np.nan, 1500]
}
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4', 'r5'])
print(df)
This yields the output below. Note that in pandas, a NaN value can be created using NumPy's np.nan.
# Output:
    Courses      Fee Duration  Discount
r1    Spark      NaN      NaN       NaN
r2      NaN  20000.0   40days    1000.0
r3  PySpark      NaN   35days       NaN
r4      NaN  25000.0      NaN       NaN
r5   Hadoop      NaN      NaN    1500.0
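Since the introduction mentions both None and NaN, a small check may help show that isna() treats them the same way. This is a minimal sketch; the Series `sample` is a hypothetical illustration and not part of the article's DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical three-element Series mixing None, np.nan, and a real value
sample = pd.Series([None, np.nan, "Spark"])

# isna() flags both None and np.nan as missing
missing_flags = sample.isna().tolist()
print(missing_flags)  # [True, True, False]
```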
2. Pandas Count NaN in a Column
In pandas, the DataFrame.isna() function is used to check for missing values, and sum() is used to count the NaN values in a column. In this example, I will count the NaN values of a single column of the DataFrame using the syntax below.
# Count the NaN values in single column
nan_count = df['Fee'].isna().sum()
print(nan_count)
# Output:
# 3
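To see why chaining sum() onto isna() produces a count, it can help to look at the intermediate boolean mask. The sketch below rebuilds only the Fee column from the example; the names `fee` and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild only the Fee column from the example DataFrame
fee = pd.Series([np.nan, 20000, np.nan, 25000, np.nan],
                index=['r1', 'r2', 'r3', 'r4', 'r5'], name='Fee')

# isna() returns a boolean Series: True where the value is missing
mask = fee.isna()
print(mask.tolist())  # [True, False, True, False, True]

# sum() treats True as 1 and False as 0, so the total is the NaN count
nan_count = int(mask.sum())
print(nan_count)  # 3
```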
3. Count NaN Value in All Columns of Pandas DataFrame
You can also get the count of NaN values for every column in a pandas DataFrame using the isna() function with the sum() function. The expression df.isna().sum() returns the number of NaN values in each column of the DataFrame.
# Count NaN values in multiple columns of DataFrame
nan_count = df.isna().sum()
print(nan_count)
# Output:
# Courses 2
# Fee 3
# Duration 3
# Discount 3
# dtype: int64
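If only some columns are of interest, one option is to select them first and then apply the same isna().sum() chain. The following is a minimal sketch that reuses the article's example data; the choice of Fee and Discount is only for illustration.

```python
import numpy as np
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Select a subset of columns, then count the NaN values in each
subset_counts = df[['Fee', 'Discount']].isna().sum()
print(subset_counts)
```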
4. Count NaN Value in the Whole Pandas DataFrame
To count the total number of NaN values in the whole DataFrame, we can use df.isna().sum().sum(). The first sum() counts the NaN values per column, and the second adds those per-column counts into a single total.
# Count NaN values of whole DataFrame
nan_count = df.isna().sum().sum()
print(nan_count)
# Output:
# 11
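As a sanity check on the total, one possible approach is to compare it with the number of cells minus the number of non-missing cells, using DataFrame.size and DataFrame.count(). This is a sketch on the article's example data, not the only way to verify the result.

```python
import numpy as np
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Total NaN count via the double sum
total_nan = int(df.isna().sum().sum())

# Cross-check: all cells (size) minus non-missing cells (count per column, summed)
cross_check = int(df.size - df.count().sum())

print(total_nan, cross_check)  # 11 11
```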
5. Pandas Count NaN Values in Single Row
So far, we have learned how to count the NaN values in a single column, in all columns, and in the whole DataFrame using the isna() function with sum(). Now, we will learn how to count the NaN values in a single row of the DataFrame.
To count NaN values in a single row, we first select that row using the pandas DataFrame.loc[] indexer and then apply the isna() and sum() functions. Because df.loc[['r1']] (with a list of labels) returns a one-row DataFrame, sum() is applied twice to reduce it to a single number.
# Count the NaN values in single row
nan_count = df.loc[['r1']].isna().sum().sum()
print(nan_count)
# Output:
# 3
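An alternative sketch: selecting the row with a single label, df.loc['r1'], returns a Series rather than a one-row DataFrame, so one sum() is enough. The example below reuses the article's data.

```python
import numpy as np
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# A single label returns the row as a Series indexed by column name
row = df.loc['r1']

# One sum() over the boolean Series gives the NaN count for that row
row_nan_count = int(row.isna().sum())
print(row_nan_count)  # 3
```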
6. Pandas Count NaN Values in All Rows
Using the same functions, we can also count the NaN values in every row. By default, sum() adds the values down each column (axis=0); to get per-row counts, pass axis=1 to sum() so that it adds the values across each row.
If you want to drop rows with NaN values from a DataFrame, you can use the dropna() function.
# Count the NaN values in multiple rows
nan_count = df.isna().sum(axis = 1)
print(nan_count)
# Output:
# r1 3
# r2 1
# r3 2
# r4 3
# r5 2
# dtype: int64
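The per-row counts can also be used to decide which rows to keep. The sketch below, on the article's example data, keeps rows with at most one NaN and compares that with a plain dropna(); the threshold of one is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Per-row NaN counts
row_nan = df.isna().sum(axis=1)

# Keep only rows with at most one NaN (illustrative threshold)
mostly_complete = df[row_nan <= 1]
print(mostly_complete.index.tolist())  # ['r2']

# dropna() with default arguments drops every row containing any NaN;
# in this example every row has at least one, so nothing remains
all_complete = df.dropna()
print(len(all_complete))  # 0
```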
7. Conclusion
In this article, I have explained, with several examples, how to count the NaN values of a specific column or row of a pandas DataFrame, or of the entire DataFrame, using the isna() function with the sum() function.