Count NaN Values in Pandas DataFrame

We can count the NaN values in a pandas DataFrame by calling the isna() function followed by the sum() function. NaN stands for Not a Number and is one of the most common ways to represent a missing value in data. In pandas, handling missing data is an important step before you process it.

None/NaN values are one of the major problems in data analysis, so before processing the data you typically need to remove the columns that contain NaN values, replace NaN with an empty string in string columns, or replace NaN with zero in numeric columns, depending on your needs. In this article, I will explain how to count the NaN values in a specific column or row of a DataFrame, or in the whole DataFrame, using the isna() function with the sum() function.
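To see why isna() followed by sum() produces a count, it helps to look at the intermediate result: isna() returns a boolean mask, and sum() treats each True as 1 and each False as 0. The small Series below is a hypothetical example rather than the article's DataFrame; it also shows that both None and np.nan are reported as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical Series mixing real values, np.nan and None
s = pd.Series(["Spark", np.nan, None, "Hadoop"])

# isna() returns a boolean mask: True where the value is missing
mask = s.isna()
print(mask.tolist())   # [False, True, True, False]

# sum() counts each True as 1, so the result is the number of missing values
print(mask.sum())      # 2
```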

1. Quick Examples of Counting NaN Values in Pandas DataFrame


# Below are the quick examples

# Example 1: Count the NaN values in a single column
nan_count = df['Fee'].isna().sum()

# Example 2: Count the NaN values in each column of the DataFrame
nan_count = df.isna().sum()

# Example 3: Count the NaN values in the whole DataFrame
nan_count = df.isna().sum().sum()

# Example 4: Count the NaN values in a single row
nan_count = df.loc[['r1']].isna().sum().sum()

# Example 5: Count the NaN values in each row
nan_count = df.isna().sum(axis=1)

Now, let’s create a DataFrame with a few rows and columns using a Python dictionary. Our DataFrame contains the columns Courses, Fee, Duration, and Discount, and has some NaN values in both string and numeric columns.


# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])
print(df)

This yields the output below. Note that in pandas a missing value can be created with NumPy's np.nan.


# Output:
    Courses      Fee Duration  Discount
r1    Spark      NaN      NaN       NaN
r2      NaN  20000.0   40days    1000.0
r3  PySpark      NaN   35days       NaN
r4      NaN  25000.0      NaN       NaN
r5   Hadoop      NaN      NaN    1500.0
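Because np.nan is a floating-point value, pandas stores numeric columns that contain it as float64, which is why Fee is displayed as 20000.0 rather than 20000. A quick way to confirm this is to inspect dtypes; the sketch below rebuilds the same DataFrame so it runs on its own.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Integer-like columns that hold np.nan are upcast to float64
print(df.dtypes)
```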

2. Pandas Count NaN in a Column

In pandas, the isna() function detects missing values, returning True for each NaN, and sum() adds up those True values to count the NaNs in a column. In this example, I count the NaN values of a single column of the DataFrame using the syntax below.


# Count the NaN values in single column
nan_count = df['Fee'].isna().sum()
print(nan_count)

# Output:
# 3
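An equivalent way to get the same number is to subtract the count of non-missing values (Series.count()) from the length of the column; isnull() is an alias of isna() and gives the same result. This sketch uses a standalone Series with the same values as the Fee column.

```python
import numpy as np
import pandas as pd

# Same values as the Fee column in the article's DataFrame
fee = pd.Series([np.nan, 20000, np.nan, 25000, np.nan],
                index=['r1', 'r2', 'r3', 'r4', 'r5'], name='Fee')

via_isna = fee.isna().sum()          # count missing values directly
via_isnull = fee.isnull().sum()      # isnull() is an alias of isna()
via_count = len(fee) - fee.count()   # count() counts only non-missing values

print(via_isna, via_isnull, via_count)   # 3 3 3
```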

3. Count NaN Values in All Columns of Pandas DataFrame

You can also find the count of NaN values in every column of a pandas DataFrame by combining the isna() function with the sum() function. The expression df.isna().sum() returns a Series holding the number of NaN values in each column.


# Count NaN values in each column of DataFrame
nan_count = df.isna().sum()
print(nan_count)

# Output:
# Courses     2
# Fee         3
# Duration    3
# Discount    3
# dtype: int64
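If you only need the counts for some of the columns, select them first; the result is again a Series indexed by column name. The column choice below is just for illustration, and the DataFrame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Restrict the count to a chosen subset of columns
subset_count = df[['Courses', 'Fee']].isna().sum()

# Courses has 2 missing values and Fee has 3
print(subset_count)
```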

4. Count NaN Values in the Whole Pandas DataFrame

To count the total number of NaN values in the whole DataFrame, use df.isna().sum().sum(): the first sum() returns the per-column counts, and the second sum() adds those counts into a single total.


# Count NaN values of whole DataFrame
nan_count = df.isna().sum().sum()
print(nan_count)

# Output:
# 11
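The same total can be reached in a few other ways. The sketch below, which rebuilds the DataFrame so it is self-contained, compares summing the per-column counts, summing the flattened boolean mask as a NumPy array, and subtracting the number of non-missing cells from the total number of cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# 1) Per-column counts, then add them up
total_a = df.isna().sum().sum()

# 2) Convert the boolean mask to a NumPy array and sum every cell at once
total_b = df.isna().to_numpy().sum()

# 3) Total number of cells (df.size) minus the non-missing cells
total_c = df.size - df.count().sum()

print(total_a, total_b, total_c)  # 11 11 11
```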

5. Pandas Count NaN Values in a Single Row

So far, we have learned how to count the NaN values in a single column, in all columns, and in the whole DataFrame using the isna() function with sum(). Now, we will learn how to count the NaN values in a single row of a DataFrame.

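To count the NaN values in a single row, we first select that row with the DataFrame.loc[] indexer and then apply isna() and sum(). Passing the label inside a list, as in df.loc[['r1']], returns a one-row DataFrame, so sum() is called twice: once per column and once to total the results.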


# Count the NaN values in single row
nan_count = df.loc[['r1']].isna().sum().sum()
print(nan_count)

# Output:
# 3
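Alternatively, selecting the row with a single label, df.loc['r1'], returns a Series rather than a one-row DataFrame, so a single sum() is enough. A minimal sketch comparing both forms, with the DataFrame rebuilt so it runs standalone:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

row_frame = df.loc[['r1']]   # list of labels -> DataFrame with shape (1, 4)
row_series = df.loc['r1']    # single label -> Series of the r1 values

count_frame = row_frame.isna().sum().sum()   # two sums for a DataFrame
count_series = row_series.isna().sum()       # one sum for a Series

print(row_frame.shape, count_frame, count_series)   # (1, 4) 3 3
```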

6. Pandas Count NaN Values in All Rows

Using the same functions, we can also count the NaN values in every row. By default, sum() aggregates down each column (axis=0); to get a count per row, pass axis=1 to sum() so that it adds the values across each row.

If you want to drop rows with NaN values from a DataFrame, use the dropna() function.


# Count the NaN values in each row
nan_count = df.isna().sum(axis=1)
print(nan_count)

# Output:
# r1    3
# r2    1
# r3    2
# r4    3
# r5    2
# dtype: int64
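The per-row counts are handy for filtering. The sketch below, on a standalone copy of the DataFrame, keeps only the rows with fewer than 3 missing values (the cutoff of 3 is an arbitrary choice for illustration) and then uses dropna(). With its default how='any', dropna() drops every row containing at least one NaN; with thresh, it keeps rows that have at least that many non-missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", np.nan, "PySpark", np.nan, "Hadoop"],
    'Fee': [np.nan, 20000, np.nan, 25000, np.nan],
    'Duration': [np.nan, '40days', '35days', np.nan, np.nan],
    'Discount': [np.nan, 1000, np.nan, np.nan, 1500],
}, index=['r1', 'r2', 'r3', 'r4', 'r5'])

row_nan = df.isna().sum(axis=1)

# Boolean indexing: keep rows with fewer than 3 missing values (arbitrary cutoff)
fewer = df[row_nan < 3]
print(fewer.index.tolist())        # ['r2', 'r3', 'r5']

# Every row here has at least one NaN, so the default dropna() removes them all
dropped_all = df.dropna()
print(len(dropped_all))            # 0

# thresh=3 keeps rows with at least 3 non-missing values
kept = df.dropna(thresh=3)
print(kept.index.tolist())         # ['r2']
```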

7. Conclusion

In this article, I have explained, with several examples, how to count the NaN values in a specific column or row of a pandas DataFrame, or in the entire DataFrame, using the isna() function with the sum() function.
