How can I check for NaN values in Python?

How do I check for NaN values in Python? Handling data that contains missing or undefined values is a common challenge in data analysis and programming. In Python, such values are commonly represented as NaN, short for “Not-a-Number”, a special floating-point value. Checking for and managing NaN values is essential to the accuracy and reliability of your data-driven projects.

Although Python has no “null” keyword (None is its closest built-in equivalent), the standard math module and libraries like NumPy and Pandas provide tools to identify and deal with NaN values. In this article, we will explore techniques for checking NaN values with plain Python, NumPy, and Pandas, and briefly look at how to handle them.

1. Quick Examples – Check NaN Values in Python

These examples provide a quick overview of how to check for NaN values using pandas and NumPy. Each one is explained in more detail later on.


# Example 1: Using pandas 'isna()' function
import numpy as np
import pandas as pd

# Create a sample DataFrame with NaN values
data = {'A': [1.0, np.nan, 3.0], 'B': [np.nan, 5.0, 6.0]}
df = pd.DataFrame(data)

# Check for NaN values using 'isna()'
nan_check_df = df.isna()
print("Example 1 - Using 'isna()':")
print(nan_check_df)

# Example 2: Using NumPy 'isnan()' function
# Create a NumPy array with NaN values
arr = np.array([1.0, np.nan, 3.0, 4.0, np.nan])
# Check for NaN values using 'isnan()'
nan_check_arr = np.isnan(arr)
print("Example 2 - Using 'isnan()':")
print(nan_check_arr)

# Example 3: Using pandas 'isnull()' function
# ('isnull()' is an alias of 'isna()', so the result is identical)
nan_check_df = df.isnull()
print("Example 3 - Using 'isnull()':")
print(nan_check_df)

2. Introduction & Creating Variables with NaN values

In Python, a NaN (Not-a-Number) value can be created with float('nan') or math.nan. This is a special floating-point value that indicates a missing or undefined numerical value. Similarly, you can use numpy.nan to mark missing values in NumPy arrays.

See the following different ways to create NaN values in Python.


# Examples to Create NaN values in Python
import math
import numpy as np

# Create a NaN value using the built-in float() constructor
nan_value = float('nan')

# The math module exposes the same value as a constant
nan_from_math = math.nan

# Create a NumPy array with NaN values
nan_array = np.array([np.nan, 1, 2, np.nan])

These missing or undefined data points can affect data integrity and disrupt mathematical operations, leading to unreliable insights:

  1. Preserving Data Integrity: NaN values can corrupt datasets, rendering entire analyses unreliable.
  2. Mathematical Consistency: NaNs can propagate through calculations, introducing errors and unintended outcomes.
  3. Statistical Precision: They can skew summary statistics, distorting the overall insights.
  4. Machine Learning Reliability: Many algorithms struggle to process data containing NaNs, negatively impacting model performance.
  5. Visualization Accuracy: NaNs can influence the rendering and interpretation of visual data representations.
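
The propagation and skewed-statistics points above can be illustrated with a minimal sketch. The array values are made up for illustration; np.nansum() and np.nanmean() are NumPy's NaN-aware counterparts to sum() and mean().

```python
import numpy as np

# Illustrative data with one missing entry
values = np.array([1.0, 2.0, np.nan, 4.0])

# Any arithmetic involving NaN yields NaN, so it propagates into the result
total = values.sum()
mean = values.mean()
print(total, mean)  # nan nan

# NaN-aware alternatives skip the missing entries instead
nan_safe_total = np.nansum(values)
nan_safe_mean = np.nanmean(values)
print(nan_safe_total)  # 7.0
```

Whether skipping NaNs is appropriate depends on your data, which is why checking for them first matters.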

2.1 Check for NaN values in Python

You can use Python's math.isnan() to check whether the value of a variable is NaN. It returns True if the value is NaN and False otherwise.


# Import math
import math

value = 5.20
if math.isnan(value):
    print("Value is NaN")
else:
    print("Value is not NaN")
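
One thing to keep in mind is that math.isnan() raises a TypeError for non-numeric input such as None or a string. The sketch below wraps it in a small helper; the name is_nan is hypothetical and not part of any library.

```python
import math

def is_nan(value):
    """Hypothetical helper: True only when value is a numeric NaN."""
    try:
        return math.isnan(value)
    except TypeError:
        # math.isnan() rejects non-numeric input such as None or strings
        return False

print(is_nan(float("nan")))  # True
print(is_nan(5.20))          # False
print(is_nan(None))          # False
print(is_nan("nan"))         # False
```

Treating non-numeric input as "not NaN" is a design choice; depending on your data you may prefer to let the error surface instead.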

3. Check for NaN values in NumPy Array

NumPy, short for Numerical Python, is a powerful library for numerical computing in Python. It provides support for creating and manipulating arrays and matrices of data. In real-world data analysis and scientific computing, it’s common to encounter missing or undefined values represented as NaN (Not-a-Number).

Before we dive into the methods, let’s create a sample NumPy array containing NaN values:


# Creating a sample NumPy array with NaN values
import numpy as np

np_array = np.array([1.0, 2.0, np.nan, 4.0, np.nan])
print(np_array)
# Output
# [ 1.  2. nan  4. nan]

3.1 Using np.isnan() to Check if Value is NaN

The simplest way to check for NaN values in a NumPy array is by using the np.isnan() function. This function returns a boolean array with True where the input array contains NaN and False where it does not.


# Checking for NaN values 
nan_indices = np.isnan(np_array)
print(nan_indices)

# Output:
# [False False  True False  True]
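
Often you only need to know whether an array contains any NaN, or how many. As a short sketch, the boolean mask from np.isnan() can be reduced with any() and sum(); the variable names here are illustrative.

```python
import numpy as np

np_array = np.array([1.0, 2.0, np.nan, 4.0, np.nan])
nan_mask = np.isnan(np_array)

# True if at least one element is NaN
has_nan = bool(nan_mask.any())

# True counts as 1 when summed, so the sum is the number of NaNs
nan_count = int(nan_mask.sum())

print(has_nan)    # True
print(nan_count)  # 2
```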

3.2 Using Element-wise Comparison (x != x)

NaN is the only floating-point value that satisfies the condition x != x, because NaN never compares equal to anything, including itself. So another approach is to compare the array with itself element-wise to check for NaN values.


# Check for NaN values using comparison
nan_mask = np_array != np_array
print(nan_mask)

# Output:
# [False False  True False  True]
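
The same property explains a common mistake: comparing an array against np.nan with == never finds anything. The following sketch contrasts the two comparisons; the variable names are illustrative.

```python
import numpy as np

np_array = np.array([1.0, 2.0, np.nan, 4.0, np.nan])

# NaN never equals anything, so this mask is False everywhere
wrong_mask = np_array == np.nan
print(wrong_mask.any())  # False

# Comparing the array with itself marks exactly the NaN positions
right_mask = np_array != np_array
print(right_mask.tolist())  # [False, False, True, False, True]
```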

3.3 Using np.isnan() and np.where()

You can use np.where() in combination with np.isnan() to find the indices of NaN values in the array. It returns a tuple containing one array of indices per dimension; for a 1-D array, that is a single array holding the positions of the NaN values.


# Using np.isnan and np.where
nan_indices = np.where(np.isnan(np_array))
print(nan_indices)
# Output (the dtype suffix may be omitted depending on platform and NumPy version):
# (array([2, 4], dtype=int64),)
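
For arrays with more than one dimension, np.where() returns one index array per axis. The sketch below uses a small made-up 2-D array and pairs the row and column indices into (row, column) positions.

```python
import numpy as np

# Illustrative 2-D array with two missing entries
matrix = np.array([[1.0, np.nan, 3.0],
                   [np.nan, 5.0, 6.0]])

# One index array per axis: rows first, then columns
rows, cols = np.where(np.isnan(matrix))

# Pair them up as plain Python integers
positions = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(positions)  # [(0, 1), (1, 0)]
```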

3.4 Using np.isnan() and Boolean Indexing

Boolean indexing refers to the practice of using an array of Boolean (True or False) values to select or filter elements from a data structure like a NumPy array or a pandas DataFrame. To extract the actual NaN values from the array, you can use Boolean indexing with the mask returned by np.isnan().


# Using np.isnan() with Boolean indexing
nan_values = np_array[np.isnan(np_array)]
print(nan_values)

# Output:
# [nan nan]
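
In practice, the inverse is often more useful: keeping everything except the NaNs. As a sketch, negating the mask with the ~ operator selects the valid values.

```python
import numpy as np

np_array = np.array([1.0, 2.0, np.nan, 4.0, np.nan])

# ~ flips the Boolean mask, so only non-NaN elements are selected
clean_values = np_array[~np.isnan(np_array)]
print(clean_values)  # [1. 2. 4.]
```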

4. Check for NaN values in Pandas DataFrame

In Pandas, NaN values are often encountered when working with data, and it’s essential to identify and handle them properly. Before we dive into the methods to detect NaN values, let’s create a sample DataFrame. We’ll use a NumPy array to generate the data and then convert it into a Pandas DataFrame:


# Creating a Pandas DataFrame with NaN values
import numpy as np
import pandas as pd

# Create a NumPy array
data = np.array([[1.0, 2.0, np.nan],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

# Convert the NumPy array into a DataFrame
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

# Output:
#      A    B    C
# 0  1.0  2.0  NaN
# 1  4.0  NaN  6.0
# 2  7.0  8.0  9.0

Now that we have our sample DataFrame df, let’s explore the different methods to check for NaN values.

4.1 Using df.isna() or df.isnull()

Pandas provides two methods, isna() and isnull(), that can be used interchangeably to check for NaN values (isnull() is an alias of isna()). These methods return a DataFrame of the same shape as the original, with True at locations where NaN values exist and False where data is present.


# Check NaN values using df.isna()
nan_check = df.isna()
print(nan_check)
# Output:
#        A      B      C
# 0  False  False   True
# 1  False   True  False
# 2  False  False  False
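
Because True counts as 1 when summed, chaining sum() onto isna() gives the number of NaN values per column. This sketch rebuilds the same sample data so it runs on its own; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 4.0, 7.0],
                   "B": [2.0, np.nan, 8.0],
                   "C": [np.nan, 6.0, 9.0]})

# Count NaN values in each column
nan_per_column = df.isna().sum()
counts = {col: int(n) for col, n in nan_per_column.items()}
print(counts)  # {'A': 0, 'B': 1, 'C': 1}

# Total number of NaN values in the whole DataFrame
total_nan = int(nan_per_column.sum())
print(total_nan)  # 2
```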

4.2 Using notna() or notnull()

You can also use the notna() or notnull() methods to check for non-NaN values. These methods return a DataFrame with True where data is present and False where NaN values exist. This is mainly useful when you want to find the non-null values.


# Using notna() function
non_nan_check = df.notna()
print(non_nan_check)
# Output:
#       A      B      C
# 0  True   True  False
# 1  True  False   True
# 2  True   True   True
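
One typical use of notna() is filtering rows. As a sketch, calling it on a single column yields a Boolean Series that can select only the rows where that column has data; the column choice here is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 4.0, 7.0],
                   "B": [2.0, np.nan, 8.0],
                   "C": [np.nan, 6.0, 9.0]})

# Keep only the rows where column B is not NaN
rows_with_b = df[df["B"].notna()]
print(rows_with_b.index.tolist())  # [0, 2]
```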

4.3 Using any() to Check Columns for NaN

You can also check which columns contain NaN values. To do this, call any() after applying isna() or isnull() to your DataFrame. This returns a Series with True for columns containing at least one NaN value and False otherwise.


# Using any() to check for NaN values
nan_in_columns = df.isna().any()
print(nan_in_columns)

# Output:
# A    False
# B     True
# C     True
# dtype: bool
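
The same idea works across rows by passing axis=1, and calling any() twice reduces the result to a single True or False for the whole DataFrame. A short sketch using the same sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 4.0, 7.0],
                   "B": [2.0, np.nan, 8.0],
                   "C": [np.nan, 6.0, 9.0]})

# axis=1 checks each row instead of each column
rows_with_nan = df.isna().any(axis=1)
print(rows_with_nan.tolist())  # [True, True, False]

# Collapse to one answer for the entire DataFrame
has_any_nan = bool(df.isna().any().any())
print(has_any_nan)  # True
```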

5. Handling NaN Values

Dealing with missing values is crucial because they can skew results, lead to inaccurate conclusions, and affect the performance of machine learning models. There are multiple methods to handle NaN values.

One of them is mean or median imputation, which replaces NaN values in a dataset with the mean or median value of the respective feature. It is a straightforward method that is useful for handling missing numerical data.


# Handling NaN values
import pandas as pd

# Sample DataFrame with NaN values
data = {'A': [1, 2, 3, None, 5],
        'B': [5, 2, None, 8, 4]}
df = pd.DataFrame(data)

# Mean imputation
mean_imputed_df = df.fillna(df.mean())

# Median imputation
median_imputed_df = df.fillna(df.median())
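
To see what the imputation actually does, the sketch below prints the filled columns for the same sample data. It also shows dropna(), a pandas method not covered above, which removes incomplete rows instead of filling them.

```python
import pandas as pd

data = {'A': [1, 2, 3, None, 5],
        'B': [5, 2, None, 8, 4]}
df = pd.DataFrame(data)

# df.mean() ignores NaN, so each gap is filled with the mean of the known values
mean_imputed_df = df.fillna(df.mean())
print(mean_imputed_df["A"].tolist())  # [1.0, 2.0, 3.0, 2.75, 5.0]
print(mean_imputed_df["B"].tolist())  # [5.0, 2.0, 4.75, 8.0, 4.0]

# Alternative: drop every row that contains at least one NaN
dropped_df = df.dropna()
print(dropped_df.index.tolist())  # [0, 1, 4]
```

Imputation keeps every row but changes the data, while dropping rows keeps the data untouched but shrinks the dataset, so the right choice depends on how much data you can afford to lose.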

6. Summary and Conclusion

In this article, we have learned the techniques to check for NaN values in Python. Whether you’re dealing with missing data in a small dataset or grappling with large-scale data analysis, this skill will serve as a valuable asset in your data science toolkit. If you have any questions or would like to share your insights, please don’t hesitate to reach out in the comment section below.

Happy Coding!!!