Pandas – Replace NaN Values with Zero in a Column

Use the pandas.DataFrame.fillna() or pandas.DataFrame.replace() method to replace NaN or None values with zero (0) in a numeric or string column. NaN stands for Not a Number and is one of the common ways to represent a missing value in data. Sometimes None is also used to represent missing values.

In this article, I will explain several ways to replace NaN values with zero in a column of a pandas DataFrame.

Takeaways:
  • DataFrame.fillna() is used to replace NaN/None with any value.
  • DataFrame.replace() does a find and replace. Here it finds NaN values and replaces them with a specific value.
  • NaN stands for Not a Number and is one of the common ways to represent a missing value in data. Sometimes None is also used.
  • numpy.nan is used to specify a NaN value. NaN is of type float.
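The points above can be checked with a short sketch. It assumes only that pandas and NumPy are installed; the variable names are illustrative and not part of the examples that follow.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
nan_is_float = isinstance(np.nan, float)

# NaN never compares equal to itself, so == cannot be used to find it
nan_equals_itself = (np.nan == np.nan)

# isna() flags both NaN and None as missing
s = pd.Series([1.0, np.nan, None])
missing_mask = s.isna().tolist()

print(nan_is_float, nan_equals_itself, missing_mask)
```

Because NaN is a float, a numeric column that contains it is stored as float, which is why the outputs later in this article show values such as 20000.0.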

1. Quick Examples of Replace NaN with Zero

If you are in a hurry, below are some quick examples of how to replace NaN values with zeros in a pandas DataFrame. They assume that df is an existing DataFrame and that NumPy is imported as np.


# Quick examples
# Replace NaN with zero on all columns
df2 = df.fillna(0)

# Replace in place
df.fillna(0, inplace=True)

# Replace on single column
df["Fee"] = df["Fee"].fillna(0)

# Replace on multiple columns
df[["Fee","Duration"]] = df[["Fee","Duration"]].fillna(0)

# Using replace() on a single column
df["Fee"] = df["Fee"].replace(np.nan, 0)

# Using replace() on all columns
df2 = df.replace(np.nan, 0)

Now, let’s create a DataFrame with a few rows and columns and run some examples to learn how to replace NaN values with zero in a column. Our DataFrame contains the columns Courses, Fee, Duration, and Discount, with NaN values in both string and numeric columns.


# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop"],
    'Fee' :[20000,25000, np.nan],
    'Duration':[np.nan,'40days','35days'],
    'Discount':[1000,np.nan,1500]
               }
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


   Courses      Fee Duration  Discount
0    Spark  20000.0      NaN    1000.0
1  PySpark  25000.0   40days       NaN
2   Hadoop      NaN   35days    1500.0

2. Replace NaN Values with Zero on pandas DataFrame

Use the DataFrame.fillna(0) method to replace NaN/None values with 0. It does not modify the original DataFrame; it returns a new DataFrame with the values filled.


# Replace NaN with zero on all columns
df2 = df.fillna(0)
print(df2)

This yields the output below.


   Courses      Fee Duration  Discount
0    Spark  20000.0        0    1000.0
1  PySpark  25000.0   40days       0.0
2   Hadoop      0.0   35days    1500.0
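To confirm that fillna() leaves the original object untouched, you can count missing values before and after the call. This is a minimal sketch on a reduced, numeric-only version of the DataFrame; the names df_small and filled are illustrative.

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Fee": [20000, 25000, np.nan],
    "Discount": [1000, np.nan, 1500],
})

# fillna() returns a new DataFrame
filled = df_small.fillna(0)

# Count missing cells in each object
missing_in_original = int(df_small.isna().sum().sum())
missing_in_filled = int(filled.isna().sum().sum())

print(missing_in_original, missing_in_filled)
```

The original still reports its two missing cells, while the returned DataFrame reports none.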

To update the existing DataFrame object itself instead of creating a new one, use the inplace param.


# Replace NaN with zero in place
df = pd.DataFrame(technologies)
df.fillna(0,inplace=True)
print(df)

3. Replace NaN Values with Zero on a Single or Multiple Columns

Sometimes you may need to replace NaN values with 0 on only a single column or a few columns of a DataFrame. Let’s see this with an example.


# Replace on single column
df = pd.DataFrame(technologies)
df["Fee"] = df["Fee"].fillna(0)
print(df)

This yields the output below. NaN is replaced with zero only in the Fee column.


   Courses      Fee Duration  Discount
0    Spark  20000.0      NaN    1000.0
1  PySpark  25000.0   40days       NaN
2   Hadoop      0.0   35days    1500.0
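Note that Fee still prints as 20000.0 rather than 20000: a column that held NaN is stored as float, and filling the gaps does not convert it back. If whole numbers are wanted, one option is to cast after filling. A sketch, assuming every value in the column is integral:

```python
import numpy as np
import pandas as pd

df_fee = pd.DataFrame({"Fee": [20000, 25000, np.nan]})

# The column is float64 because it contains NaN
dtype_before = str(df_fee["Fee"].dtype)

# Fill first, then cast; casting while NaN is still present would raise an error
df_fee["Fee"] = df_fee["Fee"].fillna(0).astype("int64")
dtype_after = str(df_fee["Fee"].dtype)

print(dtype_before, dtype_after)
print(df_fee)
```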

You can do the same for multiple columns.


# Replace on multiple columns
df = pd.DataFrame(technologies)
df[["Fee","Duration"]] = df[["Fee","Duration"]].fillna(0)
print(df)

This yields the output below.


   Courses      Fee Duration  Discount
0    Spark  20000.0        0    1000.0
1  PySpark  25000.0   40days       NaN
2   Hadoop      0.0   35days    1500.0
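fillna() also accepts a dictionary that maps column names to fill values, which is another way to target specific columns. The sketch below uses Fee and Discount (both numeric) rather than Duration to keep the dtypes simple; df_small and filled are illustrative names.

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, np.nan],
    "Discount": [1000, np.nan, 1500],
})

# Keys are column names, values are the replacement for NaN in that column
filled = df_small.fillna({"Fee": 0, "Discount": 0})
print(filled)
```

Columns that are not listed in the dictionary are left as they are.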

4. Replace NaN Values with Zeroes Using replace()

Alternatively, you can use the DataFrame.replace() method to replace NaN values with zero. This method takes a minimum of two params: first, the value you want to replace (np.nan in our case), and second, the value you want to replace it with (zero in our case). For this use case it gives the same result as the fillna() method.


# Using replace
df = pd.DataFrame(technologies)
df["Fee"] = df["Fee"].replace(np.nan, 0)
print(df)

This yields the output below.


   Courses      Fee Duration  Discount
0    Spark  20000.0      NaN    1000.0
1  PySpark  25000.0   40days       NaN
2   Hadoop      0.0   35days    1500.0
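As a quick sanity check, the two approaches can be compared directly on a numeric column. This is a sketch with an illustrative Series named fee, not part of the original examples.

```python
import numpy as np
import pandas as pd

fee = pd.Series([20000, 25000, np.nan], name="Fee")

via_replace = fee.replace(np.nan, 0)
via_fillna = fee.fillna(0)

# equals() compares both the values and the dtype
same_result = via_replace.equals(via_fillna)
print(same_result)
```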

5. Using DataFrame.replace() on All Columns

You can also use df.replace(np.nan,0) to replace all NaN values with zero.


# Using replace()
df = pd.DataFrame(technologies)
df2 = df.replace(np.nan, 0)
print(df2)

This replaces NaN values with zero in every column of the DataFrame.
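One way to verify the result is to count the missing values per column before and after the call. A minimal sketch on a numeric-only DataFrame; df_small, before, and after are illustrative names.

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Fee": [20000, 25000, np.nan],
    "Discount": [1000, np.nan, 1500],
})

# Number of missing values in each column
before = df_small.isna().sum().to_dict()
after = df_small.replace(np.nan, 0).isna().sum().to_dict()

print(before)
print(after)
```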

6. Complete Example of Replacing NaN Values with Zeroes in a Column


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop"],
    'Fee' :[20000,25000, np.nan],
    'Duration':[np.nan,'40days','35days'],
    'Discount':[1000,np.nan,1500]
               }
df = pd.DataFrame(technologies)
print(df)

# Replace NaN with zero on all columns
df2 = df.fillna(0)
print(df2)

# Replace in place
df = pd.DataFrame(technologies)
df.fillna(0,inplace=True)
print(df)

# Replace on single column
df = pd.DataFrame(technologies)
df["Fee"] = df["Fee"].fillna(0)
print(df)

# Replace on multiple columns
df = pd.DataFrame(technologies)
df[["Fee","Duration"]] = df[["Fee","Duration"]].fillna(0)
print(df)

# Using replace() on a single column
df = pd.DataFrame(technologies)
df["Fee"] = df["Fee"].replace(np.nan, 0)
print(df)

# Using replace() on all columns
df = pd.DataFrame(technologies)
df2 = df.replace(np.nan, 0)
print(df2)

Conclusion

In this article, you have learned how to replace NaN values with zeroes in a column of a pandas DataFrame using the DataFrame.fillna() and DataFrame.replace() methods. You have also learned how to replace NaN values with zeroes on single and multiple columns, with examples.

Happy Learning !!


NNK
