pandas.DataFrame.fillna() – Explained by Examples

  • Post last modified:November 28, 2023

The pandas.DataFrame.fillna() method fills NA/NaN/None values in one or more columns with 0, an empty string, or any other specified value. pandas treats NaN as a missing value. Handling missing values is an important step when preparing data for machine learning, since leaving them in place can skew calculations and produce incorrect results.

CSV files received from third-party sources often contain empty or blank fields. When you load such a file into a DataFrame with pandas.read_csv(), pandas converts these null values to NaN.
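As a minimal sketch of this behavior, the snippet below parses a small in-memory CSV. The csv_text content and the df_csv name are made up for illustration and stand in for a real third-party file.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a third-party file; note the blank fields.
csv_text = """Courses,Fee,Duration
Spark,20000,30days
Java,,40days
Scala,26000,
"""

df_csv = pd.read_csv(io.StringIO(csv_text))

# Blank fields are parsed as missing values (NaN) by default.
print(df_csv)

# Count the missing values per column.
print(df_csv.isna().sum())
```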

You can either drop rows with NaN values using pandas.DataFrame.dropna() or fill them with specific values using the fillna() method.
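The difference between the two approaches can be seen on a small frame. This is only a sketch; df_demo and its values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one missing Fee.
df_demo = pd.DataFrame({
    "Courses": ["Spark", "Java", "Scala"],
    "Fee": [20000, np.nan, 26000],
})

dropped = df_demo.dropna()   # removes the "Java" row entirely
filled = df_demo.fillna(0)   # keeps every row; the missing Fee becomes 0

print(dropped)
print(filled)
```

Dropping loses the whole row, including its valid Courses value, while filling keeps the row at the cost of inventing a Fee.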

pandas fillna Key Points

  • It is used to fill NaN values with specified values (0, blank, etc.).
  • If you want to treat infinity (inf and -inf) as “NA” in computations, you can set pandas.options.mode.use_inf_as_na = True. Note that this option is deprecated in recent pandas releases.
  • Besides NaN, pandas also treats None as a missing value.
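Infinite values can also be converted to NaN explicitly before filling, which avoids relying on a global option. The following is a small sketch of that alternative, not the approach the key points describe; the Series s is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, -np.inf, np.nan, None])

# By default, NaN and None count as missing; inf and -inf do not.
missing_mask = s.isna()
print(missing_mask.tolist())

# Convert infinities to NaN explicitly, then fill everything with 0.
cleaned = s.replace([np.inf, -np.inf], np.nan).fillna(0)
print(cleaned.tolist())
```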

1. Quick Examples of pandas fillna()

Below are some quick examples of using the pandas fillna() method.


# Below are quick examples.

# Example 1: Fillna() on all columns
df2=df.fillna('None')

# Example 2: Fillna() on one column
df2['Discount'] =  df['Discount'].fillna(0)

# Example 3: Fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)

# Example 4: Fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})

# Example 5: Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)

2. pandas.DataFrame.fillna() Syntax

Below is the syntax of the pandas.DataFrame.fillna() method. It takes the parameters value, method, axis, inplace, limit, and downcast, and returns a new DataFrame. When inplace=True is used, it returns None because the replacement happens on the existing DataFrame object.


# Syntax of pandas.DataFrame.fillna()
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
  • value: Takes a scalar, dict, Series, or DataFrame, but not a list.
  • method: Takes one of {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}. Default None. Recent pandas releases deprecate this parameter in favor of the DataFrame.ffill() and DataFrame.bfill() methods.
  • axis: 0 or ‘index’, 1 or ‘columns’. Specifies the axis along which to fill missing values.
  • inplace: Default False. When True, it updates the existing DataFrame object.
  • limit: Specifies the maximum number of fills. With a method, this is the maximum number of consecutive NaN values to forward or backward fill; without a method, it is the maximum number of NaN entries filled along the axis.
  • downcast: Takes a dict of column -> dtype pairs specifying what to downcast where possible (for example, float64 to int64), or the string ‘infer’.
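To make the value and inplace parameters more concrete, here is a minimal sketch. The df_params frame and the fill values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

df_params = pd.DataFrame({
    "Fee": [20000, np.nan, 26000, 24000],
    "Discount": [1000, np.nan, 2500, np.nan],
})

# value as a Series: its index labels are matched against the column names.
fill_values = pd.Series({"Fee": 10000, "Discount": 0})
filled = df_params.fillna(value=fill_values)
print(filled)

# The original frame is untouched so far.
nan_count_before = int(df_params.isna().sum().sum())

# inplace=True modifies df_params itself instead of creating a new frame.
df_params.fillna(0, inplace=True)
print(df_params)
```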

Let’s create a DataFrame


# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
     'Courses':["Spark",'Java',"Scala",'Python'],
     'Fee' :[20000,np.nan,26000,24000],
     'Duration':['30days','40days',np.nan,'40days'],
     'Discount':[1000,np.nan,2500,None]
               })
print("Create DataFrame:\n", df)

Yields below output.

[Image: pandas DataFrame fillna output]

3. Pandas fillna NaN with None Value

The fillna() method fills NaN/NA values in a specified column or in an entire DataFrame with any given value. You can update the DataFrame in place using inplace, limit how many values are filled using limit, or choose whether to fill along rows or columns using axis. The example below fills all NaN values with the string 'None'.


# Fillna to replace all NaN
df2 = df.fillna('None')
print("After replacing all NAN/NA values with None:\n", df2)

Yields below output.

[Image: pandas DataFrame fillna output]

To update the existing DataFrame, use df.fillna('None', inplace=True). You can also use the pandas.DataFrame.replace() method to replace NaN with 0. Similarly, you can replace NaN with a blank or empty string.
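Both alternatives can be sketched on a small frame. The df_rep name and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

df_rep = pd.DataFrame({
    "Fee": [20000, np.nan, 26000],
    "Duration": ["30days", np.nan, "40days"],
})

# replace() can target NaN explicitly; here only the numeric column is touched.
df_rep["Fee"] = df_rep["Fee"].replace(np.nan, 0)

# An empty string is a common fill value for text columns.
df_rep["Duration"] = df_rep["Duration"].fillna("")

print(df_rep)
```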

4. Pandas fillna on One Column

The above example filled all NaN values in the entire DataFrame. Sometimes you need to fill NaN in just one column; you can do so by selecting that column and calling fillna() on it.


# Fillna on one column (numeric 0 keeps the column numeric)
df2['Discount'] =  df['Discount'].fillna(0)
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java     None   40days       0.0
# 2   Scala  26000.0     None    2500.0
# 3  Python  24000.0   40days       0.0
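One detail worth checking when filling a single column is the type of the fill value. The sketch below, using a made-up discount Series, compares filling with the number 0 and with the string "0".

```python
import numpy as np
import pandas as pd

discount = pd.Series([1000, np.nan, 2500, None], name="Discount")

as_number = discount.fillna(0)    # the column stays numeric
as_text = discount.fillna("0")    # floats are now mixed with strings

print(as_number.dtype, as_text.dtype)

# Numeric operations keep working on the numerically filled column.
total = as_number.sum()
print(total)
```

Filling a numeric column with a string generally forces a non-numeric dtype, so later arithmetic on that column may fail or behave unexpectedly.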

5. fillna on Multiple Columns

Use the pandas fillna() method to fill a specified value in multiple DataFrame columns. The example below replaces NaN values in the Discount and Fee columns with 0.


# Fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0     None    2500.0
# 3  Python  24000.0   40days       0.0

Now, let’s see how to fill different values for each column. The example below fills NaN values in the Discount column with 0 and in the Fee column with 10000.


# Fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java  10000.0   40days       0.0
# 2   Scala  26000.0      NaN    2500.0
# 3  Python  24000.0   40days       0.0
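The per-column values in the dict do not have to be constants; they can be computed from the data. The following sketch fills a numeric column with its mean and a text column with its most frequent value. The choice of mean and mode is an assumption for illustration, not something the example above prescribes.

```python
import numpy as np
import pandas as pd

df_stats = pd.DataFrame({
    "Fee": [20000, np.nan, 26000, 24000],
    "Duration": ["30days", "40days", np.nan, "40days"],
})

fill_values = {
    "Fee": df_stats["Fee"].mean(),               # mean() skips NaN by default
    "Duration": df_stats["Duration"].mode()[0],  # most frequent non-missing value
}

filled = df_stats.fillna(value=fill_values)
print(filled)
```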

6. Fill with limit param

To control how many NaN values are filled, use the limit param. Compare the result below with the one above to see the difference.


# Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0      NaN    2500.0
# 3  Python  24000.0   40days       NaN
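The limit parameter also applies to forward and backward filling, where it caps how many consecutive NaN values are filled. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Forward fill, propagating the last valid value at most 2 consecutive times.
forward = s.ffill(limit=2)
print(forward)

# With a constant value, limit caps the total number of fills in the Series.
constant = s.fillna(0, limit=2)
print(constant)
```

In both cases the third NaN in the run is left untouched because the limit of 2 has already been reached.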

7. Complete Example of pandas fillna


import pandas as pd
import numpy as np
df = pd.DataFrame({
     'Courses':["Spark",'Java',"Scala",'Python'],
     'Fee' :[20000,np.nan,26000,24000],
     'Duration':['30days','40days',np.nan,'40days'],
     'Discount':[1000,np.nan,2500,None]
               })
print(df)

# Fillna() on all columns
df2=df.fillna('None')
print(df2)

# Fillna() on one column
df2['Discount'] =  df['Discount'].fillna(0)
print(df2)

# Fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)
print(df2)

# Fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})
print(df2)

# Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)

Conclusion

In this article, you have learned how to use the DataFrame fillna() method to fill one or multiple columns containing NaN with a specified value. You have also learned how to fill each column with a different value.

Happy Learning !!


Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
