pandas.DataFrame.fillna() – Explained by Examples

The pandas.DataFrame.fillna() method fills NA/NaN/None values in one or more columns with 0, an empty string, or any other specified value. pandas treats NaN as a missing value. Handling missing values matters in machine learning work, because leaving them unhandled can produce incorrect results.

CSV files received from third-party sources often leave fields blank or empty where data is missing. When you load such a file with pandas.read_csv(), pandas converts these empty fields into NaN in the resulting DataFrame.
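
As a minimal sketch of this behavior, the snippet below reads a small in-memory CSV and counts the missing values pandas detects. The CSV content and the names csv_text, df_csv, and missing_counts are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; the empty fields stand in for missing values
csv_text = """Courses,Fee,Discount
Spark,20000,1000
Java,,
Scala,26000,2500
"""

# read_csv() turns the empty fields into NaN
df_csv = pd.read_csv(io.StringIO(csv_text))

# Count the NaN values detected in each column
missing_counts = df_csv.isna().sum()
print(missing_counts)
```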

You can either drop rows with NaN values using pandas.DataFrame.dropna() or fill them with specific values using the fillna() method.
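
The difference between the two approaches can be sketched as follows. The small DataFrame df_demo is a hypothetical stand-in, not the one used later in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one row that contains NaN values
df_demo = pd.DataFrame({
    "Fee": [20000, np.nan, 26000],
    "Discount": [1000, np.nan, 2500],
})

# dropna() removes every row that contains at least one NaN
dropped = df_demo.dropna()

# fillna() keeps all rows and replaces NaN with the given value
filled = df_demo.fillna(0)

print(len(dropped), len(filled))
```

dropna() shrinks the data, while fillna() keeps every row, so the choice depends on whether the incomplete rows still carry useful information.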

pandas fillna Key Points

  • It is used to fill NaN values with specified values (0, blank, etc.).
  • To treat infinity (inf and -inf) as “NA” in computations, older pandas versions let you set pandas.options.mode.use_inf_as_na = True. This option is deprecated as of pandas 2.1.
  • Besides NaN, pandas also treats None as missing.
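
The points above can be checked with a short sketch. Converting inf to NaN with replace() before filling is one possible alternative to the use_inf_as_na option, shown here as an assumption rather than the only approach. The Series s is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical Series mixing None, NaN, and infinities
s = pd.Series([1.0, None, np.nan, np.inf, -np.inf])

# None and NaN are reported as missing; inf and -inf are not
na_mask = s.isna()
print(na_mask.tolist())

# Convert the infinities to NaN first, then fill everything with 0
s_clean = s.replace([np.inf, -np.inf], np.nan)
s_filled = s_clean.fillna(0)
print(s_filled.tolist())
```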

Related: pandas Drop Rows & Columns with NaN using dropna()

1. Quick Examples of pandas fillna()

Below are some quick examples of using the pandas fillna() method.


# fillna() on all columns
df2=df.fillna('None')

# fillna() on one column
df2['Discount'] =  df['Discount'].fillna(0)

# fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)

# fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})

# fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)

2. pandas.DataFrame.fillna() Syntax

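Below is the syntax of the pandas.DataFrame.fillna() method. It takes the parameters value, method, axis, inplace, limit, and downcast, and returns a new DataFrame. When inplace=True is used, it returns None because the replacement happens on the existing DataFrame object. Note that pandas 2.x deprecates the method and downcast parameters; ffill() and bfill() can be used in place of method.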


# Syntax of pandas.DataFrame.fillna()
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
  • value: Takes a scalar, dict, Series, or DataFrame, but not a list.
  • method: Takes one of {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}. Default None.
  • axis: 0 or ‘index’, 1 or ‘columns’. Specifies the axis along which to fill the values.
  • inplace: Default False. When True, it updates the existing DataFrame object.
  • limit: The maximum number of NaN values to fill. When method is given, this is the maximum number of consecutive NaN values to forward or backward fill; otherwise, it is the maximum number of NaN entries filled along the axis.
  • downcast: Takes a dict of column-to-dtype pairs specifying what to downcast if possible (for example, float64 to int64), or the string ‘infer’.
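
A few of these parameters can be sketched in isolation. The DataFrame df_demo and the fill values below are hypothetical, chosen only to illustrate the accepted types for value and the effect of inplace.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df_demo = pd.DataFrame({
    "Fee": [20000, np.nan, 26000],
    "Discount": [1000, np.nan, np.nan],
})

# value as a Series: its index labels are matched against column names
fill_values = pd.Series({"Fee": 0, "Discount": 500})
by_series = df_demo.fillna(value=fill_values)

# inplace=True changes the object it is called on
df_copy = df_demo.copy()
df_copy.fillna(0, inplace=True)

# A list is not an accepted type for value
try:
    df_demo.fillna([0, 0])
    list_rejected = False
except TypeError:
    list_rejected = True

print(by_series)
print(df_copy)
print(list_rejected)
```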

Let’s create a DataFrame


# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
     'Courses':["Spark",'Java',"Scala",'Python'],
     'Fee' :[20000,np.nan,26000,24000],
     # np.nan (not the string 'NA') marks the missing Duration
     'Duration':['30days','40days',np.nan,'40days'],
     'Discount':[1000,np.nan,2500,None]
               })
print(df)

3. pandas fillna NaN with None Value

The fillna() method fills NaN/NA values in a specified column or across an entire DataFrame with any given value. You can modify the DataFrame in place with inplace, cap how many values are filled with limit, or choose whether to fill along rows or columns with axis. The example below fills all NaN values with the string 'None'.


# fillna to replace all NaN
df2=df.fillna('None')
print(df2)

# Outputs
#  Courses      Fee Duration Discount
#0   Spark  20000.0   30days   1000.0
#1    Java     None   40days     None
#2   Scala  26000.0     None   2500.0
#3  Python  24000.0   40days     None

To update the existing DataFrame, use df.fillna('None', inplace=True). You can also use the pandas.DataFrame.replace() method to replace NaN with 0. Similarly, you can replace NaN with a blank or empty string.
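
As a rough sketch of these two alternatives, the snippet below uses replace() on a numeric column and an empty string on a text column. The DataFrame df_demo is a hypothetical stand-in.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column
df_demo = pd.DataFrame({
    "Courses": ["Spark", np.nan],
    "Fee": [20000, np.nan],
})

# replace() can swap NaN for 0 in a numeric column
with_zero = df_demo["Fee"].replace(np.nan, 0)

# fillna('') fills a text column with an empty string
with_blank = df_demo["Courses"].fillna("")

print(with_zero.tolist())
print(with_blank.tolist())
```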

4. pandas fillna on One Column

The above example filled all NaN values across the entire DataFrame. Sometimes you need to replace NaN in just one column; you can do so by selecting that column and calling fillna() on it.


# fillna on one column (numeric 0 keeps the column numeric)
df2['Discount'] =  df['Discount'].fillna(0)
print(df2)

# Outputs
#  Courses      Fee Duration  Discount
#0   Spark  20000.0   30days    1000.0
#1    Java     None   40days       0.0
#2   Scala  26000.0     None    2500.0
#3  Python  24000.0   40days       0.0
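
The type of the fill value also affects the dtype of the column. As a small sketch (the Series below is a stand-in for the Discount column, not part of the original example), filling with the number 0 keeps a float column, while filling with the string '0' produces an object column that mixes numbers and text.

```python
import numpy as np
import pandas as pd

# Stand-in for the Discount column; None is stored as NaN
discount = pd.Series([1000, np.nan, 2500, None])

# Numeric fill value: the column stays float64
numeric_fill = discount.fillna(0)

# String fill value: the column is upcast to object
string_fill = discount.fillna("0")

print(numeric_fill.dtype)
print(string_fill.dtype)
```

Keeping the column numeric matters if you plan to sum, average, or compare the values later.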

5. fillna on Multiple Columns

Use the pandas fillna() method to fill a specified value on multiple DataFrame columns. The example below updates the Discount and Fee columns with 0 for NaN values.


# fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)
print(df2)

# Outputs
#  Courses      Fee Duration  Discount
#0   Spark  20000.0   30days    1000.0
#1    Java      0.0   40days       0.0
#2   Scala  26000.0     None    2500.0
#3  Python  24000.0   40days       0.0

Now, let’s see how to fill a different value for each column. The example below fills NaN values with 0 in the Discount column and with 10000 in the Fee column.


# fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})
print(df2)

# Outputs
#  Courses      Fee Duration  Discount
#0   Spark  20000.0   30days    1000.0
#1    Java  10000.0   40days       0.0
#2   Scala  26000.0      NaN    2500.0
#3  Python  24000.0   40days       0.0

6. Fill with limit param

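To control how many NaN values are filled, use the limit param. With a dict value, the limit applies to each column separately, so in the example below only the first NaN in Discount is filled. Compare the result with the previous one to see the difference.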


# fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)

# Outputs
#  Courses      Fee Duration  Discount
#0   Spark  20000.0   30days    1000.0
#1    Java      0.0   40days       0.0
#2   Scala  26000.0      NaN    2500.0
#3  Python  24000.0   40days       NaN
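
To make the per-column behavior of limit easier to see, here is a minimal sketch on a hypothetical DataFrame where one column has two NaN values and the other has three.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Fee has two NaN values, Discount has three
df_demo = pd.DataFrame({
    "Fee": [np.nan, 26000, np.nan],
    "Discount": [np.nan, np.nan, np.nan],
})

# At most two NaN values are filled in each column
limited = df_demo.fillna(value={"Fee": 0, "Discount": 0}, limit=2)

# Count the NaN values left in each column after the limited fill
remaining = limited.isna().sum()
print(limited)
print(remaining)
```

Fee ends up fully filled, while the third NaN in Discount is left untouched because the limit was already reached for that column.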

7. Complete Example of pandas fillna


import pandas as pd
import numpy as np
df = pd.DataFrame({
     'Courses':["Spark",'Java',"Scala",'Python'],
     'Fee' :[20000,np.nan,26000,24000],
     'Duration':['30days','40days',np.nan,'40days'],
     'Discount':[1000,np.nan,2500,None]
               })
print(df)

# fillna() on all columns
df2=df.fillna('None')
print(df2)

# fillna() on one column
df2['Discount'] =  df['Discount'].fillna(0)
print(df2)

# fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)
print(df2)

# fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})
print(df2)

# fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)

Conclusion

In this article, you have learned how to use the DataFrame fillna() method to fill NaN values in one column or multiple columns with a specified value. You have also learned how to fill a different value for each column.

Happy Learning !!

Naveen (NNK)
