
pandas.DataFrame.fillna() – Explained by Examples


The pandas.DataFrame.fillna() method fills NA/NaN/None values in one or more columns with 0, an empty string, or any other specified value. pandas treats NaN as a missing value. In machine learning work, handling missing values is very important: leaving them unhandled can produce incorrect results.

CSV files received from third-party sources often have blank or empty fields. When you load such a file into a DataFrame with pandas.read_csv(), pandas converts those empty fields to NaN.
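As a minimal sketch of that behavior, the snippet below reads a small in-memory CSV and counts the NaN values per column. The CSV content and the names csv_text, df_csv, and missing_counts are assumptions made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; the blank fields stand in for missing data.
csv_text = """Courses,Fee,Discount
Spark,20000,1000
Java,,
Scala,26000,
"""

# read_csv() turns the blank fields into NaN.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Count the NaN values in each column.
missing_counts = df_csv.isna().sum()
print(missing_counts)
```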

You can either drop rows that contain NaN values using pandas.DataFrame.dropna(), or fill the NaN values with specific values using the fillna() method.
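The difference between the two approaches can be sketched on a tiny DataFrame. The sample data and variable names here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values in two rows.
sample = pd.DataFrame({
    "Fee": [20000, np.nan, 26000],
    "Discount": [1000, np.nan, np.nan],
})

# dropna() removes every row that contains at least one NaN.
dropped = sample.dropna()

# fillna() keeps all rows and replaces the NaN values instead.
filled = sample.fillna(0)

print(len(dropped), len(filled))
```

Dropping loses the rows that have any gap, while filling keeps the row count and substitutes a value you choose.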

pandas fillna Key Points

1. Quick Examples of pandas fillna()

Below are quick examples of using the pandas fillna() method.


# Below are quick examples.

# Example 1: Fillna() on all columns
df2=df.fillna('None')

# Example 2: Fillna() on one column
df2['Discount'] =  df['Discount'].fillna(0)

# Example 3: Fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)

# Example 4: Fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})

# Example 5: Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)

2. pandas.DataFrame.fillna() Syntax

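Below is the syntax of the pandas.DataFrame.fillna() method. It takes the parameters value, method, axis, inplace, limit, and downcast, and returns a new DataFrame. With inplace=True, it returns None because the replacement happens on the existing DataFrame object. Note that recent pandas releases deprecate the method and downcast parameters; use ffill() or bfill() for forward or backward filling.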


# Syntax of pandas.DataFrame.fillna()
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
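The effect of the inplace parameter can be illustrated with a short sketch. The demo DataFrame and the variable names are assumptions for illustration, and the snippet deliberately checks the data rather than the return value of the in-place call.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"Fee": [20000.0, np.nan, 26000.0]})

# Default call: a new DataFrame is returned and demo is left unchanged.
filled_copy = demo.fillna(0)
nan_left_in_demo = int(demo["Fee"].isna().sum())

# In-place call: demo itself is modified.
demo.fillna(0, inplace=True)
nan_after_inplace = int(demo["Fee"].isna().sum())

print(nan_left_in_demo, nan_after_inplace)
```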

Let’s create a DataFrame with some missing values.


# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame(({
     'Courses':["Spark",'Java',"Scala",'Python'],
     'Fee' :[20000,np.nan,26000,24000],
     'Duration':['30days','40days',np.nan,'40days'],
     'Discount':[1000,np.nan,2500,None]
               }))
print("Create DataFrame:\n", df)

The resulting DataFrame has a missing Fee for Java, a missing Duration for Scala, and missing Discount values for Java and Python.

3. Pandas fillna NaN with the String 'None'

The fillna() method fills NaN/NA values in a specified column or in an entire DataFrame with any given value. You can modify the DataFrame in place with inplace, cap the number of fills with limit, or use axis to choose whether to fill along rows or columns. The example below fills all NaN values with the string 'None'.


# Fillna to replace all NaN
df2 = df.fillna('None')
print("After replacing all NAN/NA values with None:\n", df2)

In the result, the Fee, Duration, and Discount columns show None wherever they previously held a missing value.

To update the existing DataFrame instead, use df.fillna('None', inplace=True). You can also use the pandas.DataFrame.replace() method to replace NaN with 0. Similarly, you can replace NaN with a blank or empty string.
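The two alternatives mentioned above might look like the following sketch. The prices DataFrame and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({
    "Fee": [20000.0, np.nan],
    "Duration": ["30days", np.nan],
})

# replace() can map NaN to 0, much like fillna(0).
zero_filled = prices[["Fee"]].replace(np.nan, 0)

# An empty string works as a fill value for a text column.
blank_filled = prices["Duration"].fillna("")

print(zero_filled)
print(blank_filled)
```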

4. Pandas fillna on One Column

The above example filled all NaN values in the entire DataFrame. To fill just one column, select that column and call fillna() on it.


# Fillna on one column (use numeric 0 so the column stays numeric)
# df2 still holds the 'None' strings from the previous example
df2['Discount'] =  df['Discount'].fillna(0)
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java     None   40days       0.0
# 2   Scala  26000.0     None    2500.0
# 3  Python  24000.0   40days       0.0
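When filling a single numeric column, it can help to confirm that the column keeps a numeric dtype and that the other columns are untouched. The sketch below uses a small stand-in DataFrame; the courses name and its data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

courses = pd.DataFrame({
    "Fee": [20000, np.nan, 26000],
    "Discount": [1000, np.nan, None],
})

# Fill only the Discount column with numeric 0.
courses["Discount"] = courses["Discount"].fillna(0)

# The filled column is still float64; Fee keeps its NaN.
discount_dtype = courses["Discount"].dtype
fee_missing = int(courses["Fee"].isna().sum())

print(discount_dtype, fee_missing)
```

Filling with the string '0' instead of the number 0 would leave text mixed in with the numbers, which usually gets in the way of later arithmetic.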

5. fillna on Multiple Columns

Use the pandas fillna() method to fill a specified value in multiple DataFrame columns. The example below replaces NaN with 0 in the Discount and Fee columns.


# Fillna() on multiple columns (numeric 0 keeps both columns numeric)
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0     None    2500.0
# 3  Python  24000.0   40days       0.0

Now, let’s see how to fill a different value in each column. The example below replaces NaN with 0 in the Discount column and with 10000 in the Fee column.


# Fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java  10000.0   40days       0.0
# 2   Scala  26000.0      NaN    2500.0
# 3  Python  24000.0   40days       0.0
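The per-column fill values do not have to be typed in by hand. Since fillna() also accepts a Series indexed by column name, one possible variation is to fill each numeric column with its own mean. This is a sketch under assumed data; the scores DataFrame and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "Fee": [20000.0, np.nan, 28000.0],
    "Discount": [1000.0, np.nan, 3000.0],
    "Duration": ["30days", np.nan, "40days"],
})

# A Series of fill values, one per numeric column (index = column name).
column_means = scores[["Fee", "Discount"]].mean()

# Columns missing from the Series (Duration here) are left untouched.
filled_means = scores.fillna(column_means)

print(filled_means)
```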

6. Fill with limit param

To control how many NaN values are filled in each column, use the limit param. Compare the result below with the previous one: only the first NaN in each column is filled, so the second NaN in Discount remains.


# Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)

# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0      NaN    2500.0
# 3  Python  24000.0   40days       NaN
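The same limit behavior can be seen in isolation on a single Series. In this minimal sketch, the discounts Series and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Three NaN values, but only two of them may be filled.
discounts = pd.Series([np.nan, 1000.0, np.nan, np.nan])

# limit=2 fills the first two NaN values and leaves the rest as NaN.
limited = discounts.fillna(0, limit=2)
remaining = int(limited.isna().sum())

print(limited)
print(remaining)
```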

7. Complete Example of pandas fillna


import pandas as pd
import numpy as np
df = pd.DataFrame(({
     'Courses':["Spark",'Java',"Scala",'Python'],
     'Fee' :[20000,np.nan,26000,24000],
     'Duration':['30days','40days',np.nan,'40days'],
     'Discount':[1000,np.nan,2500,None]
               }))
print(df)

# Fillna() on all columns
df2=df.fillna('None')
print(df2)

# Fillna() on one column
df2['Discount'] =  df['Discount'].fillna(0)
print(df2)

# Fillna() on multiple columns
df2[['Discount','Fee']] =  df[['Discount','Fee']].fillna(0)
print(df2)

# Fillna() on multiple columns with different values
df2 =  df.fillna(value={'Discount':0,'Fee':10000})
print(df2)

# Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)

Conclusion

In this article, you have learned how to use the DataFrame fillna() method to fill one or multiple columns containing NaN with a specified value. You have also learned how to fill each column with a different value and how to cap the number of fills with limit.

Happy Learning !!
