The pandas.DataFrame.fillna() method fills NA/NaN/None values in one or more columns with 0, an empty string, or any other specified value. pandas treats NaN as a missing value. In machine learning work, handling missing values is very important: leaving them in place can lead to incorrect results.
CSV files received from third-party sources often contain blank or empty fields. When you load such a file into a DataFrame with pandas.read_csv(), pandas converts those empty fields to NaN.
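As a minimal sketch of that behavior, the snippet below reads a small in-memory CSV instead of a real file. The CSV text and the names csv_text, csv_df, and missing_per_column are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; the blank fields stand in for missing data
csv_text = "Courses,Fee,Discount\nSpark,20000,1000\nJava,,\nScala,26000,\n"

# read_csv() loads blank fields as NaN
csv_df = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column
missing_per_column = csv_df.isna().sum()
print(missing_per_column)
```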
You can either drop rows with NaN values using pandas.DataFrame.dropna() or handle NaN by filling it with specific values using the fillna() method.
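The difference between the two approaches can be sketched on a small, made-up DataFrame (the name sample and its values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
sample = pd.DataFrame({
    "Fee": [20000, np.nan, 26000],
    "Discount": [1000, np.nan, np.nan],
})

dropped = sample.dropna()   # keeps only rows that have no missing values
filled = sample.fillna(0)   # keeps every row and replaces NaN with 0

print(len(dropped), len(filled))
```

Dropping shrinks the data, while filling keeps every row at the cost of introducing a chosen placeholder value.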
pandas fillna Key Points
- It is used to fill NaN values with specified values (0, blank, etc.).
- If you want to consider infinity (inf and -inf) to be “NA” in computations, you can set pandas.options.mode.use_inf_as_na = True. Note that this option is deprecated as of pandas 2.1.
- Besides NaN, pandas also considers None as missing.
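The last two points can be checked with a short sketch. It does not use the global option; instead it assumes you are willing to convert infinities to NaN explicitly before filling, which is one possible approach rather than the only one. The Series s is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing None and both infinities
s = pd.Series([1.0, None, np.inf, -np.inf])

# None is stored as NaN in a float Series, so isna() reports it as missing;
# the infinities are not treated as missing by default
none_is_missing = s.isna().tolist()

# Convert infinities to NaN explicitly, then fill everything missing with 0
cleaned = s.replace([np.inf, -np.inf], np.nan).fillna(0)

print(none_is_missing)
print(cleaned.tolist())
```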
1. Quick Examples of pandas fillna()
Below are some quick examples of using the pandas fillna() method.
# Below are quick examples.
# Example 1: Fillna() on all columns
df2=df.fillna('None')
# Example 2: Fillna() on one column
df2['Discount'] = df['Discount'].fillna(0)
# Example 3: Fillna() on multiple columns
df2[['Discount','Fee']] = df[['Discount','Fee']].fillna(0)
# Example 4: Fillna() on multiple columns with different values
df2 = df.fillna(value={'Discount':0,'Fee':10000})
# Example 5: Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
2. pandas.DataFrame.fillna() Syntax
Below is the syntax of the pandas.DataFrame.fillna() method. It takes the parameters value, method, axis, inplace, limit, and downcast and returns a new DataFrame. When inplace=True is used, it returns None because the replacement happens on the existing DataFrame object.
# Syntax of pandas.DataFrame.fillna()
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
- value – Takes a scalar, dict, Series, or DataFrame, but not a list.
- method – Takes one of these values {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}. Default None.
- axis – 0 or ‘index’, 1 or ‘columns’. Specifies the axis along which to fill the values.
- inplace – Default False. When True, it updates the existing DataFrame object.
- limit – Specifies the maximum number of fills. When method is given, this is the maximum number of consecutive NaN values to fill; otherwise it is the maximum number of NaN values filled along the axis.
- downcast – Takes a dict of column-to-dtype pairs that specifies what to downcast if possible, for example float64 to int64.
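A few of these parameters can be sketched on a small, made-up DataFrame. The name prices and its values are assumptions for illustration. The sketch also uses DataFrame.ffill(), which performs the same forward fill as method='ffill'; recent pandas versions favor it over the method parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing row
prices = pd.DataFrame({
    "Fee": [20000.0, np.nan, 26000.0],
    "Discount": [1000.0, np.nan, 2000.0],
})

# value as a Series: each column is filled with its own mean
by_mean = prices.fillna(prices.mean())

# Forward fill: each NaN takes the last valid value above it
forward = prices.ffill()

# inplace=True modifies the object itself and returns None
prices_copy = prices.copy()
result = prices_copy.fillna(0, inplace=True)

print(by_mean)
print(forward)
print(result)
```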
Let’s create a DataFrame
# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Courses':["Spark",'Java',"Scala",'Python'],
'Fee' :[20000,np.nan,26000,24000],
'Duration':['30days','40days',np.nan,'40days'],
'Discount':[1000,np.nan,2500,None]
})
print("Create DataFrame:\n", df)
3. Pandas fillna NaN with the String 'None'
The fillna() method fills NaN/NA values in a specified column or in an entire DataFrame with any given value. You can modify the DataFrame in place using inplace, cap how many fills to perform using limit, or use axis to choose whether to fill along rows or columns. The example below fills all NaN values with the string 'None' (a text value, not Python's None object).
# Fillna to replace all NaN
df2 = df.fillna('None')
print("After replacing all NAN/NA values with None:\n", df2)
To update the existing DataFrame, use df.fillna('None', inplace=True). You can also use the pandas.DataFrame.replace() method to replace NaN with 0. Similarly, you can replace NaN with a blank or empty string.
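Both alternatives can be sketched as follows. The DataFrame demo is made up for illustration, and each call is applied to a single column so that a number is not mixed into a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one text column and one numeric column
demo = pd.DataFrame({
    "Duration": ["30days", np.nan, "40days"],
    "Discount": [1000.0, np.nan, 2500.0],
})

# replace() as an alternative to fillna() for turning NaN into 0
zeros = demo["Discount"].replace(np.nan, 0)

# fillna() with an empty string for a text column
blanks = demo["Duration"].fillna("")

print(zeros.tolist())
print(blanks.tolist())
```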
4. Pandas fillna on One Column
The above example filled all NaN values in the entire DataFrame. Sometimes you need to fill just one column; you can do so by selecting that DataFrame column and calling fillna() on it.
# Fillna on one column (numeric 0 keeps the column numeric)
df2['Discount'] = df['Discount'].fillna(0)
print(df2)
# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java     None   40days       0.0
# 2   Scala  26000.0     None    2500.0
# 3  Python  24000.0   40days       0.0
5. fillna on Multiple Columns
Use the pandas fillna() method to fill a specified value in multiple DataFrame columns. The example below fills NaN values in the Discount and Fee columns with 0.
# Fillna() on multiple columns
df2[['Discount','Fee']] = df[['Discount','Fee']].fillna(0)
print(df2)
# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0     None    2500.0
# 3  Python  24000.0   40days       0.0
Now, let’s see how to fill different values for each column. The example below fills NaN values in the Discount column with 0 and in the Fee column with 10000.
# Fillna() on multiple columns with different values
df2 = df.fillna(value={'Discount':0,'Fee':10000})
print(df2)
# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java  10000.0   40days       0.0
# 2   Scala  26000.0      NaN    2500.0
# 3  Python  24000.0   40days       0.0
6. Fill with limit param
To control how many NaN values to fill, use the limit param. Compare the result below with the previous one to see the difference.
# Fill with limit
df2 = df.fillna(value={'Discount':0,'Fee':0}, limit=1)
print(df2)
# Outputs:
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0      NaN    2500.0
# 3  Python  24000.0   40days       NaN
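To see how limit caps the number of fills, here is a small sketch on a made-up single-column DataFrame (the name gaps and its values are assumptions for illustration). It counts how many NaN values remain after filling with different limits.

```python
import numpy as np
import pandas as pd

# Hypothetical column with three missing values
gaps = pd.DataFrame({"Discount": [np.nan, 1000.0, np.nan, np.nan]})

# With a fill value and no method, limit caps the number of NaN values
# filled in the column, counting from the top
one_fill = gaps.fillna(0, limit=1)
two_fills = gaps.fillna(0, limit=2)

remaining_after_one = int(one_fill["Discount"].isna().sum())
remaining_after_two = int(two_fills["Discount"].isna().sum())

print(remaining_after_one, remaining_after_two)
```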
7. Complete Example of pandas fillna
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Courses':["Spark",'Java',"Scala",'Python'],
'Fee' :[20000,np.nan,26000,24000],
'Duration':['30days','40days',np.nan,'40days'],
'Discount':[1000,np.nan,2500,None]
})
print(df)
# Fillna() on all columns
df2=df.fillna('None')
print(df2)
# Fillna() on one column
df2['Discount'] = df['Discount'].fillna(0)
print(df2)
# Fillna() on multiple columns
df2[['Discount','Fee']] = df[['Discount','Fee']].fillna(0)
print(df2)
# Fillna() on multiple columns with different values
df2 = df.fillna(value={'Discount':0,'Fee':10000})
print(df2)
# Fill with limit
df2=df.fillna(value={'Discount':0,'Fee':0},limit=1)
print(df2)
Conclusion
In this article, you have learned how to use the DataFrame fillna() method to fill one or multiple columns containing NaN with a specified value. You have also learned how to fill each column with a different value.
Happy Learning !!