In pandas, the bfill() method is used to fill missing values in a DataFrame or Series by performing a backward fill. This method replaces NaN or missing values with the next non-missing value in the column (or row, if specified by the axis). If there is no valid value available to fill the missing data, the NaN will remain unchanged.
In this article, I will explain the bfill() method in pandas DataFrame, including its syntax, parameters, and usage to fill missing values using backward fill. This method returns a new DataFrame or Series with the missing values filled. If the inplace parameter is set to True, the method fills the missing values directly in the original DataFrame or Series and returns None.
Key Points –
- bfill() fills missing values (NaN) in a DataFrame or Series by using the next non-null value found in the direction specified.
- The axis parameter determines the direction of the fill; axis=0 (default) works column by column, filling each NaN with the next valid value below it, while axis=1 works row by row, filling each NaN with the next valid value to its right.
- The inplace parameter controls whether the operation modifies the original DataFrame (True) or returns a new DataFrame with the filled values (False).
- The limit parameter specifies the maximum number of consecutive NaN values to fill, preventing overfilling beyond a set number.
- bfill() only propagates non-null values backward; it does not perform interpolation or fill based on a calculated average or trend.
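To make the last key point concrete, here is a minimal sketch that compares bfill() with interpolate() on the same data. The Series values and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two missing values between 1.0 and 4.0
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# bfill() copies the next valid value backward into the gap
backfilled = s.bfill()

# interpolate() (linear by default) computes new in-between values instead
interpolated = s.interpolate()

print(backfilled.tolist())    # [1.0, 4.0, 4.0, 4.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Notice that bfill() never produces a value that was not already present in the data.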
Pandas DataFrame bfill() Introduction
Let’s look at the syntax of the bfill() method.
# Syntax of DataFrame bfill() method
DataFrame.bfill(axis=None, inplace=False, limit=None, downcast=None)
Parameters of the DataFrame bfill()
Following are the parameters of the DataFrame bfill() method.
- axis – {0 or ‘index’, 1 or ‘columns’}, default 0. Specifies the axis along which to fill missing values. 0 or index fills each column from the next valid value below, and 1 or columns fills each row from the next valid value to the right.
- inplace – bool, default False. If True, performs the operation in place and modifies the original DataFrame. If False, returns a new DataFrame with the filled values.
- limit – int, optional. The maximum number of consecutive NaN values to fill. If not specified, all NaN values will be filled.
- downcast – dict, default None. A dictionary containing rules to downcast the filled values to a specific data type. This parameter is deprecated in recent pandas releases (2.2 and later).
Return Value
It returns an object of the same type as the caller with missing values filled.
Usage of Pandas DataFrame bfill() Method
The bfill() method in pandas is used to fill missing values (NaN) in a DataFrame or Series by propagating the next valid observation backward.
Now, let’s create a Pandas DataFrame using data from a dictionary.
import pandas as pd
import numpy as np
# Creating a sample DataFrame
data = {
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
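Yields below output.
# Output:
# Original DataFrame:
#      A    B    C
# 0  2.0  NaN  1.0
# 1  NaN  3.0  7.0
# 2  4.0  NaN  NaN
# 3  NaN  5.0  NaN
# 4  6.0  NaN  8.0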
To backward fill missing values (NaN) in the DataFrame you created, you can use the bfill() method. This method will fill each NaN with the next non-null value found in that column.
# Performing backward fill on the DataFrame
df2 = df.bfill()
print("DataFrame after backward fill:\n", df2)
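Yields below output.
# Output:
# DataFrame after backward fill:
#      A    B    C
# 0  2.0  3.0  1.0
# 1  4.0  3.0  7.0
# 2  4.0  5.0  8.0
# 3  6.0  5.0  8.0
# 4  6.0  NaN  8.0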
Fill Missing Values Along Columns
You can also pass axis=0 explicitly to fill missing values along columns. Because axis=0 is the default, this gives the same result as calling bfill() with no arguments: each NaN in a column is filled with the next non-null value found further down that column.
# Backward fill missing values along columns
df2 = df.bfill(axis=0)
print("DataFrame after backward fill along columns:\n", df2)
This yields the same output as above. Here,
- Column A – The NaN at index 1 is filled with the value 4.0 from index 2, and the NaN at index 3 is filled with 6.0 from index 4.
- Column B – The NaN at index 0 is filled with the value 3.0 from index 1, and the NaN at index 2 is filled with 5.0 from index 3. The NaN at index 4 remains because there is no later value to fill it.
- Column C – The NaN at index 2 is filled with the value 8.0 from index 4, and the NaN at index 3 is also filled with 8.0.
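As a quick check of the point above, the following sketch confirms that the default call, axis=0, and axis='index' all produce the same result on the sample DataFrame. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as used earlier in this article
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8],
})

default_fill = df.bfill()             # axis defaults to 0
axis0_fill = df.bfill(axis=0)         # explicit integer form
index_fill = df.bfill(axis='index')   # explicit string form

print(default_fill.equals(axis0_fill))  # True
print(default_fill.equals(index_fill))  # True

# Only the trailing NaN in column B has no later value to copy from
print(int(default_fill['B'].isna().sum()))  # 1
```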
Fill Missing Values Along Rows
To fill missing values along rows using the backward fill method, you need to set the axis parameter to 1 (or columns) in the bfill() method. This will fill each NaN with the next non-null value found to its right in the same row.
# Backward fill missing values along rows
df2 = df.bfill(axis=1)
print("DataFrame after backward fill along rows:\n", df2)
# Output:
# DataFrame after backward fill along rows:
#      A    B    C
# 0  2.0  1.0  1.0
# 1  3.0  3.0  7.0
# 2  4.0  NaN  NaN
# 3  5.0  5.0  NaN
# 4  6.0  8.0  8.0
Here,
- Row 0 – The NaN in column B is filled with the next valid value 1.0 from column C.
- Row 1 – The missing value in column A is filled with the value from column B (3.0).
- Row 2 – The missing values in columns B and C remain NaN since there’s no non-null value to their right.
- Row 3 – The missing value in column A is filled with the value from column B (5.0), but the missing value in column C remains NaN as there is no value to the right to fill.
- Row 4 – The missing value in column B is filled with the value from column C (8.0).
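One way to think about axis=1 is as a column-wise fill on the transposed data. The sketch below checks this on the sample DataFrame; it is only an illustration of the idea, not a description of how pandas implements it internally.

```python
import numpy as np
import pandas as pd

# Same sample data as used earlier in this article
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8],
})

# Backward fill along each row
row_fill = df.bfill(axis=1)

# Transpose, fill down each column, then transpose back
via_transpose = df.T.bfill().T

# Compare the values, treating NaN in the same position as equal
same = np.array_equal(row_fill.to_numpy(), via_transpose.to_numpy(), equal_nan=True)
print(same)  # True
```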
Backward Fill with Limit
Similarly, to fill missing values using backward fill with a limit in pandas, you can use the bfill() method with the limit parameter. The limit parameter specifies the maximum number of consecutive NaN values to fill in each column or row.
# Backward fill missing values along columns with a limit
df2 = df.bfill(limit=1)
print("DataFrame after backward fill with limit=1:\n", df2)
# Output:
# DataFrame after backward fill with limit=1:
#      A    B    C
# 0  2.0  3.0  1.0
# 1  4.0  3.0  7.0
# 2  4.0  5.0  NaN
# 3  6.0  5.0  8.0
# 4  6.0  NaN  8.0
Here,
- Column A – Each gap is a single NaN, so the limit of 1 is enough to fill both: index 1 is filled with 4.0 and index 3 is filled with 6.0.
- Column B – The missing value at index 0 is filled with the next valid value (3.0), and the missing value at index 2 is filled with the next valid value (5.0). The missing value at index 4 remains NaN because no value follows it.
- Column C – Only the NaN closest to the next valid value is filled: index 3 is filled with 8.0, while index 2 remains NaN because the limit of 1 has already been reached.
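To see how the limit behaves on a longer gap, here is a small sketch using a hypothetical Series with three consecutive missing values. The data and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a run of three NaN values followed by one valid value
s = pd.Series([np.nan, np.nan, np.nan, 10.0])

limit_1 = s.bfill(limit=1)   # fills only the NaN nearest to 10.0
limit_2 = s.bfill(limit=2)   # fills the two nearest NaN values
no_limit = s.bfill()         # fills the whole run

print(limit_1.tolist())   # [nan, nan, 10.0, 10.0]
print(limit_2.tolist())   # [nan, 10.0, 10.0, 10.0]
print(no_limit.tolist())  # [10.0, 10.0, 10.0, 10.0]
```

The fill always starts from the valid value and moves backward, so the NaN values farthest from it are the ones left unfilled.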
Inplace Backward Fill
To perform an in-place backward fill on a pandas DataFrame, you can use the bfill() method with the inplace=True parameter. This will modify the original DataFrame directly and fill missing values with the next valid observation without returning a new DataFrame.
# Backward fill missing values in place
df.bfill(inplace=True)
print("DataFrame after in-place backward fill:\n", df)
# Output:
# DataFrame after in-place backward fill:
#      A    B    C
# 0  2.0  3.0  1.0
# 1  4.0  3.0  7.0
# 2  4.0  5.0  8.0
# 3  6.0  5.0  8.0
# 4  6.0  NaN  8.0
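Because an in-place fill overwrites the original data, you may want to keep a copy first. The following is a minimal sketch of that pattern; the name original is hypothetical.

```python
import numpy as np
import pandas as pd

# Same sample data as used earlier in this article
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8],
})

# Keep an independent copy before modifying df
original = df.copy()

# Fill df in place; the copy is not affected
df.bfill(inplace=True)

print(int(original.isna().sum().sum()))  # 7 missing values before the fill
print(int(df.isna().sum().sum()))        # 1 missing value after the fill
```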
Backward Fill on a Series
Finally, you can use the bfill() method to perform backward fill on a pandas Series similarly to how you would on a DataFrame. This method will fill the missing values in the Series by propagating the next valid observation backward.
import pandas as pd
import numpy as np
# Creating a sample Series with missing values
s = pd.Series([2, np.nan, 4, np.nan, 6])
print("Original Series:\n", s)
# Performing backward fill on the Series
filled_s = s.bfill()
print("Series after backward fill:\n", filled_s)
# Output:
# Original Series:
# 0 2.0
# 1 NaN
# 2 4.0
# 3 NaN
# 4 6.0
# dtype: float64
# Series after backward fill:
# 0 2.0
# 1 4.0
# 2 4.0
# 3 6.0
# 4 6.0
# dtype: float64
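Backward fill is not limited to numeric data. As a small sketch, the example below applies bfill() to a hypothetical Series of labels where None marks the missing entries.

```python
import pandas as pd

# Hypothetical label data; None is treated as a missing value
labels = pd.Series(['low', None, 'high', None, 'mid'])

# Each missing label takes the next available label
filled_labels = labels.bfill()

print(filled_labels.tolist())  # ['low', 'high', 'high', 'mid', 'mid']
```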
FAQ on Pandas DataFrame bfill() Method
The bfill() method in pandas is used to fill missing values in a DataFrame or Series using backward fill. It replaces NaN or missing values with the next valid observation down the column (or row, depending on the axis specified).
The bfill() method can be used on a pandas Series in the same way as on a DataFrame. It will fill missing values in the Series by propagating the next valid value backward.
When using bfill() with inplace=True, the method modifies the original DataFrame or Series directly and does not return a new object.
The limit parameter specifies the maximum number of consecutive NaN values to fill. For example, limit=2 will fill up to 2 consecutive NaN values but will leave any additional NaN values unchanged.
bfill() can be used in conjunction with other filling methods like ffill() (forward fill). You can chain methods or use them separately depending on your data filling requirements.
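For instance, a trailing NaN cannot be filled by bfill() alone because no value follows it. Here is a minimal sketch of chaining bfill() with ffill() to cover that case; the Series values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a leading and a trailing missing value
s = pd.Series([np.nan, 3.0, np.nan, 5.0, np.nan])

# Backward fill alone leaves the trailing NaN in place
only_bfill = s.bfill()

# Chaining a forward fill afterwards fills what is left
both = s.bfill().ffill()

print(only_bfill.tolist())  # [3.0, 3.0, 5.0, 5.0, nan]
print(both.tolist())        # [3.0, 3.0, 5.0, 5.0, 5.0]
```

The order matters: calling ffill() first would fill index 2 with 3.0 instead of 5.0.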
Conclusion
In conclusion, the bfill (backward fill) method in pandas is a powerful and convenient tool for handling missing data in DataFrames and Series. It works by propagating the next valid observation backward to fill NaN values, ensuring that data gaps are properly managed.
Happy Learning!!