  • Post last modified:September 2, 2024
Pandas DataFrame bfill() Method

In pandas, the bfill() method is used to fill missing values in a DataFrame or Series by performing a backward fill. This method replaces NaN or missing values with the next non-missing value in the column (or row, if specified by the axis). If there is no valid value available to fill the missing data, the NaN will remain unchanged.


In this article, I will explain the bfill() method in pandas DataFrame, including its syntax, parameters, and usage to fill missing values using backward fill. This method returns a new DataFrame or Series with the missing values filled. If the inplace parameter is set to True, the method fills the missing values directly in the original DataFrame or Series and returns None.

Key Points –

  • bfill() fills missing values (NaN) in a DataFrame or Series by using the next non-null value found in the direction specified.
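  • The axis parameter determines the direction of the fill; with axis=0 (default), each NaN takes the next valid value from the rows below it in the same column, while with axis=1 each NaN takes the next valid value from the columns to its right in the same row.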
  • The inplace parameter controls whether the operation modifies the original DataFrame (True) or returns a new DataFrame with the filled values (False).
  • The limit parameter specifies the maximum number of consecutive NaN values to fill; a longer gap is only partially filled, starting with the NaN values closest to the next valid value.
  • bfill() only propagates non-null values backward; it does not perform interpolation or fill based on a calculated average or trend.
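The difference between backward fill and interpolation mentioned in the last point can be seen in a small sketch. The Series below is a made-up example, not data from this article, and interpolate() is used only for contrast.

```python
import numpy as np
import pandas as pd

# Hypothetical sample Series with one gap between 1.0 and 3.0
s = pd.Series([1.0, np.nan, 3.0])

backfilled = s.bfill()          # copies the next valid value backward
interpolated = s.interpolate()  # linear interpolation between the neighbours

print(backfilled.tolist())    # [1.0, 3.0, 3.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0]
```

bfill() repeats an existing value, while interpolate() computes a new one.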

Pandas DataFrame bfill() Introduction

Following is the syntax of the bfill() method.


# Syntax of DataFrame bfill() method
DataFrame.bfill(axis=None, inplace=False, limit=None, downcast=None)

Parameters of the DataFrame bfill()

Following are the parameters of the DataFrame bfill() method.

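  • axis – {0 or ‘index’, 1 or ‘columns’}, default None (treated as 0). Specifies the axis along which to fill missing values. 0 or ‘index’ fills within each column, taking values from the rows below, and 1 or ‘columns’ fills within each row, taking values from the columns to the right.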
  • inplace – bool, default False. If True, performs the operation in place and modifies the original DataFrame. If False, returns a new DataFrame with the filled values.
  • limit – int, optional. The maximum number of consecutive NaN values to fill. If not specified, there is no limit on how many consecutive NaN values are filled.
  • downcast – dict, default None. A dictionary of column-to-dtype rules used to downcast the filled values where possible. Recent pandas releases (2.2 and later) mark this parameter as deprecated.

Return Value

It returns an object of the same type as the caller with missing values filled.
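As a quick check of the return type, the sketch below calls bfill() on a small DataFrame and a small Series. Both objects are made-up examples used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample objects
df = pd.DataFrame({'A': [np.nan, 2.0], 'B': [1.0, np.nan]})
s = pd.Series([np.nan, 5.0])

filled_df = df.bfill()
filled_s = s.bfill()

print(type(filled_df).__name__)    # DataFrame
print(type(filled_s).__name__)     # Series

# Without inplace=True the caller keeps its missing values
print(int(df.isna().sum().sum()))  # 2
```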

Usage of Pandas DataFrame bfill() Method

The bfill() method in pandas is used to fill missing values (NaN) in a DataFrame or Series by propagating the next valid observation backward.

Now, let’s create a Pandas DataFrame using data from a dictionary.


import pandas as pd
import numpy as np

# Creating a sample DataFrame
data = {
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

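This yields the following output.

# Output:
# Original DataFrame:
#      A    B    C
# 0  2.0  NaN  1.0
# 1  NaN  3.0  7.0
# 2  4.0  NaN  NaN
# 3  NaN  5.0  NaN
# 4  6.0  NaN  8.0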

To backward fill missing values (NaN) in the DataFrame you created, you can use the bfill() method. This method will fill each NaN with the next non-null value found in that column.


# Performing backward fill on the DataFrame
df2 = df.bfill()
print("DataFrame after backward fill:\n", df2)

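This yields the following output.

# Output:
# DataFrame after backward fill:
#      A    B    C
# 0  2.0  3.0  1.0
# 1  4.0  3.0  7.0
# 2  4.0  5.0  8.0
# 3  6.0  5.0  8.0
# 4  6.0  NaN  8.0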
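One way to confirm what the backward fill changed is to count the missing values per column before and after. This is a minimal sketch that rebuilds the same sample data; the names before and after are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
})

# Count NaN values per column, converted to plain Python ints
before = {col: int(n) for col, n in df.isna().sum().items()}
after = {col: int(n) for col, n in df.bfill().isna().sum().items()}

print(before)  # {'A': 2, 'B': 3, 'C': 2}
print(after)   # {'A': 0, 'B': 1, 'C': 0}
```

The one remaining NaN is the last value of column B, which has no later value to copy from.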

Fill Missing Values Along Columns

Because axis=0 is the default, passing it explicitly gives the same result as calling bfill() with no arguments. Each NaN in a column is filled with the next non-null value found further down that column.


# Backward fill missing values along columns
df2 = df.bfill(axis=0)
print("DataFrame after backward fill along columns:\n", df2)

Here,

  • Column A – The NaN at index 1 is filled with the value 4.0 from index 2, and the NaN at index 3 is filled with 6.0 from index 4.
  • Column B – The NaN at index 0 is filled with the value 3.0 from index 1, and the NaN at index 2 is filled with 5.0 from index 3. The NaN at index 4 remains because there is no later value to fill it.
  • Column C – The NaN at index 2 is filled with the value 8.0 from index 4, and the NaN at index 3 is also filled with 8.0.

This yields the same output as above.
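The claim that the default call and axis=0 agree can be checked with DataFrame.equals(). This sketch assumes the same sample data and also tries the string alias 'index'.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
})

default_fill = df.bfill()

# equals() treats NaN values in the same position as equal
print(default_fill.equals(df.bfill(axis=0)))        # True
print(default_fill.equals(df.bfill(axis='index')))  # True
```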

Fill Missing Values Along Rows

To fill missing values along rows using the backward fill method, you need to set the axis parameter to 1 (or ‘columns’) in the bfill() method. This will fill each NaN with the next non-null value found to its right in the same row.


# Backward fill missing values along rows
df2 = df.bfill(axis=1)
print("DataFrame after backward fill along rows:\n", df2)

# Output:
# DataFrame after backward fill along rows:
#      A    B    C
# 0  2.0  1.0  1.0
# 1  3.0  3.0  7.0
# 2  4.0  NaN  NaN
# 3  5.0  5.0  NaN
# 4  6.0  8.0  8.0

Here,

  • Row 0 – The NaN in column B is filled with the next valid value 1.0 from column C.
  • Row 1 – The missing value in column A is filled with the value from column B (3.0).
  • Row 2 – The missing values in columns B and C both remain NaN since there is no non-null value to their right.
  • Row 3 – The missing value in column A is filled with the value from column B (5.0), but the missing value in column C remains NaN as there is no value to the right to fill.
  • Row 4 – The missing value in column B is filled with the value from column C (8.0).
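The row-wise result described above can be checked programmatically. The sketch below assumes the same sample data and compares axis=1 with its string alias 'columns'; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
})

by_number = df.bfill(axis=1)
by_name = df.bfill(axis='columns')

print(by_number.equals(by_name))          # True
# Three NaN values survive: row 2 (B and C) and row 3 (C)
print(int(by_number.isna().sum().sum()))  # 3
```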

Backward Fill with Limit

To cap how far a value is propagated, use the bfill() method with the limit parameter. The limit parameter specifies the maximum number of consecutive NaN values to fill in each column or row. When a gap is longer than the limit, only the NaN values closest to the next valid value are filled.


# Backward fill missing values along columns with a limit
df2 = df.bfill(limit=1)
print("DataFrame after backward fill with limit=1:\n", df2)

# Output:
# DataFrame after backward fill with limit=1:
#      A    B    C
# 0  2.0  3.0  1.0
# 1  4.0  3.0  7.0
# 2  4.0  5.0  NaN
# 3  6.0  5.0  8.0
# 4  6.0  NaN  8.0

Here,

  • Column A – The missing values at index 1 and index 3 are each a gap of one, which is within the limit, so they are filled with the next valid values (4.0 and 6.0).
  • Column B – The missing values at index 0 and index 2 are filled with the next valid values (3.0 and 5.0). The missing value at index 4 remains NaN because there is no later value to fill it.
  • Column C – The missing value at index 3 is filled with the next valid value (8.0). The missing value at index 2 remains NaN because the limit of 1 allows only one fill per gap.
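To see the effect of the limit in numbers, the sketch below compares the remaining NaN count with and without limit=1, assuming the same sample data.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
})

limited = df.bfill(limit=1)
unlimited = df.bfill()

# limit=1 leaves C at index 2 unfilled, in addition to the trailing NaN in B
print(int(limited.isna().sum().sum()))    # 2
print(int(unlimited.isna().sum().sum()))  # 1
```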

Inplace Backward Fill

To perform an in-place backward fill on a pandas DataFrame, you can use the bfill() method with the inplace=True parameter. This will modify the original DataFrame directly and fill missing values with the next valid observation without returning a new DataFrame.


# Backward fill missing values in place
df.bfill(inplace=True)
print("DataFrame after in-place backward fill:\n", df)

# Output:
# DataFrame after in-place backward fill:
#      A    B    C
# 0  2.0  3.0  1.0
# 1  4.0  3.0  7.0
# 2  4.0  5.0  8.0
# 3  6.0  5.0  8.0
# 4  6.0  NaN  8.0
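Because inplace=True overwrites the caller, one option is to fill a copy when the original values are still needed. This is a minimal sketch assuming the same sample data; the name work is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
})

work = df.copy()          # independent copy of the data
work.bfill(inplace=True)  # modifies only the copy

print(int(df.isna().sum().sum()))    # 7, the original is unchanged
print(int(work.isna().sum().sum()))  # 1
print(work.equals(df.bfill()))       # True
```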

Backward Fill on a Series

Finally, you can use the bfill() method to perform backward fill on a pandas Series similarly to how you would on a DataFrame. This method will fill the missing values in the Series by propagating the next valid observation backward.


import pandas as pd
import numpy as np

# Creating a sample Series with missing values
s = pd.Series([2, np.nan, 4, np.nan, 6])
print("Original Series:\n", s)

# Performing backward fill on the Series
filled_s = s.bfill()
print("Series after backward fill:\n", filled_s)

# Output:
# Original Series:
# 0    2.0
# 1    NaN
# 2    4.0
# 3    NaN
# 4    6.0
# dtype: float64

# Series after backward fill:
# 0    2.0
# 1    4.0
# 2    4.0
# 3    6.0
# 4    6.0
# dtype: float64
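The Series above ends with a valid value, so every gap gets filled. The following sketch uses a different, made-up Series to show that a trailing NaN stays in place because nothing follows it.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a leading and a trailing NaN
s = pd.Series([np.nan, 1.0, np.nan])

filled = s.bfill()

# The leading NaN is filled from index 1; the trailing NaN has no later value
print(filled.tolist())  # [1.0, 1.0, nan]
```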

FAQ on Pandas DataFrame bfill() Method

What is the bfill() method used for in pandas?

The bfill() method in pandas is used to fill missing values in a DataFrame or Series using backward fill. It replaces NaN or missing values with the next valid observation down the column (or row, depending on the axis specified).

How can I use bfill() on a pandas Series?

The bfill() method can be used on a pandas Series in the same way as on a DataFrame. It will fill missing values in the Series by propagating the next valid value backward.

How do I use bfill() with inplace=True?

When using bfill() with inplace=True, the method modifies the original DataFrame or Series directly and does not return a new object.

What happens if I use bfill() with a limit parameter?

The limit parameter specifies the maximum number of consecutive NaN values to fill. For example, limit=2 will fill up to 2 consecutive NaN values but will leave any additional NaN values unchanged.
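As a small illustration of limit=2, the sketch below uses a made-up Series with three consecutive NaN values before the first valid value.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a gap of three NaN values
s = pd.Series([np.nan, np.nan, np.nan, 4.0])

filled = s.bfill(limit=2)

# Only the two NaN values nearest to 4.0 are filled
print(filled.tolist())  # [nan, 4.0, 4.0, 4.0]
```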

How can bfill() be used with other filling methods?

bfill() can be used in conjunction with other filling methods like ffill() (forward fill). You can chain methods or use them separately depending on your data filling requirements.
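A common pattern is to chain bfill() with ffill() so that trailing NaN values, which a backward fill cannot reach, are also filled. This is a sketch assuming the same sample data as earlier in the article.

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'A': [2, np.nan, 4, np.nan, 6],
    'B': [np.nan, 3, np.nan, 5, np.nan],
    'C': [1, 7, np.nan, np.nan, 8]
})

# Backward fill first, then forward fill whatever is still missing
filled = df.bfill().ffill()

print(filled['B'].tolist())            # [3.0, 3.0, 5.0, 5.0, 5.0]
print(int(filled.isna().sum().sum()))  # 0
```

The order matters: ffill().bfill() would fill the interior gaps from the previous value instead of the next one.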

Conclusion

In conclusion, the bfill() (backward fill) method in pandas is a convenient way to handle missing data in DataFrames and Series. It propagates the next valid observation backward to fill NaN values, along columns or rows and optionally up to a limit, and leaves a NaN unchanged when no later valid value exists.

Happy Learning!!
