The fill() function from the tidyr package in R is a valuable tool for handling missing values in datasets. It fills missing values by carrying neighboring non-missing values forward or backward. This function is especially helpful when dealing with time-series data, survey responses, or datasets with structured missingness.

In this article, I will explore the syntax, parameters, and use cases of the fill() function, demonstrating its utility in data transformation and cleaning.

R fill() Function

The fill() function is used to handle missing values (NA) in selected columns of a data frame by carrying the most recent non-missing value forward or backward. By default, it fills missing entries using the preceding value. You can also fill in both directions, one after the other, by specifying the "downup" or "updown" options.
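To make "carrying forward" concrete, here is a minimal sketch that reproduces the default behavior in base R and compares it with fill(). The column name value and the sample numbers are made up for illustration, and the base-R lines only mimic the result; they are not how tidyr implements it internally.

```r
library(tidyr)

# Hypothetical column with a leading NA and two gaps
value <- c(NA, 10, NA, NA, 40, NA)

# Base-R forward fill: count how many non-missing values have been seen so far,
# then look up the most recent one. Position 1 of the lookup vector is NA and
# is used while no value has been seen yet.
seen <- cumsum(!is.na(value))
manual <- c(NA, value[!is.na(value)])[seen + 1]

# The same operation with fill() on a one-column data frame
filled <- fill(data.frame(value = value), value)$value

print(manual)
print(filled)
```

Both vectors should match: the leading NA stays missing because there is no earlier value to carry forward.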

Syntax of fill() Function

Following is the syntax of the fill() function.


# Syntax of fill()
fill(data, ..., .direction = c("down", "up", "downup", "updown"))

Parameters

  • data: The input data frame.
  • ...: The columns to fill. These can be given as unquoted column names or through tidy-select helpers, for example all_of() for names stored in a character vector.
  • .direction: The direction in which to fill the missing values. Options include:
    • "down": Fills downwards (default).
    • "up": Fills upwards.
    • "downup": Fills downwards first, then upwards.
    • "updown": Fills upwards first, then downwards.

Return Value

This function returns a data frame with the same rows and columns as the input, in which the missing values of the selected columns are filled according to the specified direction.
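The four .direction options are easiest to compare side by side. The following sketch applies each one to the same column; the data frame and the column name value are hypothetical and chosen so that there is a leading NA, an interior gap, and a trailing NA.

```r
library(tidyr)

# Hypothetical column with a leading NA, an interior gap, and a trailing NA
df <- data.frame(value = c(NA, 1, NA, NA, 4, NA))

directions <- c("down", "up", "downup", "updown")

# Apply fill() once per direction and collect the filled columns in a matrix
results <- sapply(directions, function(d) fill(df, value, .direction = d)$value)

print(cbind(original = df$value, results))
```

In the printed matrix, "down" leaves the leading NA, "up" leaves the trailing NA, and the two bidirectional options leave no NA but disagree on the interior gap.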

Fill Missing Values in R

You can use the fill() function to handle missing values (NA) in a specified column of a data frame. By default, it replaces missing values with the preceding non-missing values. Let’s create a data frame with columns containing missing values and apply this function to a specific column to eliminate the missing values.


# Fill missing values using fill()
# Load the tidyr library
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, NA, 85),
  Science = c(85, NA, 90),
  Total = c(NA, 261, 253),
  Percentage = c("83%", "87%", "84%")
)
print("Original Data frame:")
print(df)

filled_df <- fill(df, Math)
print("Data After Filling Downward:")
print(filled_df)

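Yields the output below.


# Output:
[1] "Data After Filling Downward:"
  Student History Math Science Total Percentage
1  Geetha      89   75      85    NA        83%
2     Ram      81   75      NA   261        87%
3     Sai      78   85      90   253        84%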

Fill Missing Values Upward Direction

To fill the missing values in a specified column with the next non-missing value instead of the previous one, set the .direction parameter to "up". Each NA is then replaced by the first non-missing value found below it.


# Fill missing values upward
filled_df <- fill(df, Math, .direction = "up" )
print("Data After Filling Upward:")
print(filled_df)

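Yields the output below.


# Output:
[1] "Data After Filling Upward:"
  Student History Math Science Total Percentage
1  Geetha      89   75      85    NA        83%
2     Ram      81   85      NA   261        87%
3     Sai      78   85      90   253        84%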

Fill Both Directions using downup or updown in R

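The fill() function also supports bidirectional filling using the "downup" or "updown" options. With "downup", values are first carried downward, and any NA that remains (one that sits above the first non-missing value) is then filled upward. "updown" does the same in the opposite order.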

Let’s use the "downup" direction to fill the missing values, starting with the downward direction followed by the upward direction.


# Fill missing values both down and upward
# Load the tidyr library
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai", "Jhon"),
  History = c(89, 81, 78, 75),
  Math = c(75, NA, NA, 80),
  Science = c(85, NA, 90, 77)
)
  
print("Original Data frame:")
print(df)

filled_df <- fill(df, Math, .direction = "downup" )
print("Data After Filling downup:")
print(filled_df)

Yields the output below.


# Output:
[1] "Data After Filling downup:"
  Student History Math Science
1  Geetha      89   75      85
2     Ram      81   75      NA
3     Sai      78   75      90
4    Jhon      75   80      77

Fill Missing Values updown Direction

Let’s use the "updown" direction to fill the missing values, starting with the upward direction followed by the downward direction.


# Fill missing values both up and downward
# Load the tidyr library
library(tidyr)

filled_df <- fill(df, Math, .direction = "updown" )
print("Data After Filling updown:")
print(filled_df)

Yields the output below.


# Output:
[1] "Data After Filling updown:"
  Student History Math Science
1  Geetha      89   75      85
2     Ram      81   80      NA
3     Sai      78   80      90
4    Jhon      75   80      77
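The two outputs above differ only in rows 2 and 3. A short sketch, reusing the Math column from the example, makes that explicit; the helper column differs is my own addition for illustration.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai", "Jhon"),
  Math = c(75, NA, NA, 80)
)

downup <- fill(df, Math, .direction = "downup")$Math
updown <- fill(df, Math, .direction = "updown")$Math

# Rows where the two options disagree are the interior gaps:
# "downup" takes the value above the gap, "updown" the value below it
comparison <- data.frame(
  Student = df$Student,
  downup = downup,
  updown = updown,
  differs = downup != updown
)
print(comparison)
```

So the choice between "downup" and "updown" matters whenever a gap has non-missing values on both sides.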

Handling Multiple Columns

You can use the fill() function to handle missing values in multiple columns of a data frame in R. To do this, specify the column names as additional arguments to the function. It will fill the missing values in the specified columns with non-missing values based on the chosen direction.


# Handling Multiple Columns
# Load the tidyr library
library(tidyr)
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, NA, 85),
  Science = c(85, NA, 90),
  Total = c(NA, 261, 253),
  Percentage = c("83%", "87%", "84%")
)
print("Original Data frame:")
print(df)

filled_df <- fill(df, Math, Science)
print("Data After Filling Downward:")
print(filled_df)

Yields the output below.


# Output:
[1] "Data After Filling Downward:"
  Student History Math Science Total Percentage
1  Geetha      89   75      85    NA        83%
2     Ram      81   75      85   261        87%
3     Sai      78   85      90   253        84%
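Listing columns one by one gets tedious for wide data frames. As a sketch, the columns can also be chosen with tidy-select helpers such as all_of() and everything(); the variable name cols_to_fill is hypothetical, and the data frame is a shortened version of the one above.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  Math = c(75, NA, 85),
  Science = c(85, NA, 90),
  Total = c(NA, 261, 253)
)

# Column names held in a character vector (hypothetical variable name)
cols_to_fill <- c("Math", "Science")
filled_some <- fill(df, all_of(cols_to_fill))

# Fill every column, first downward and then upward
filled_all <- fill(df, everything(), .direction = "downup")

print(filled_some)
print(filled_all)
```

In filled_some the Total column keeps its NA because it was not selected, while filled_all has no missing values left.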

Frequently Asked Questions on fill()

What is the purpose of the fill() function in R?

The fill() function fills missing values in a data frame by carrying non-missing values forward or backward.

What package is required to use the fill() function?

The fill() function is part of the tidyr package. Load it with library(tidyr).

How does the .direction parameter work?

The .direction parameter determines the direction of filling: "down", "up", "downup", or "updown".

How can I apply fill() to specific columns only?

You can specify the columns to fill using their names in the ... argument.

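How does fill() handle missing values at the start or end of a column?

With "down", missing values before the first non-missing value stay NA because there is nothing above them to carry down. With "up", the same applies to missing values after the last non-missing value. Use "downup" or "updown" to fill both ends.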
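The edge behavior can be checked with a small sketch. The single-column data frame below is hypothetical and has a missing value at both ends.

```r
library(tidyr)

# Hypothetical column whose first and last values are missing
df <- data.frame(Total = c(NA, 261, 253, NA))

# Default direction: the leading NA has no value above it and stays missing
down_only <- fill(df, Total)

# Bidirectional: whatever is left after filling down is filled upward
down_then_up <- fill(df, Total, .direction = "downup")

print(down_only$Total)
print(down_then_up$Total)
```

Only the second result is free of NA values.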

Conclusion

In this article, I have explained how the fill() function from R's tidyr package fills missing values in data frames. By choosing the direction and the columns to fill, you get a flexible way of handling structured missing data. Whether you work with time-series datasets or survey responses, fill() simplifies preprocessing by carrying known values into the gaps.

Happy Learning!