The fill() function from the tidyr package in R is a valuable tool for handling missing values in datasets. It fills missing values by carrying the nearest non-missing value forward (down) or backward (up). This is especially helpful for time-series data, survey responses, or datasets with structured missingness, such as tables where a value is recorded only when it changes and is left blank in the rows that follow.
In this article, I will explore the syntax, parameters, and use cases of the fill() function, demonstrating its utility in data transformation and cleaning.
R fill() Function
The fill() function handles missing values (NA) in selected columns of a data frame by carrying the most recent non-missing value forward or backward. By default, it fills each missing entry with the preceding non-missing value. You can also fill in both directions, one after the other, by specifying the "downup" or "updown" option.
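To make the "carry forward" idea concrete, here is a minimal base R sketch of downward filling on a single vector. fill_down_vec() is a hypothetical helper written only for illustration. It is not part of tidyr and is not how tidyr implements fill() internally, but for a simple numeric vector it should produce the same values as the default "down" direction.

```r
# Hypothetical helper for illustration only (not part of tidyr):
# replaces each NA with the value directly above it.
fill_down_vec <- function(x) {
  for (i in seq_along(x)) {
    # Walk top to bottom; an NA takes the (possibly already filled)
    # value from the previous position
    if (i > 1 && is.na(x[i])) {
      x[i] <- x[i - 1]
    }
  }
  x
}

fill_down_vec(c(75, NA, 85))
# [1] 75 75 85
fill_down_vec(c(NA, 1, NA, NA, 3))
# [1] NA  1  1  1  3
```

Because the loop walks from top to bottom, a run of consecutive NAs is filled with the same value, while an NA in the first position stays missing because nothing precedes it.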
Syntax of fill() Function
Following is the syntax of the fill() function.
# Syntax of fill()
fill(data, ..., .direction = c("down", "up", "downup", "updown"))
Parameters
- data: The input data frame.
- ...: The columns to fill. These can be specified as unquoted column names or as a character vector of column names.
- .direction: The direction in which to fill the missing values. Options include:
  - "down": Fills downward (default).
  - "up": Fills upward.
  - "downup": Fills downward first, then upward.
  - "updown": Fills upward first, then downward.
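Because "down" is the first of the listed options, it is the default, and values outside these four options are rejected. The quick check below sketches both points on a small, made-up data frame. The exact error message varies between tidyr versions, so the sketch only records whether an error was raised.

```r
library(tidyr)

# Made-up one-column data frame for illustration
df <- data.frame(Math = c(75, NA, 85))

# Omitting .direction behaves the same as .direction = "down"
default_fill <- fill(df, Math)
down_fill <- fill(df, Math, .direction = "down")
identical(default_fill, down_fill)
# [1] TRUE

# A value outside the four documented options raises an error
result <- tryCatch(
  fill(df, Math, .direction = "sideways"),
  error = function(e) "error raised"
)
result
# [1] "error raised"
```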
Return Value
This function returns a data frame with the same rows and columns as data, where missing values in the selected columns are filled in the specified direction. Columns you do not select are left unchanged.
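As a rough sketch, assuming a small made-up data frame, the checks below illustrate what the returned data frame looks like in practice: it keeps the input's shape and class, only the selected column changes, and the original object is left untouched because R copies objects on modification.

```r
library(tidyr)

# Made-up data frame for illustration
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  Math = c(75, NA, 85),
  Science = c(85, NA, 90)
)

filled_df <- fill(df, Math)

# Same number of rows and columns, same class as the input
dim(filled_df)
# [1] 3 3
class(filled_df)
# [1] "data.frame"

# Only the selected column changed; Science still has its NA
filled_df$Science
# [1] 85 NA 90

# The original data frame is not modified in place
df$Math
# [1] 75 NA 85
```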
Fill Missing Values in R
You can use the fill() function to handle missing values (NA) in a specific column of a data frame. By default, it replaces each missing value with the preceding non-missing value. Let's create a data frame whose columns contain missing values and apply fill() to the Math column to eliminate its missing value.
# Fill missing values using fill()
# Load the tidyr library
library(tidyr)
df <- data.frame(
Student = c("Geetha", "Ram", "Sai"),
History = c(89, 81, 78),
Math = c(75, NA, 85),
Science = c(85, NA, 90),
Total = c(NA, 261, 253),
Percentage = c("83%", "87%", "84%")
)
print("Original Data frame:")
print(df)
filled_df <- fill(df, Math)
print("Data After Filling Downward:")
print(filled_df)
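Yields below output.
# Output:
[1] "Data After Filling Downward:"
  Student History Math Science Total Percentage
1  Geetha      89   75      85    NA        83%
2     Ram      81   75      NA   261        87%
3     Sai      78   85      90   253        84%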
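The same downward fill is handy for time-series-like data where a value is written only when it changes, such as a year that appears once per block of quarters. The sales data frame below, with its quarter, year, and units columns, is hypothetical and made up purely for illustration.

```r
library(tidyr)

# Hypothetical quarterly data: the year is written only on the first
# quarter of each year, as often happens in spreadsheets
sales <- data.frame(
  quarter = c("Q1", "Q2", "Q3", "Q4", "Q1", "Q2"),
  year = c(2023, NA, NA, NA, 2024, NA),
  units = c(120, 135, 128, 150, 142, 160)
)

# Carry each year down to the quarters that follow it
sales_filled <- fill(sales, year)
sales_filled$year
# [1] 2023 2023 2023 2023 2024 2024
```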
Fill Missing Values in the Upward Direction
To fill the missing values in a column with the next non-missing value below them, set the .direction parameter to "up". Each missing value in the column is then replaced with the value from the next non-missing entry.
# Fill missing values upward
filled_df <- fill(df, Math, .direction = "up")
print("Data After Filling Upward:")
print(filled_df)
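Yields below output.
# Output:
[1] "Data After Filling Upward:"
  Student History Math Science Total Percentage
1  Geetha      89   75      85    NA        83%
2     Ram      81   85      NA   261        87%
3     Sai      78   85      90   253        84%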
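One way to think about the "up" direction is as a downward fill applied to the reversed column. The base R sketch below shows this idea on plain vectors; fill_down_vec() and fill_up_vec() are hypothetical helpers for illustration and are not part of tidyr.

```r
# Hypothetical helpers for illustration only (not part of tidyr)
fill_down_vec <- function(x) {
  for (i in seq_along(x)) {
    if (i > 1 && is.na(x[i])) {
      x[i] <- x[i - 1]  # copy the value from the position above
    }
  }
  x
}

# Filling upward is the same as filling downward on the reversed vector
fill_up_vec <- function(x) {
  rev(fill_down_vec(rev(x)))
}

fill_up_vec(c(75, NA, 85))
# [1] 75 85 85
fill_up_vec(c(1, NA, NA, 4, NA))
# [1]  1  4  4  4 NA
```

A missing value at the end of the vector stays NA, since there is no later value to copy.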
Fill in Both Directions Using downup or updown in R
The fill() function also supports filling in both directions with the "downup" and "updown" options. These fill in one direction and then the other, so missing values at the start or end of a column can also be filled.
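To see how the four options differ, the sketch below applies each of them to the same made-up column, which has missing values at the start, in the middle, and at the end. The expected values in the comments follow from the filling rules described above.

```r
library(tidyr)

# Made-up column with missing values at the start, middle, and end
df <- data.frame(x = c(NA, 1, NA, 2, NA))

directions <- c("down", "up", "downup", "updown")

# Fill the same column once per direction; results is a named list
results <- lapply(
  setNames(directions, directions),
  function(d) fill(df, x, .direction = d)$x
)

# Expected values of x after each direction:
#   down:   NA  1  1  2  2   (leading NA has nothing above it)
#   up:      1  1  2  2 NA   (trailing NA has nothing below it)
#   downup:  1  1  1  2  2
#   updown:  1  1  2  2  2
```

The two-pass options differ only in how the gap in the middle is filled: "downup" fills it from above (1), while "updown" fills it from below (2).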
Let's use the "downup" direction to fill the missing values, filling downward first and then upward.
# Fill missing values both down and upward
# Load the tidyr library
library(tidyr)
df <- data.frame(
Student = c("Geetha", "Ram", "Sai", "Jhon"),
History = c(89, 81, 78, 75),
Math = c(75, NA, NA, 80),
Science = c(85, NA, 90, 77)
)
print("Original Data frame:")
print(df)
filled_df <- fill(df, Math, .direction = "downup")
print("Data After Filling downup:")
print(filled_df)
Yields below output.
# Output:
[1] "Data After Filling downup:"
  Student History Math Science
1  Geetha      89   75      85
2     Ram      81   75      NA
3     Sai      78   75      90
4    Jhon      75   80      77
Fill Missing Values in the updown Direction
Let's use the "updown" direction on the same data frame, filling upward first and then downward.
# Fill missing values both upward and downward
# (reuses the df created in the downup example above)
# Load the tidyr library
library(tidyr)
filled_df <- fill(df, Math, .direction = "updown")
print("Data After Filling updown:")
print(filled_df)
Yields below output.
# Output:
[1] "Data After Filling updown:"
  Student History Math Science
1  Geetha      89   75      85
2     Ram      81   80      NA
3     Sai      78   80      90
4    Jhon      75   80      77
Handling Multiple Columns
You can use the fill() function to handle missing values in multiple columns of a data frame by passing each column name as an additional argument. Each selected column is filled independently with non-missing values in the chosen direction.
# Handling Multiple Columns
# Load the tidyr library
library(tidyr)
df <- data.frame(
Student = c("Geetha", "Ram", "Sai"),
History = c(89, 81, 78),
Math = c(75, NA, 85),
Science = c(85, NA, 90),
Total = c(NA, 261, 253),
Percentage = c("83%", "87%", "84%")
)
print("Original Data frame:")
print(df)
filled_df <- fill(df, Math, Science)
print("Data After Filling Downward:")
print(filled_df)
Yields below output.
# Output:
[1] "Data After Filling Downward:"
  Student History Math Science Total Percentage
1  Geetha      89   75      85    NA        83%
2     Ram      81   75      85   261        87%
3     Sai      78   85      90   253        84%
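The ... argument uses tidyselect column selection, so the same two columns can also be chosen with a character vector or with a range of adjacent columns. The sketch below, which rebuilds the data frame from the example above, checks that three selection styles give identical results; it assumes a reasonably recent tidyr release.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, NA, 85),
  Science = c(85, NA, 90),
  Total = c(NA, 261, 253),
  Percentage = c("83%", "87%", "84%")
)

# Three equivalent ways to select Math and Science
by_name <- fill(df, Math, Science)           # unquoted names
by_string <- fill(df, c("Math", "Science"))  # character vector
by_range <- fill(df, Math:Science)           # range of adjacent columns

identical(by_name, by_string)
# [1] TRUE
identical(by_name, by_range)
# [1] TRUE
```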
Frequently Asked Questions about fill()
What does the fill() function do?
The fill() function fills missing values in a data frame by carrying non-missing values forward or backward.
Which package provides fill()?
The fill() function is part of the tidyr package. Load it with library(tidyr).
What does the .direction parameter do?
The .direction parameter determines the direction of filling: "down", "up", "downup", or "updown".
How do I choose which columns to fill?
Specify the columns to fill by name in the ... argument.
How does fill() handle missing values at the start or end of a column?
With "down", missing values before the first non-missing value remain NA; with "up", missing values after the last non-missing value remain NA. Use "downup" or "updown" to fill both ends.
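Using the Total column from the first example, which starts with NA, the sketch below shows the difference between a one-way and a two-way fill at the edge of a column. The all_missing data frame and its Score column are hypothetical, added only to show that a column with no observed values cannot be filled in any direction.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  Total = c(NA, 261, 253)
)

# "down": the first row has no earlier value, so it stays NA
fill(df, Total)$Total
# [1]  NA 261 253

# "downup": the upward pass fills the leading NA from the next row
fill(df, Total, .direction = "downup")$Total
# [1] 261 261 253

# Hypothetical column with no observed values: nothing to carry
all_missing <- data.frame(Score = c(NA_real_, NA_real_))
fill(all_missing, Score, .direction = "downup")$Score
# [1] NA NA
```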
Conclusion
In this article, I have explained how the fill() function in R's tidyr package fills missing values in data frames. By choosing the direction and the columns to fill, it provides a flexible and efficient way to handle structured missing data. Whether you are working with time-series datasets or survey responses, fill() simplifies preprocessing and keeps values continuous where the data implies they should be.
Happy Learning!