In R, the gather() function from the tidyr package reshapes data from a wide format to a long format. Tidy data is organized so that each variable is a column, each observation is a row, and each type of observational unit forms a table. The tidyr package provides functions to create and manipulate tidy data, and gather() was long one of its key functions for restructuring data. Since tidyr 1.0.0, however, gather() has been superseded by the more flexible pivot_longer(); it still works, but it is no longer under active development.
In this article, I will explain how the gather() function works and how to apply it to transform data from wide to long format. Using an example dataset, we will tidy data with gather() and then look at the advantages of moving to pivot_longer() in newer tidyr versions.
Installing tidyr Package
To use the gather() function, first install and load the tidyr package.
# Install and load tidyr package
install.packages("tidyr")
library(tidyr)
R gather() Function
The gather() function from the tidyr package converts a wide dataset into a long one. It gathers multiple columns and collapses them into key-value pairs, producing a dataset with fewer columns and more rows.
Syntax of gather() Function
The syntax of the gather() function is as follows.
# Syntax of gather() Function
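gather(data, key = "key", value = "value", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
Parameters
data: The input data frame.
key: The name of the new column that will contain the gathered column names.
value: The name of the new column that will contain the gathered values.
...: The columns to gather. You can list several columns separated by commas, give a range such as a:c, or exclude columns with -. If nothing is specified, all columns are gathered.
na.rm: Logical value indicating whether rows whose value is NA are removed. Default is FALSE.
convert: Logical value indicating whether type.convert() is run on the key column, which is useful when the gathered column names are numeric, integer, or logical. Default is FALSE.
factor_key: Logical value indicating whether the key column is stored as a factor that preserves the original column order instead of as a character vector. Default is FALSE.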
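The effect of convert and factor_key is easiest to see on the key column itself. The following is a minimal sketch; the sales data frame and its year-named columns are made up for illustration and are not part of the article's dataset.

```r
library(tidyr)

# Hypothetical data: yearly sales stored in columns named by year
sales <- data.frame(
  Region = c("North", "South"),
  `2022` = c(100, 150),
  `2023` = c(120, 140),
  check.names = FALSE  # keep the numeric-looking column names as they are
)

# Default: the key column is a character vector
default_long <- gather(sales, key = "Year", value = "Sales", -Region)
print(class(default_long$Year))

# convert = TRUE runs type.convert() on the key column
converted_long <- gather(sales, key = "Year", value = "Sales", -Region, convert = TRUE)
print(class(converted_long$Year))

# factor_key = TRUE stores the key as a factor in the original column order
factor_long <- gather(sales, key = "Year", value = "Sales", -Region, factor_key = TRUE)
print(levels(factor_long$Year))
```

With convert = TRUE the Year column can be used directly in numeric comparisons, while factor_key = TRUE is handy when the original column order should drive plotting or sorting.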
Return Value
The function returns a data frame reshaped from wide to long format.
R Gather Single Key-value Pair
You can use the gather() function from the tidyr package to transform a data frame from wide format to long format. This function collapses multiple columns and their values into two new columns, rearranging the data into a long format.
# Reshape the data to a long format using gather
library(tidyr)
# Original data frame
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, 88, 85),
  Science = c(85, 92, 90)
)
print("Original Data:")
print(df)
# Reshape the data frame from wide to long
long_df <- gather(df, key = "Subject", value = "Score", History, Math, Science)
print("Transformed Data:")
print(long_df)
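The transformed data frame has nine rows (three students by three subjects) and three columns: Student, Subject, and Score. Rows are stacked one source column at a time, so all History scores come first, followed by Math and then Science.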
R Gather Multiple Key-value Pairs
You can also use the gather() function together with separate() to reshape wide columns whose names encode more than one variable. First, the columns are gathered into two new columns, one containing the column names and another containing the corresponding values, while the grouping column is kept intact.
Then the separate() function splits the column of names into two separate columns based on a specified delimiter.
# Gather Multiple key-value pairs
library(tidyr)
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History_Practical = c(75, 85, 80),
  History_Written = c(85, 90, 87),
  Math_Practical = c(88, 70, 90),
  Math_Written = c(92, 91, 93)
)
print("Original Data:")
print(df)
# Convert to long format
long_df <- gather(df, key = "Subject_Type", value = "Score", -Student)
# Split Subject and Type
long_df <- separate(long_df, Subject_Type, into = c("Subject", "Type"), sep = "_")
print("Transformed Data:")
print(long_df)
The result has 12 rows and four columns: Student, Subject, Type, and Score.
Using R pivot_longer() to Gather Long Format
The pivot_longer() function offers a more modern approach to handling multiple key-value pairs than the gather() function.
When column names encode several variables, gather() has no direct support for splitting them, which is why it has to be combined with other functions such as separate(). With pivot_longer(), the same reshaping is done in a single call: names_to lists the new columns and names_sep gives the delimiter used to split the original column names.
# Reshape the data based on Multiple key-value pairs
library(tidyr)
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History_Practical = c(75, 85, 80),
  History_Written = c(85, 90, 87),
  Math_Practical = c(88, 70, 90),
  Math_Written = c(92, 91, 93)
)
print("Original Data:")
print(df)
# Using pivot_longer
long_df <- pivot_longer(df, cols = -Student,
                        names_to = c("Subject", "Type"),
                        names_sep = "_",
                        values_to = "Score")
print("Transformed Data:")
print(long_df)
Yields the output below.
# Output:
[1] "Transformed Data:"
# A tibble: 12 × 4
   Student Subject Type      Score
   <chr>   <chr>   <chr>     <dbl>
 1 Geetha  History Practical    75
 2 Geetha  History Written      85
 3 Geetha  Math    Practical    88
 4 Geetha  Math    Written      92
 5 Ram     History Practical    85
 6 Ram     History Written      90
 7 Ram     Math    Practical    70
 8 Ram     Math    Written      91
 9 Sai     History Practical    80
10 Sai     History Written      87
11 Sai     Math    Practical    90
12 Sai     Math    Written      93
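To check that the two approaches really produce the same data, the results can be compared after putting the rows into a common order. This is a sketch only; sort_rows() is a hypothetical helper written for this comparison, not a tidyr function.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History_Practical = c(75, 85, 80),
  History_Written = c(85, 90, 87),
  Math_Practical = c(88, 70, 90),
  Math_Written = c(92, 91, 93)
)

# Two-step approach: gather, then split the key column
two_step <- gather(df, key = "Subject_Type", value = "Score", -Student)
two_step <- separate(two_step, Subject_Type, into = c("Subject", "Type"), sep = "_")

# One-step approach with pivot_longer
one_step <- pivot_longer(df, cols = -Student,
                         names_to = c("Subject", "Type"),
                         names_sep = "_",
                         values_to = "Score")

# Hypothetical helper: plain data frame, rows in a fixed order, row names reset
sort_rows <- function(x) {
  x <- as.data.frame(x)
  x <- x[order(x$Student, x$Subject, x$Type), ]
  rownames(x) <- NULL
  x
}

same <- isTRUE(all.equal(sort_rows(two_step), sort_rows(one_step)))
print(same)
```

The sorting step is needed because gather() stacks one source column after another, while pivot_longer() works through the input row by row, so the raw row order differs even when the content matches.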
Frequently Asked Questions of R gather() Function
What does the gather() function do?
The gather() function from the tidyr package converts data from a wide format to a long format. It takes multiple columns in the wide format and collapses them into key-value pairs, creating a new "key" column for the column names and a "value" column for the corresponding values.
What is the difference between gather() and pivot_longer()?
The gather() function has been superseded by the more flexible and modern pivot_longer() function in newer versions of tidyr. Both functions transform data from wide to long format, but pivot_longer() has more explicit argument names and can split column names into several columns in one call. It is recommended to use pivot_longer() for new projects.
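As a rough guide to migrating, key corresponds to names_to and value corresponds to values_to. The sketch below translates the single key-value example from earlier in the article; the variable names old_style and new_style are illustrative only.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, 88, 85),
  Science = c(85, 92, 90)
)

# gather(): key and value name the two new columns
old_style <- gather(df, key = "Subject", value = "Score", -Student)

# pivot_longer(): names_to and values_to play the same roles
new_style <- pivot_longer(df, cols = -Student,
                          names_to = "Subject", values_to = "Score")

# Same content, different row order:
# gather() stacks column by column, pivot_longer() goes row by row
print(head(as.character(old_style$Student), 3))
print(head(as.character(new_style$Student), 3))
```

Note that pivot_longer() returns a tibble here, whereas gather() returns a plain data frame when given one.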
How do I specify which columns to gather?
You can specify which columns to gather by listing the column names explicitly, by giving a range of columns with the : operator, or by using the - operator to exclude certain columns.
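The selection styles can be compared directly. A short sketch using the article's student scores data:

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, 88, 85),
  Science = c(85, 92, 90)
)

# 1. List the columns to gather by name
by_name <- gather(df, key = "Subject", value = "Score", History, Math, Science)

# 2. Give a range of adjacent columns
by_range <- gather(df, key = "Subject", value = "Score", History:Science)

# 3. Exclude the columns that should stay as they are
by_exclusion <- gather(df, key = "Subject", value = "Score", -Student)

# All three selections describe the same set of columns
print(isTRUE(all.equal(by_name, by_range)))
print(isTRUE(all.equal(by_name, by_exclusion)))
```

Exclusion tends to be the most robust choice when the set of measurement columns may grow, since only the identifier columns need to be named.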
How does gather() handle missing values?
By default, missing values (NA) are retained during the gathering process. To drop rows with missing values, either set na.rm = TRUE in the gather() call, or remove them afterwards with the na.omit() function or by filtering out NAs.
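A small sketch of both options follows; the df_na data frame with its missing scores is made up for illustration.

```r
library(tidyr)

# Hypothetical scores with two missing values
df_na <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, NA, 78),
  Math = c(75, 88, NA)
)

# Default: NA values are kept, giving 3 students x 2 subjects = 6 rows
kept <- gather(df_na, key = "Subject", value = "Score", -Student)
print(nrow(kept))

# na.rm = TRUE drops the rows whose Score is NA while gathering
dropped <- gather(df_na, key = "Subject", value = "Score", -Student, na.rm = TRUE)
print(nrow(dropped))

# Alternative: gather first, then remove incomplete rows
dropped_after <- na.omit(kept)
print(nrow(dropped_after))
```

The two routes keep the same four rows here. na.rm = TRUE only looks at the value column, while na.omit() removes a row if any column is missing, so they can differ on data with NAs in identifier columns.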
How can I split the gathered key column into multiple columns?
Once you have gathered the data using gather(), you can use the separate() function to split a combined column into two or more columns.
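separate() is not limited to two parts. The sketch below splits a key with three underscore-delimited pieces; the long_df data and the Key column name are hypothetical.

```r
library(tidyr)

# Hypothetical long data whose key encodes subject, exam type, and year
long_df <- data.frame(
  Student = c("Geetha", "Ram"),
  Key = c("History_Written_2023", "Math_Practical_2024"),
  Score = c(85, 70)
)

# Split Key into three columns; convert = TRUE turns Year into an integer
split_df <- separate(long_df, Key,
                     into = c("Subject", "Type", "Year"),
                     sep = "_", convert = TRUE)
print(split_df)
```

The number of names in into should match the number of pieces in each key; otherwise separate() warns and either drops extra pieces or fills missing ones with NA.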
Conclusion
In this article, I have explained how the gather() function from the tidyr package reshapes data into long format, making it easier to analyze and visualize. Although it has been superseded by pivot_longer(), understanding how it works is valuable, especially when working with older codebases. By mastering functions like gather() and pivot_longer(), you can make full use of tidy data workflows in R.