In R, the gather() function from the tidyr package reshapes data from a wide format to a long format. Tidy data is organized so that each variable is a column, each observation is a row, and each type of observational unit forms a table. The tidyr package provides functions to create and manipulate tidy data, and gather() was long one of its key reshaping functions. Since tidyr 1.0.0, however, gather() has been superseded by the more flexible pivot_longer(): it still works, but it is no longer under active development.


In this article, I will explore how the gather() function works and how to apply it to transform data from wide to long format. Using an example dataset, we will look at how to use gather() to tidy your data and discuss the advantages of moving to pivot_longer() in newer tidyr versions.

Installing tidyr Package

To use the gather() function, first install and load the tidyr package.


# Install and load tidyr package
install.packages("tidyr")
library(tidyr)

R gather() Function

The gather() function from the tidyr package converts wide datasets into long datasets. It takes multiple columns and collapses them into key-value pairs, so the result has fewer columns and more rows.

Syntax of gather() Function

The syntax of the gather() function is as follows.


# Syntax of gather() function
gather(data, key = "key", value = "value", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)

Parameters

  • data: The input data frame.
  • key: The name of the new column that contains the gathered column names.
  • value: The name of the new column that will contain the gathered values.
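  • ...: Columns to gather. You can list several columns separated by commas, give a range such as a:d, or exclude columns with -. If nothing is specified, all columns are gathered.
  • na.rm: Boolean value indicating whether to remove rows whose value is NA. Default is FALSE.
  • convert: Boolean value indicating whether to run type.convert() on the key column. This is useful when the gathered column names are really numbers, integers, or logicals. Default is FALSE.
  • factor_key: Boolean value indicating whether to store the key column as a factor whose levels follow the original column order, instead of as a character vector. Default is FALSE.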
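
The effect of convert and factor_key is easiest to see on a small example. The sketch below uses a hypothetical sales data frame (not part of this article's dataset) whose column names are years, and only inspects the type of the resulting key column.

```r
library(tidyr)

# Hypothetical data: the wide column names are years.
# check.names = FALSE keeps "2022" and "2023" as they are.
sales <- data.frame(
  Region = c("North", "South"),
  `2022` = c(10, 20),
  `2023` = c(15, 25),
  check.names = FALSE
)

# Default: the key column is a character vector
default_long <- gather(sales, key = "Year", value = "Sales", -Region)
class(default_long$Year)

# convert = TRUE runs type.convert() on the key column
converted_long <- gather(sales, key = "Year", value = "Sales", -Region,
                         convert = TRUE)
class(converted_long$Year)

# factor_key = TRUE stores the key as a factor in original column order
factor_long <- gather(sales, key = "Year", value = "Sales", -Region,
                      factor_key = TRUE)
levels(factor_long$Year)
```

With convert = TRUE the Year column can be used directly in numeric comparisons, while factor_key = TRUE is handy when the column order should be preserved in plots.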

Return Value

The function returns a data frame reshaped from wide to long format.

R Gather Single Key-value Pair

You can use the gather() function from the tidyr package to transform a data frame from wide format to long format. This function collapses multiple columns and their values into new columns, rearranging the data into a long format.


# Load the tidyr package
library(tidyr)

# Original data frame (wide format)
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, 88, 85),
  Science = c(85, 92, 90)
)

print("Original Data:")
print(df)

# Reshape the data frame from wide to long
long_df <- gather(df, key = "Subject", value = "Score", History, Math, Science)
print("Transformed Data:")
print(long_df)

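The transformed data frame has nine rows and three columns (Student, Subject, and Score): one row for each combination of student and subject.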

R Gather Multiple Key-value Pairs

You can also use gather() together with separate() when each wide column name encodes more than one variable, such as History_Practical. First, gather() collapses those columns into two new columns, one holding the original column names and the other holding the corresponding values, while leaving the identifier column (Student) intact.

Then separate() splits the column of gathered names into two columns at a specified delimiter.


# Gather Multiple key-value pairs
library(tidyr)
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History_Practical = c(75, 85, 80),
  History_Written = c(85, 90, 87),
  Math_Practical = c(88, 70, 90),
  Math_Written = c(92, 91, 93)
)
print("Original Data:")
print(df)


# Convert to long format
long_df <- gather(df, key = "Subject_Type", value = "Score", -Student)

# Split Subject and Type
long_df <- separate(long_df, Subject_Type, into = c("Subject", "Type"), sep = "_")

print("Transformed Data:")
print(long_df)

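The transformed data frame has twelve rows and four columns (Student, Subject, Type, and Score): one row for each combination of student, subject, and exam type.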

Using pivot_longer() to Reshape to Long Format

The pivot_longer() function offers a more modern approach to handling multiple key-value pairs than gather().

When a column name encodes more than one variable, gather() cannot split it directly; you have to follow it with another function such as separate(), as in the previous section. pivot_longer() does both steps in a single call through its names_to and names_sep arguments.


# Reshape the data based on Multiple key-value pairs
library(tidyr)
df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History_Practical = c(75, 85, 80),
  History_Written = c(85, 90, 87),
  Math_Practical = c(88, 70, 90),
  Math_Written = c(92, 91, 93)
)
print("Original Data:")
print(df)

# Using pivot_longer
long_df <- pivot_longer(df, cols = -Student, 
                        names_to = c("Subject", "Type"), 
                        names_sep = "_", 
                        values_to = "Score")

print("Transformed Data:")
print(long_df)

This yields the output below.


# Output:
[1] "Transformed Data:"
# A tibble: 12 × 4
   Student Subject Type      Score
   <chr>   <chr>   <chr>     <dbl>
 1 Geetha  History Practical    75
 2 Geetha  History Written      85
 3 Geetha  Math    Practical    88
 4 Geetha  Math    Written      92
 5 Ram     History Practical    85
 6 Ram     History Written      90
 7 Ram     Math    Practical    70
 8 Ram     Math    Written      91
 9 Sai     History Practical    80
10 Sai     History Written      87
11 Sai     Math    Practical    90
12 Sai     Math    Written      93

Frequently Asked Questions on the R gather() Function

What does the gather() function do?

The gather() function from the tidyr package is used to convert data from a wide format to a long format. It takes multiple columns in the wide format and collapses them into key-value pairs, creating a new “key” column for the column names and a “value” column for the corresponding values.

What is the difference between gather() and pivot_longer()?

In newer versions of tidyr, gather() has been superseded by pivot_longer(). Both functions reshape data from wide to long format, but pivot_longer() has clearer argument names (names_to and values_to) and can split column names into several variables in one call, which gather() can only do with a follow-up separate(). gather() still works, but pivot_longer() is recommended for new code.
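
As a rough side-by-side, the sketch below runs both functions on the single key-value example from earlier in this article. The variable names old_style and new_style are only for illustration.

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, 88, 85),
  Science = c(85, 92, 90)
)

# gather(): key/value arguments, returns a data frame here
old_style <- gather(df, key = "Subject", value = "Score", -Student)

# pivot_longer(): names_to/values_to arguments, returns a tibble
new_style <- pivot_longer(df, cols = -Student,
                          names_to = "Subject", values_to = "Score")

head(old_style, 3)  # first three rows are all History
head(new_style, 3)  # first three rows are all Geetha
```

Both results hold the same nine student-subject pairs. The visible differences are the row order (gather() stacks one original column after another, pivot_longer() keeps each student's rows together) and the return type (pivot_longer() gives a tibble).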

How can I gather only specific columns?

You can specify which columns to gather by either listing the column names explicitly or using the - operator to exclude certain columns.
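
A minimal sketch of these selection styles, reusing the student scores data frame from this article (the result names are only for illustration):

```r
library(tidyr)

df <- data.frame(
  Student = c("Geetha", "Ram", "Sai"),
  History = c(89, 81, 78),
  Math = c(75, 88, 85),
  Science = c(85, 92, 90)
)

# List the columns to gather explicitly
by_name <- gather(df, key = "Subject", value = "Score", History, Math, Science)

# Exclude the identifier column with the - operator
by_exclusion <- gather(df, key = "Subject", value = "Score", -Student)

# Select a contiguous range of columns
by_range <- gather(df, key = "Subject", value = "Score", History:Science)

# Gather only two columns; History stays as a regular column
partial <- gather(df, key = "Subject", value = "Score", Math, Science)
names(partial)
```

The first three calls select the same columns and give the same long data frame. In the last call, any column that is not gathered (here Student and History) is simply repeated alongside the new Subject and Score columns.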

How do I handle missing values when using gather()?

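By default (na.rm = FALSE), missing values (NA) are retained during the gathering process. To drop rows whose gathered value is NA, pass na.rm = TRUE to gather(). Alternatively, filter the result afterwards, for example with na.omit(), keeping in mind that na.omit() removes rows with an NA in any column.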
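
The na.rm argument listed under Parameters can be compared with filtering after the fact. The sketch below uses a small hypothetical data frame with two missing scores.

```r
library(tidyr)

# Hypothetical data with missing scores
scores <- data.frame(
  Student = c("Geetha", "Ram"),
  History = c(89, NA),
  Math = c(NA, 88)
)

# Default: rows with NA values are kept
kept <- gather(scores, key = "Subject", value = "Score", -Student)
nrow(kept)

# na.rm = TRUE drops rows whose Score is NA while gathering
dropped <- gather(scores, key = "Subject", value = "Score", -Student,
                  na.rm = TRUE)
nrow(dropped)

# Filtering after gathering; na.omit() checks every column for NA
filtered <- na.omit(kept)
nrow(filtered)
```

Here both approaches leave two rows, because Score is the only column that contains NA.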

How do I split a gathered column into multiple columns?

Once you have gathered the data using gather(), you can use the separate() function to split a combined column into two or more columns.

Conclusion

In this article, I have explained how the gather() function from the tidyr package reshapes data into long format, making it easier to analyze and visualize. Although it has been superseded by pivot_longer(), understanding how it works is valuable, especially when working with older codebases. With both gather() and pivot_longer() in your toolkit, you can read existing tidy data workflows in R and write new ones.
