Explain R pivot_wider() Function with Examples

The pivot_wider() function of the tidyr package reshapes data frames from long to wide format by turning the values of one column into new column names. It is the modern, more versatile replacement for spread(), which tidyr now marks as superseded rather than deprecated: spread() remains available for backward compatibility but is no longer under active development, so pivot_wider() is recommended for new code.
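
To make the relationship between the two functions concrete, here is a minimal sketch that reshapes the same data with both. The scores data frame is a made-up example, not data from any real source. The two calls differ mainly in argument names: key and value for spread(), names_from and values_from for pivot_wider().

```r
library(tidyr)

# Hypothetical long-format data used only for this comparison
scores <- data.frame(
  Student = c("Geetha", "Geetha", "Ram", "Ram"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(85, 80, 90, 85)
)

# Older interface: key/value arguments
wide_old <- spread(scores, key = Subject, value = Score)

# Current interface: names_from/values_from arguments
wide_new <- pivot_wider(scores, names_from = Subject, values_from = Score)

print(wide_old)
print(wide_new)
```

Both results contain the columns Student, Math, and Science with the same values.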


In this article, I will explore the pivot_wider() function, its syntax, parameters, and practical use cases. Through detailed examples, I will demonstrate how to transform long-format data frames into wide-format structures for better usability and presentation.

Key Points

  • Reshape long-format data into a wide format with pivot_wider().
  • Take the new column names from the column given in names_from and the cell values from the column given in values_from.
  • Handle missing values with the values_fill parameter.
  • Resolve duplicate column names using names_repair.
  • Advanced control using options like names_sep, names_glue, and more.
  • Returns a tidy tibble in a wide format.
  • Replaces spread() with enhanced functionality and active development.

R pivot_wider() Function

The pivot_wider() function of the tidyr package transforms data from a long format to a wide format, increasing the number of columns and decreasing the number of rows. The values of one specified column become the new column names, and the values of another specified column fill those new columns.

Syntax of pivot_wider Function

The following is the general syntax of the pivot_wider() function.


# Syntax of the pivot_wider() function
pivot_wider(data,
            id_cols = NULL,
            names_from = name,
            values_from = value,
            names_prefix = "",
            names_sep = "_",
            names_repair = "check_unique",
            values_fill = NULL,
            values_fn = NULL)

Parameters

  • data: The input data frame or tibble.
  • id_cols: Columns to use as unique identifiers for the wide format. If NULL, defaults to all columns not specified in names_from or values_from.
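  • names_from: Column(s) whose values become the new column names. Defaults to a column called name.
  • values_from: Column(s) whose values fill the newly created columns. Defaults to a column called value.
  • names_prefix: A string added to the start of every new column name.
  • names_sep: The separator used to join the parts of a column name when names_from or values_from contains more than one column.
  • names_repair: What to do when the output has invalid or duplicate column names. The default, "check_unique", raises an error on duplicates.
  • values_fill: A value to use for cells that have no matching row in the long data. The default, NULL, leaves them as NA.
  • values_fn: A function applied to the values in each output cell, needed when the identifier and names_from columns do not uniquely identify a row.
  • ...: Additional arguments passed on to methods.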
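
The values_fn parameter appears in the syntax above but is easy to overlook. The following sketch shows one way it can be used, assuming a made-up attempts data frame in which one student has two Math scores. Passing mean collapses those duplicates into a single value per cell.

```r
library(tidyr)

# Hypothetical data: Geetha has two Math scores, so Student + Subject
# does not uniquely identify a row
attempts <- data.frame(
  Student = c("Geetha", "Geetha", "Geetha", "Ram", "Ram"),
  Subject = c("Math", "Math", "Science", "Math", "Science"),
  Score = c(85, 95, 80, 90, 85)
)

# values_fn collapses duplicates to one value per cell, here the mean
wide_mean <- pivot_wider(
  data = attempts,
  names_from = Subject,
  values_from = Score,
  values_fn = mean
)

print(wide_mean)
```

Without values_fn, pivot_wider() would warn about the duplicates and return list-columns instead of plain numeric columns.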

Return value

It returns a data frame or tibble in a wide format, where rows are rearranged into columns based on the specified parameters.

Pivot Wide Format with R pivot_wider()

To convert data from a long format to a wide format, you can use the pivot_wider() function. Specify the column that supplies the new column names with the names_from argument and the column that supplies the data values with the values_from argument.


# Transform data from long format to wide format
# Load tidyr package
library(tidyr)

# Create data frame
df <- data.frame(
  Student = c("Geetha", "Geetha", "Ram", "Ram"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(85, 80, 90, 85)
)

print("Original Data frame:")
print(df)

# Transform data from long to wide format
wide_df <- pivot_wider(
  data = df,
  names_from = Subject,
  values_from = Score
)

print("Wide-Format Data:")
print(wide_df)

This yields a tibble with one row per student and one column per subject: Geetha has Math 85 and Science 80, and Ram has Math 90 and Science 85.

pivot_wider with id_cols

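By default, pivot_wider() treats every column not named in names_from or values_from as an identifier for the wide data frame. You can override this behavior and specify which columns to use as identifiers with the id_cols argument. In the example below, id_cols lists Class and Student, which are also the columns the default would select, so it simply makes that choice explicit.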


# pivot_wider with id_cols
# Load tidyr package
library(tidyr)

df <- data.frame(
  Class = c("A", "A", "B", "B"),
  Student = c("Geetha", "Ram", "Sai", "Priya"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(85, 80, 90, 85)
)

# Specify ID columns
wide_df <- pivot_wider(
  data = df,
  id_cols = c(Class, Student),
  names_from = Subject,
  values_from = Score
)

print("Wide-Format Data with Specified ID Columns:")
print(wide_df)

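This yields a tibble with one row per Class and Student pair. Each student has a score in only one subject, so the other subject's column holds NA.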
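
The effect of id_cols is easier to see when it selects fewer columns than the default. As a sketch using the same data, passing only Class drops the Student column from the result and produces one row per class.

```r
library(tidyr)

df <- data.frame(
  Class = c("A", "A", "B", "B"),
  Student = c("Geetha", "Ram", "Sai", "Priya"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(85, 80, 90, 85)
)

# Only Class identifies a row; Student is dropped from the result
wide_by_class <- pivot_wider(
  data = df,
  id_cols = Class,
  names_from = Subject,
  values_from = Score
)

print(wide_by_class)
```

This works without duplicates here because each class has exactly one score per subject.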

Customize Column Names with names_prefix

You can customize the column names in the wide data frame using the names_prefix argument of the pivot_wider() function. It adds the specified prefix to each new column name. The example below reuses the df data frame created in the previous section.


# Customizing column names of wide data
wide_df <- pivot_wider(
  data = df,
  names_from = Subject,
  values_from = Score,
  names_prefix = "Score_"
)

print("Wide-Format Data With custom Column Names:")
print(wide_df)

# Output:
# [1] "Wide-Format Data With custom Column Names:"
# A tibble: 4 × 4
#   Class Student Score_Math Score_Science
#   <chr> <chr>        <dbl>         <dbl>
# 1 A     Geetha          85            NA
# 2 A     Ram             NA            80
# 3 B     Sai             90            NA
# 4 B     Priya           NA            85
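
When a prefix is not flexible enough, names_glue builds each new column name from a template. The following is a minimal sketch with a made-up scores data frame; the "{Subject}_Score" template is only an illustration and places the subject before a fixed suffix.

```r
library(tidyr)

# Hypothetical long-format data
scores <- data.frame(
  Student = c("Geetha", "Geetha", "Ram", "Ram"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(85, 80, 90, 85)
)

# names_glue builds each new name from a glue template;
# {Subject} is replaced by the value from the names_from column
wide_glue <- pivot_wider(
  data = scores,
  names_from = Subject,
  values_from = Score,
  names_glue = "{Subject}_Score"
)

print(names(wide_glue))
```

The new columns are named Math_Score and Science_Score.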

Handling Missing Values in R pivot_wider()

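Missing values (NA) appear in the wide data when a combination of identifier and names_from value has no row in the long data. To replace them with a specified value, use the values_fill parameter. The example below reuses the df data frame with the Class column.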


# Handling Missing Values in pivot_wider()
# Load tidyr package
library(tidyr)
# Filling missing values
wide_df <- pivot_wider(
  data = df,
  names_from = Subject,
  values_from = Score,
  values_fill = 0
)
print("Wide-Format Data with specified Missing Values:")
print(wide_df)

# Output:
# [1] "Wide-Format Data with specified Missing Values:"
# A tibble: 4 × 4
#   Class Student  Math Science
#   <chr> <chr>   <dbl>   <dbl>
# 1 A     Geetha     85       0
# 2 A     Ram         0      80
# 3 B     Sai        90       0
# 4 B     Priya       0      85

Frequently Asked Questions on R pivot_wider()

What is the purpose of the pivot_wider() function?

It reshapes data from long format to wide format, turning rows into columns for better readability or specific analytical purposes.

How can I specify multiple columns for names_from or values_from?

Both names_from and values_from accept more than one column, passed together with c(). The parts of each resulting column name are joined with names_sep, which is "_" by default.
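
As a sketch of the multiple-column case, the example below pivots two value columns at once. The results data frame and its Grade column are hypothetical and exist only for this illustration.

```r
library(tidyr)

# Hypothetical data with two value columns: Score and Grade
results <- data.frame(
  Student = c("Geetha", "Geetha", "Ram", "Ram"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(85, 80, 90, 85),
  Grade = c("A", "B", "A", "A")
)

# Each value column is combined with each Subject value,
# and the two parts of the name are joined with names_sep
wide_multi <- pivot_wider(
  data = results,
  names_from = Subject,
  values_from = c(Score, Grade),
  names_sep = "_"
)

print(names(wide_multi))
```

The result has the columns Student, Score_Math, Score_Science, Grade_Math, and Grade_Science.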

How does values_fill handle missing values?

values_fill replaces NA with a specified value for columns where data is missing.

What happens if there are duplicate column names?

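With the default names_repair = "check_unique", pivot_wider() raises an error when the output would contain duplicate column names. Setting names_repair = "unique" resolves the duplicates by adding numeric suffixes so that every name is distinct.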
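
The following sketch shows names_repair on a deliberately contrived data frame. The clash data is hypothetical: one Subject value is the string "Student", which collides with the existing Student identifier column.

```r
library(tidyr)

# Hypothetical data: the Subject value "Student" clashes with the
# existing Student identifier column
clash <- data.frame(
  Student = c("Geetha", "Geetha"),
  Subject = c("Student", "Math"),
  Score = c(1, 85)
)

# The default names_repair = "check_unique" would stop with an error here.
# "unique" renames the clashing columns so every name is distinct.
wide_unique <- pivot_wider(
  data = clash,
  names_from = Subject,
  values_from = Score,
  names_repair = "unique"
)

print(names(wide_unique))
```

In practice it is usually cleaner to rename the conflicting column or value before pivoting, and to treat names_repair as a fallback.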

Conclusion

In this article, I have explained the pivot_wider() function from the tidyr package, which is an essential tool for transforming long-format data into a wide format. Its flexibility and advanced options make it a powerful choice for data-wrangling tasks. By letting you choose identifier columns, customize column names, and fill missing values, pivot_wider() simplifies the process of preparing data for analysis and visualization.

Happy Learning!
