  • Post category:R Programming
  • Post last modified:March 27, 2024
R Split Column into Multiple Columns of DataFrame

In R, you can split a single column of a DataFrame into multiple columns using the separate() function from the tidyr package, the base R strsplit() function, and several other approaches. When working with data, it’s often necessary to split a single column into multiple columns for better organization and analysis. In this article, I will explain various methods in R for splitting a column in a DataFrame and demonstrate them on a sample dataset containing locations and populations.


Key Points

  • R’s tidyr package provides the separate() function, allowing users to split a single column into multiple columns based on a delimiter or a pattern.
  • The separate() function takes the original column, a destination for the new columns, and a separator argument to define how the splitting should occur.
  • When splitting a column with separate(), it’s important to consider the type of separator used (character or regular expression) and handle cases where the split may result in missing values.
  • The separate() function is particularly useful for cleaning and transforming data when a single column contains multiple pieces of information that need to be separated for analysis or visualization purposes.
  • You can also use the base R strsplit() function to split a DataFrame column.
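As a rough illustration of the point about missing values, the sketch below uses the extra and fill arguments of separate() on a small made-up data frame. The name demo and its values are assumptions for illustration and are not part of the sample dataset used in this article.

```r
library(tidyr)

# Hypothetical data: one value has an extra space, one has no state code
demo <- data.frame(Location = c("New York NY", "Chicago IL", "Houston"))

# extra = "merge" splits at most length(into) times, so leftover text
# stays in the last column; fill = "right" pads missing pieces with NA
res <- separate(demo, col = Location, into = c("Area", "State"),
                sep = " ", extra = "merge", fill = "right")
print(res)
```

Without these two arguments, separate() still returns a result but emits warnings about the extra or missing pieces, so setting them explicitly documents what you expect from the data.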

Let’s create an R DataFrame and explore examples and output.


# Create dataframe
df <- data.frame(
  Location = c("NewYork NY", "LosAngeles CA", "Chicago IL", "Houston TX"),
  Population = c(8175133, 3792621, 2695598, 2328066)
)

# Displaying the original dataframe
print("Original DataFrame:")
print(df)

2. Using strsplit() and do.call() Functions to Split Column in R

You can use the base R functions strsplit() and do.call() to split a data frame column into multiple columns. strsplit() splits each string in the column on a specified delimiter and returns a list in which each element is a character vector of substrings. do.call() then calls rbind() with those vectors as its arguments, and rbind() combines them row-wise into a character matrix with one row per original value.


# Split single column into two separate columns
# using strsplit() 
df[c("area", "state")] <- do.call(rbind, strsplit(df$Location, " "))

# Resulting DataFrame
print(df)

This code splits the Location column on the space delimiter and adds the two pieces to df as new columns, area and state, alongside the original Location and Population columns.
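To make the two steps explicit, here is a small sketch that runs strsplit() and do.call(rbind, ...) separately on a copy of the Location values. The names locations, parts, and mat are illustrative. The length check is my own addition: rbind() recycles shorter vectors when the pieces have unequal lengths, which can silently misalign the result.

```r
locations <- c("NewYork NY", "LosAngeles CA", "Chicago IL", "Houston TX")

# Step 1: strsplit() returns a list with one character vector per value
parts <- strsplit(locations, " ", fixed = TRUE)
str(parts[1:2])

# Guard (an addition for illustration): every value must yield two pieces
stopifnot(all(lengths(parts) == 2))

# Step 2: rbind() stacks the vectors row-wise into a character matrix
mat <- do.call(rbind, parts)
dim(mat)    # 4 rows, 2 columns
mat[1, ]    # "NewYork" "NY"
```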

3. Using separate() from tidyr Package

The separate() function of the tidyr package in R is designed to split a single column into multiple columns based on a specified separator. Before using separate(), install the tidyr package with install.packages("tidyr") and load it with library(tidyr). This package is commonly used for data tidying, which involves reshaping and restructuring data.

Understanding the structure of the data in the original column is crucial before applying the separate() function in R to ensure appropriate splitting.


# Split single column into two separate columns using separate()
library(tidyr)
df <- separate(df, col = Location, into = c("Area", "State"), sep = " ")

# Resulting DataFrame
print(df)

# Output:
#        Area State Population
#1    NewYork    NY    8175133
#2 LosAngeles    CA    3792621
#3    Chicago    IL    2695598
#4    Houston    TX    2328066

Here,

  1. The separate() function separates a single column into multiple columns based on a specified separator.
  2. col = Location specifies the column (Location) of the data frame (df) that you want to separate.
  3. into = c("Area", "State") specifies the names of the new columns that will be created after the separation. In this case, the original "Location" column is split into two new columns named "Area" and "State".
  4. sep = " " specifies the separator used to split the original column. In this case, the separator is a single space (" ").
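If you want to keep the original column next to the new ones, separate() also accepts a remove argument. The following is a minimal sketch on a fresh copy of the sample data, where df2 is an illustrative name.

```r
library(tidyr)

df2 <- data.frame(
  Location = c("NewYork NY", "LosAngeles CA", "Chicago IL", "Houston TX"),
  Population = c(8175133, 3792621, 2695598, 2328066)
)

# remove = FALSE keeps Location in the result alongside Area and State
df2 <- separate(df2, col = Location, into = c("Area", "State"),
                sep = " ", remove = FALSE)
names(df2)
```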

4. Using substr() to Split Column into Multiple Columns in R

Alternatively, you can use the base R substr() function together with regexpr() to split a single column of a data frame into two separate columns. For example,


# Split single column into two separate columns using substr()
df$Area <- substr(df$Location, 1, regexpr(" ", df$Location) - 1)
df$State <- substr(df$Location, regexpr(" ", df$Location) + 1, nchar(df$Location))

# Resulting DataFrame
print(df)

# Output:
#        Location Population       Area State
# 1    NewYork NY    8175133    NewYork    NY
# 2 LosAngeles CA    3792621 LosAngeles    CA
# 3    Chicago IL    2695598    Chicago    IL
# 4    Houston TX    2328066    Houston    TX

Here,

  • The first line creates a new column named "Area" by extracting a substring from each "Location" value, starting at the first character (1) and ending at the position just before the first space, which regexpr() locates.
  • The second line creates a new column named "State" by extracting a substring from each "Location" value, starting at the position just after the first space and running to the end of the string, as given by nchar().
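The positions that regexpr() returns drive both substr() calls. The sketch below prints them for a few values and shows one possible way to handle a value with no space, where regexpr() returns -1. That fallback is my assumption and is not part of the example above.

```r
x <- c("NewYork NY", "LosAngeles CA", "Houston")

# Position of the first space in each string; -1 means no space was found
pos <- as.integer(regexpr(" ", x, fixed = TRUE))
pos

# Only split where a space exists; otherwise keep the whole value as Area
has_space <- pos > 0
area  <- ifelse(has_space, substr(x, 1, pos - 1), x)
state <- ifelse(has_space, substr(x, pos + 1, nchar(x)), NA_character_)

data.frame(Location = x, Area = area, State = state)
```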

5. Using str_split_fixed()

The str_split_fixed() function from the stringr package splits each element of a data frame column on a specified delimiter and returns a fixed number of components as a character matrix.

For example, apply this function to the Location column of the data frame df. It splits each element of the Location column on the space (" ") character. The third argument, 2, tells it to return exactly two components per element, so the split happens at the first space only.


# Split column into two separate columns using str_split_fixed()
library(stringr)
df[c("Area", "State")] <- str_split_fixed(df$Location, ' ', 2)

# Resulting DataFrame
print(df)

Yields the same output as above.
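The fixed number of components matters when the values are not uniform. As a small sketch on made-up values (not the sample dataset), the following shows what str_split_fixed() returns when a value has more or fewer spaces than expected.

```r
library(stringr)

# Hypothetical values: one with two spaces, one with no space
x <- c("New York NY", "Chicago IL", "Houston")

# n = 2 splits at the first space only and always returns two columns;
# values with fewer pieces are padded with empty strings
m <- str_split_fixed(x, " ", 2)
m
```

Because the result always has two columns, assigning it to df[c("Area", "State")] does not fail on irregular values, but the pieces may not be the ones you intended, so it is worth inspecting them.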

6. Using strsplit() to Split the Column in R

Finally, you can use the strsplit() function to split each element in the column of the dataframe based on the specified delimiter. The unlist() function converts the list obtained from strsplit() into a single vector, combining all the split components into a single sequence. Using the matrix() function, the vector obtained from the split operation is reshaped into a matrix with two columns. The resulting matrix is then assigned to two new columns, "Area" and "State", in the original dataframe df.


# Split column into two separate columns using strsplit()
df[c("Area", "State")] <- matrix(unlist(strsplit(df$Location, " ")), ncol = 2, byrow = TRUE)

# Resulting DataFrame
print(df)

Yields the same output as above.
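The byrow = TRUE argument is what keeps each area paired with its state. The sketch below, using illustrative names flat, by_row, and by_col, shows the flattened vector and compares both fill orders.

```r
locations <- c("NewYork NY", "LosAngeles CA", "Chicago IL", "Houston TX")

# unlist() flattens the list into one vector: area, state, area, state, ...
flat <- unlist(strsplit(locations, " ", fixed = TRUE))
flat

# byrow = TRUE fills the matrix row by row, so each row is one location
by_row <- matrix(flat, ncol = 2, byrow = TRUE)

# The default (byrow = FALSE) fills column by column and mixes the pieces
by_col <- matrix(flat, ncol = 2)

by_row[1, ]   # "NewYork" "NY"
by_col[1, ]   # "NewYork" "Chicago"
```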

7. Conclusion

In this article, I have explained multiple approaches to splitting a single column of an R DataFrame into multiple columns: base R functions such as strsplit() combined with do.call() or matrix(), the tidyr package’s separate() function, and string manipulation functions such as base R’s substr() and stringr’s str_split_fixed(). Each of these methods offers flexibility based on specific requirements. Understanding these techniques empowers data analysts to effectively manage and manipulate data for more insightful analyses in R.

Happy Learning!!

Vijetha

Vijetha is an experienced technical writer with a strong command of various programming languages. She has had the opportunity to work extensively with a diverse range of technologies, including Python, Pandas, NumPy, and R. Throughout her career, Vijetha has consistently exhibited a remarkable ability to comprehend intricate technical details and adeptly translate them into accessible and understandable materials.