  • Post category:R Programming
  • Post last modified:August 29, 2024
Explain R strsplit() Function with Examples

How do you split character vectors in R? You can use the strsplit() function to split each element of a character vector into substrings at a specified delimiter. In this article, I will explain the syntax and parameters of strsplit() and show several ways to split a character vector into a list of substrings.

strsplit() Function

The strsplit() function in R splits each element of a character vector (or a single string) into substrings at a specified delimiter. The delimiter is passed through the split parameter and can be a literal string or a regular expression. The result is a list of substrings.

Syntax of the strsplit() Function

Following is the syntax of the strsplit() function.


# Syntax of the strsplit() function
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
 

Parameters of strsplit()

Following are the parameters of the strsplit() function.

  • x : Character vector (a single string is a character vector of length one) whose elements are to be split.
  • split : The delimiter pattern used to split the string. By default it is interpreted as a regular expression; it is matched literally if fixed = TRUE and as a Perl-compatible regular expression if perl = TRUE. It can also be a vector of such character strings.
  • fixed : Logical. If TRUE, split is matched exactly as a literal string; otherwise it is treated as a regular expression.
  • perl : Logical. If TRUE, split is interpreted as a Perl-compatible regular expression.
  • useBytes : Logical. If TRUE, the matching is done byte-by-byte rather than character-by-character.
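
The split parameter can be a vector as well as a single string. As a minimal sketch (the input values are made up for illustration), when split has more than one element R recycles it along x, so each string can be split at its own delimiter:

```r
# Hypothetical input: two strings that use different delimiters
x <- c("java-python", "r_pyspark")

# split is recycled along x: "-" applies to x[1], "_" applies to x[2]
parts <- strsplit(x, c("-", "_"), fixed = TRUE)
print(parts)
```

Here fixed = TRUE is used only to make it explicit that both delimiters are literal strings.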

Return Value

It returns a list of the same length as x. Each element of the list is a character vector containing the substrings obtained by splitting the corresponding input string at the specified delimiter.
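
The shape of the return value is easiest to see with an input of more than one string. The following sketch uses a made-up vector named langs and shows a few common ways to inspect or flatten the result:

```r
# Hypothetical input: a character vector with two elements
langs <- c("java python", "r pyspark scala")

res <- strsplit(langs, " ")

length(res)    # one list element per input string
lengths(res)   # number of substrings in each list element
res[[2]]       # character vector for the second input string
unlist(res)    # flatten everything into one character vector
```

Because the result is always a list, even a single input string needs [[1]] or unlist() to get at the plain character vector.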

Use strsplit() to Split Character Vector in R

Let’s create a character vector and apply the strsplit() function to split its elements into substrings at each space. The function returns a list of substrings.


# Create character vector
char_vec = "java python r pyspark"
# Split the character vector using strsplit
split_vector <- strsplit(char_vec, " ")
print("After splitting the vector:")
print(split_vector)

# Output:
# [1] "After splitting the vector:"
# [[1]]
# [1] "java"    "python"  "r"       "pyspark"
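
One thing to watch for with a single-space delimiter is repeated spaces in the input. The sketch below (with a made-up input string) shows that two adjacent delimiters produce an empty string between them, and that a regular expression matching a run of spaces avoids this:

```r
# Hypothetical input with two spaces between the first two words
char_vec <- "java  python r"

# A single-space delimiter yields an empty string between the two spaces
single_space <- strsplit(char_vec, " ")[[1]]

# The regular expression " +" treats a run of spaces as one delimiter
space_run <- strsplit(char_vec, " +")[[1]]

print(single_space)
print(space_run)
```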

Splitting Vector with regular expression

Similarly, you can split the elements of a character vector by a regular expression. Let’s pass the regular expression "[0-9]+" along with the character vector to split it at any sequence of digits from 0 to 9.


# Split the character vector by regular expression
# Create character vector
char_vec = "Spark2By4Examples1." 
split_vector <- strsplit(char_vec, "[0-9]+")
print("After splitting the vector:")
print(split_vector)

The string is split at any sequence of digits [0-9]+.

# Output:
# [1] "After splitting the vector:"
# [[1]]
# [1] "Spark"    "By"       "Examples" "."
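
The position of the delimiter matters. As a small sketch with a made-up input, a match at the very start of the string produces a leading empty string, while a match at the very end does not produce a trailing one:

```r
# Hypothetical input that starts and ends with digits
char_vec <- "1Spark2By4Examples9"

pieces <- strsplit(char_vec, "[0-9]+")[[1]]
print(pieces)

# Drop empty strings if they are not wanted
pieces_clean <- pieces[nzchar(pieces)]
print(pieces_clean)
```

Filtering with nzchar() is one simple way to remove the empty strings; whether they should be removed depends on what the data represents.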

Splitting Vector with R strsplit() and fixed Param

Alternatively, you can split the vector literally by using the fixed parameter of the strsplit() function. Set fixed = TRUE and pass it to the function along with the string and the delimiter. The character vector is then split at exact occurrences of the delimiter, and the delimiter is not interpreted as a regular expression.


# Split the character vector by fixed = TRUE
# Create character vector
char_vec = "java/python/r/pyspark"
split_vector <- strsplit(char_vec, "/", fixed = TRUE)
print("After splitting the vector:")
print(split_vector)

# Output:
# [1] "After splitting the vector:"
# [[1]]
# [1] "java"    "python"  "r"       "pyspark"

Here, / is treated as a literal character, not as a regular expression, so the string is split at each occurrence of /. We get the same substrings as in the first example.
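
The fixed parameter makes a visible difference when the delimiter is a regular expression metacharacter. The following sketch uses a made-up dot-separated string to compare the default behavior, fixed = TRUE, and an escaped pattern:

```r
char_vec <- "java.python.r"   # hypothetical input

# As a regular expression, "." matches every character,
# so only empty strings are left
regex_split <- strsplit(char_vec, ".")[[1]]

# fixed = TRUE treats "." as a literal dot
fixed_split <- strsplit(char_vec, ".", fixed = TRUE)[[1]]

# Escaping the dot gives the same result with a regular expression
escaped_split <- strsplit(char_vec, "\\.")[[1]]

print(regex_split)
print(fixed_split)
print(escaped_split)
```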

Splitting Vector by Multiple Delimiters

You can also split the character vector by multiple delimiters. For that, pass a regular expression that lists the delimiters separated by the alternation operator |, and leave fixed at its default of FALSE so that the pattern is interpreted as a regular expression. The string is split wherever any one of the delimiters occurs.


# Split the character vector by multiple delimiters
# Create character vector
char_vec = "java/python,r%pyspark"
split_vector <- strsplit(char_vec, "/|,|%")
print("After splitting the vector:")
print(split_vector)

# Output:
# [1] "After splitting the vector:"
# [[1]]
# [1] "java"    "python"  "r"       "pyspark"

Here, we’re using the regular expression /|,|%, which means splitting at any one of /, a comma, or %. So, we get the same substrings as in the first example.
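
When every delimiter is a single character, a character class is a more compact way to write the same pattern. This sketch, reusing the string from the example above, checks that both forms give the same result:

```r
char_vec <- "java/python,r%pyspark"

# Alternation: split at "/" or "," or "%"
by_alternation <- strsplit(char_vec, "/|,|%")[[1]]

# Character class: split at any single character listed in the brackets
by_char_class <- strsplit(char_vec, "[/,%]")[[1]]

print(by_char_class)
print(identical(by_alternation, by_char_class))
```

Alternation is still needed when a delimiter is longer than one character, since a character class only ever matches a single character.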

Splitting with R strsplit() and perl Param

You can also split the vector by using the perl parameter of the strsplit() function. Set perl = TRUE and pass it to the function along with the string and the delimiter pattern. The pattern is then interpreted as a Perl-compatible regular expression, and the character vector is split into a list of substrings wherever it matches.


# Split the character vector using strsplit by setting perl = TRUE
# Create character vector
char_vec = "java/python%r,pyspark"
# Split the character vector using strsplit
split_vector <- strsplit(char_vec, "[/%?,]", perl = TRUE)
print("After splitting the vector:")
print(split_vector)


# Output:
# [1] "After splitting the vector:"
# [[1]]
# [1] "java"    "python"  "r"       "pyspark"

Using the Perl-style character class [/%?,], we split the string at any occurrence of /, %, ?, or a comma. The input contains no ?, so it is split at /, %, and the comma.
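
The character class above would also work without perl = TRUE. Where perl = TRUE becomes necessary is with Perl-only constructs such as lookahead assertions, which, as far as I know, R's default regular expression engine does not accept. As a minimal sketch with a made-up input, the pattern below splits at a comma only when it is not followed by a space:

```r
# Hypothetical input: names in "Last, First" form, separated by bare commas
char_vec <- "Smith, John,Doe, Jane"

# ",(?! )" matches a comma only when it is NOT followed by a space.
# The negative lookahead (?! ) is a Perl-style construct, hence perl = TRUE.
names_split <- strsplit(char_vec, ",(?! )", perl = TRUE)[[1]]
print(names_split)
```

Only the comma itself is consumed by the match, so the commas inside each name are kept in the output.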

Conclusion

In this article, I have explained the syntax and parameters of the strsplit() function and shown how to use it to split the elements of a character vector into a list of substrings at a specified delimiter, using literal delimiters, regular expressions, multiple delimiters, and the fixed and perl parameters.

Happy Learning!!
