The substr() function in R is used to extract or replace substrings within a character vector. It is particularly helpful for working with text data, as it enables easy manipulation of strings by selecting, modifying, or replacing specific portions based on defined positions. This versatile function can be applied to both individual strings and text columns in data frames. In this article, I will explain the substr() function, its syntax, parameters, and usage, along with examples of how it can be used to manipulate strings in R.
Key Points
- The substr() function extracts a portion of a string given the starting and ending character positions.
- It can also replace part of a string by assigning a new substring to the specified character positions.
- The function is vectorized: it operates on single strings and on character vectors, processing every element in one call.
- substr() can be applied to text columns in data frames, making it useful for manipulating text data within these structures.
- It is commonly used in text cleaning, data transformation, and string processing tasks.
- The function can be combined with other string functions like nchar() to calculate positions for substring extraction or replacement dynamically.
- When used on the left-hand side of an assignment, substr() modifies the string or column in place without needing additional steps.
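As a small illustration of combining nchar() with substr(), the sketch below takes the last few characters of each string by computing the positions from the string length. The helper name last_n and the sample vector are made up for this example and are not part of base R.

```r
# Hypothetical helper: return the last n characters of each element of x
last_n <- function(x, n) {
  len <- nchar(x)              # length of every string
  substr(x, len - n + 1, len)  # start positions below 1 are treated as 1
}

words <- c("SparkbyExamples", "substring", "R")
result <- last_n(words, 3)
print(result)
# [1] "les" "ing" "R"
```

For the one-character string the computed start is below 1, so substr() simply returns the whole string.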
substr() function
The substr() function in R is designed to extract or modify parts of a string based on the position of characters within that string. It can also be applied to manipulate text in data frames, which are common data structures in R. Below are step-by-step examples of how to use substr() with different types of R objects.
Syntax
The syntax of the substr() function is as follows.
# Syntax of substr()
substr(x, start, stop)
# Replacement form
substr(x, start, stop) <- value
Parameters of the substr() Function
- x: A character vector.
- start: The starting position of the substring (the first character is position 1).
- stop: The ending position of the substring (inclusive).
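To make the role of start and stop concrete, here is a short sketch of how the positions behave, including two edge cases. The variable names are chosen only for this illustration.

```r
x <- "SparkbyExamples"

# Positions are 1-based and both ends are inclusive
mid_part <- substr(x, 6, 7)
print(mid_part)
# [1] "by"

# A stop beyond the end of the string is cut off at the last character
tail_part <- substr(x, 11, 100)
print(tail_part)
# [1] "mples"

# When start is greater than stop, the result is an empty string
empty_part <- substr(x, 8, 3)
print(empty_part)
# [1] ""
```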
Return Value
- When extracting: substr() returns the substring from the specified positions in the original string.
- When modifying: If used on the left-hand side of an assignment, it modifies the string by replacing the specified substring.
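One detail worth seeing in code is that replacement never changes the length of the string. The sketch below, with illustrative variable names, shows what happens when the new text is shorter or longer than the span being replaced.

```r
# Replacement shorter than the span: only as many characters as
# the replacement contains are overwritten
s1 <- "SparkbyExamples"
substr(s1, 1, 5) <- "Hi"
print(s1)
# [1] "HiarkbyExamples"

# Replacement longer than the span: the extra characters are dropped
s2 <- "SparkbyExamples"
substr(s2, 1, 5) <- "ApacheSpark"
print(s2)
# [1] "ApachbyExamples"

# The string length is the same in both cases
print(c(nchar(s1), nchar(s2)))
# [1] 15 15
```

If the goal is to swap in text of a different length, a pattern-based function such as sub() is usually a better fit than substr().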
Extract a Substring using R substr() Function
To extract a substring from a string, you can specify the portion of characters you want. By creating a string and passing it into the substr() function along with the starting and ending positions, you can retrieve the desired substring from the original string.
# Extracting substring from a string
# Create a string
str <- "SparkbyExamples"
print("Given string:")
print(str)
result <- substr(str, 1, 5)
print("After extracting a substring:")
print(result)
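Yields the output below.
# Output:
# [1] "Given string:"
# [1] "SparkbyExamples"
# [1] "After extracting a substring:"
# [1] "Spark"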
Replace a Substring using substr() Function
The substr() function can also replace a substring within a given string. Assigning a new substring to the specified positions overwrites the characters at those positions in the original string.
# Replacing a substring with a new one
str <- "SparkbyExamples"
print("Given a string:")
print(str)
substr(str, 6, 15) <- "ByExamples"
print("After replacing a substring:")
print(str)
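Yields the output below.
# Output:
# [1] "Given a string:"
# [1] "SparkbyExamples"
# [1] "After replacing a substring:"
# [1] "SparkByExamples"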
Extract Substrings from Multiple Elements of a Character Vector in R
To extract substrings from multiple elements of a character vector, pass the vector to the substr() function along with the start and stop positions. The function then extracts the substring from each element of the vector.
# Extract substring from multiple elements
vec <- c("Data Science", "Machine Learning", "Deep Learning")
print("Given vector:")
print(vec)
result <- substr(vec, 1, 4)
print("After extracting substrings:")
print(result)
# Output:
# [1] "Given vector:"
# [1] "Data Science" "Machine Learning" "Deep Learning"
# [1] "After extracting substrings:"
# [1] "Data" "Mach" "Deep"
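The start and stop arguments can themselves be vectors, so each element can use its own positions. As a sketch, the example below locates the first space in each element with regexpr() and extracts everything after it. The variable names space_pos and second_word are illustrative.

```r
vec <- c("Data Science", "Machine Learning", "Deep Learning")

# Position of the first space in each element
space_pos <- as.integer(regexpr(" ", vec, fixed = TRUE))
print(space_pos)
# [1] 5 8 5

# Extract from the character after the space to the end of each string
second_word <- substr(vec, space_pos + 1, nchar(vec))
print(second_word)
# [1] "Science"  "Learning" "Learning"
```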
Extract Substring from a Data Frame
In this example, you extract a substring from a specific column of a data frame. Let’s create a data frame with a text column and then pull the first three characters of each value into a new column.
# Extract substring from a data frame
# Create a sample data frame
df <- data.frame(
ID = 1:4,
Names = c("Nick", "Jhon", "Williams", "Lucky"),
stringsAsFactors = FALSE
)
print("Given data frame:")
print(df)
df$Sub_string <- substr(df$Names, 1, 3)
print("Extract substrings from a specific column:")
print(df)
# Output:
# [1] "Given data frame:"
#   ID    Names
# 1  1     Nick
# 2  2     Jhon
# 3  3 Williams
# 4  4    Lucky
# [1] "Extract substrings from a specific column:"
#   ID    Names Sub_string
# 1  1     Nick        Nic
# 2  2     Jhon        Jho
# 3  3 Williams        Wil
# 4  4    Lucky        Luc
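The same approach is handy when a column holds fixed-width codes. The sketch below splits a made-up order code into its parts. The data frame orders, its Code column, and the code layout (four-digit year, two-letter country, three-digit serial) are assumptions for illustration only.

```r
# Hypothetical fixed-width codes: YYYY-CC-NNN
orders <- data.frame(
  Code = c("2023-US-001", "2024-IN-017", "2024-UK-105"),
  stringsAsFactors = FALSE
)

# Each part sits at known positions, so substr() can pull it out directly
orders$Year    <- as.integer(substr(orders$Code, 1, 4))
orders$Country <- substr(orders$Code, 6, 7)
orders$Serial  <- substr(orders$Code, 9, 11)
print(orders)
```

Note that substr() always returns character data, which is why the year is converted with as.integer() while the serial keeps its leading zeros as text.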
Modify a Substring in the Data Frame
You can also replace parts of the strings in a data frame column. The example below copies the Names column into Modi_Names and uses nchar() to compute the positions of the last three characters of each value, which are then replaced with "xyz".
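# Modifying substrings of a specific column
df$Modi_Names <- df$Names
# Replace the last three characters of each name
substr(df$Modi_Names, nchar(df$Names) - 2, nchar(df$Names)) <- "xyz"
# Show only the relevant columns (df may also hold Sub_string from the previous example)
print(df[, c("ID", "Names", "Modi_Names")])
# Output:
#   ID    Names Modi_Names
# 1  1     Nick       Nxyz
# 2  2     Jhon       Jxyz
# 3  3 Williams   Willixyz
# 4  4    Lucky      Luxyz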
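The replacement value can also be a vector, with one replacement per element. As a sketch, the example below masks every character after the first with asterisks, using strrep() to build a replacement of the right length for each name. The variable names are illustrative.

```r
names_vec <- c("Nick", "Jhon", "Williams", "Lucky")
masked <- names_vec

# Build one run of asterisks per name, as long as the part being masked
stars <- strrep("*", nchar(masked) - 1)

# Overwrite everything from the second character to the end
substr(masked, 2, nchar(masked)) <- stars
print(masked)
# [1] "N***"     "J***"     "W*******" "L****"
```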
Conclusion
In this article, I explained the substr() function in R, a powerful tool for string manipulation that allows you to extract or replace specific portions of text. I also demonstrated how to use this function to manipulate characters within strings, vectors, and data frames.