  • Post category:R Programming
  • Post last modified:June 17, 2024
  • Reading time:10 mins read

The substring() function is a base R function that extracts or replaces substrings in a character vector. You define the range by specifying the first and last positions. The function is flexible and suits a variety of text manipulation tasks, and it handles edge cases such as first and last positions that exceed the string length.

In this article, I will explain the R substring() function, covering its syntax, parameters, and usage, and show how to manipulate a character vector by extracting a specified portion of its characters.

Key points:

  • The substring() function extracts substrings when you specify both the start and end positions.
  • If the end position is not provided, it defaults to a large number (1000000L), so extraction runs to the end of the string.
  • You can pass the string length, nchar(string), as the end position to extract the entire string.
  • If the end position exceeds the string length, the function returns the characters up to the end of the string without raising an error.
  • When the start position is greater than the end position, it returns an empty string.
  • Dynamic substring extraction computes the positions from variables, such as a start position and a length.
  • The function is vectorized: given a character vector, it extracts a substring from each element.

The substring() Function

The substring() function in R is a versatile base function used to extract specific portions of strings from a character vector. Using its first and last parameters, you can extract substrings in multiple ways. The function handles individual strings as well as vectors of strings.

Syntax of substring()

Following is the syntax of the substring() function.


# Syntax of substring()
substring(text, first, last = 1000000L)

Parameters of the substring() Function

  • text: A character vector from which substrings are to be extracted.
  • first: The starting position of the substring.
  • last (optional): The ending position of the substring. Defaults to a large number (1000000) if not specified.

Return value

It returns a character vector containing the extracted substrings based on the specified start and end positions.
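The introduction notes that substring() can also replace part of a string, which the extraction examples do not show. Here is a minimal sketch of the replacement form, substring(x, first, last) <- value, using base R only. The variable names x and y are chosen purely for illustration.

```r
# Replacement form: assign a new value into a range of positions
x <- "SparkByExamples"
substring(x, 1, 5) <- "SPARK"
x
# [1] "SPARKByExamples"

# A replacement shorter than the range overwrites only as many
# characters as it contains; the rest of the range is left as-is
y <- "SparkByExamples"
substring(y, 1, 5) <- "Hi"
y
# [1] "HiarkByExamples"
```

Note that the replacement form never changes the length of the string; it overwrites characters in place.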

Usage of R substring() Function with Start and End Positions

To extract a specific portion of a character vector, you can use the substring() function by specifying the first and last parameters. Create the character vector or string and pass it into the substring() function along with the first and last parameters to obtain the desired substring.


# Extract characters at specified range
string <- "SparkByExamples"
print("Given string")
print(string)
print("Extract characters from a given string:")
substring(string, 1, 5)

# Output:
# [1] "Given string"
# [1] "SparkByExamples"
# [1] "Extract characters from a given string:"
# [1] "Spark"

Extracting Substrings from Start Position to End

You can use the substring() function to extract characters from a given string starting at a specified position and continuing to the end of the string. In this example, only the first parameter is specified for extraction. Since the last parameter is not specified, it defaults to a very large number, effectively extracting from the starting position to the end of the string.


# Extract characters at start position to end
string <- "SparkByExamples"
print("Given string")
print(string)
print("Extract characters at specified starting position:")
substring(string, first = 6)

# Output:
# [1] "Given string"
# [1] "SparkByExamples"
# [1] "Extract characters at specified starting position:"
# [1] "ByExamples"

Extract Whole String using substring() Function

Alternatively, you can use the substring() function to extract the entire string. To do this, set the first parameter to 1, indicating the extraction starts from the first character, and set the last parameter to nchar(string), which calculates the string’s length. This approach will effectively return the entire string by extracting characters from the first to the last position.


# Extract entire string
string <- "SparkByExamples"
print("Given string")
print(string)
print("Extract whole string:")
substring(string, 1, nchar(string))

# Output:
# [1] "Given string"
# [1] "SparkByExamples"
# [1] "Extract whole string:"
# [1] "SparkByExamples"

Extracting Last N Characters from String

To extract the last N characters from a string, pass only the first parameter to substring() so that extraction continues to the end of the string. The starting position for the last N characters is nchar(string) - N + 1. In this example, the string has 15 characters and N is 8, so extraction starts at position 8.


# Extract last n characters
n <- 8
print("Extracting Last N characters:")
substring(string, nchar(string) - n + 1)

# Output:
# [1] "Extracting Last N characters:"
# [1] "Examples"
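The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: last_n is a hypothetical helper name, not part of base R, and it assumes you want the whole string back when n is larger than the string length.

```r
# Hypothetical helper: return the last n characters of each string in x
last_n <- function(x, n) {
  # pmax() keeps the start position at 1 or above when n exceeds the string length
  start <- pmax(1, nchar(x) - n + 1)
  substring(x, start)
}

last_n("SparkByExamples", 8)
# [1] "Examples"

# Works element-wise on a character vector as well
last_n(c("Spark", "By", "Examples"), 2)
# [1] "rk" "By" "es"
```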

Extract Characters When the Last Position is Out of Range

The substring() function also handles an ending position that exceeds the length of the string. In that case, it returns all characters from the starting position to the end of the string without raising an error.


# Extract characters when the end position is out of range
print("Get substring when ending position is out of range:")
substring(string, 1, 100)

# Output:
# [1] "Get substring when ending position is out of range:"
# [1] "SparkByExamples"

Extract Substring with First Position Greater than Last

The substring() function in R returns an empty string ("") when the start position is greater than the end position, because such a range contains no characters to extract.


# Extract Substring with First Position Greater than Last
print("Get substring when starting position greater than end:")
substring(string, 5, 2)

# Output:
# [1] "Get substring when starting position greater than end:"
# [1] ""
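Two related edge cases are worth checking in the same way. The short sketch below, using base R only, shows what happens when the start position lies beyond the end of the string and when the input is a missing value.

```r
string <- "SparkByExamples"

# Start position beyond the end of the string: nothing to extract
substring(string, 20)
# [1] ""

# A missing value stays missing instead of becoming an empty string
substring(NA_character_, 1, 3)
# [1] NA
```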

Dynamic Substring Extraction Based on Length

Dynamic substring extraction based on length involves programmatically determining the starting position and the number of characters to extract from a given string. Instead of specifying fixed positions, the extraction is based on variables that can change during runtime. In this example, the ending position is calculated as start + length - 1, potentially exceeding the length of the given string.


# Dynamic Substring Extraction Based on Length
string <- "SparkByExamples"
print("Dynamic Substring Extraction Based on Length:")
start <- 8
length <- 15
substring(string, start, start + length - 1)

# Output:
# [1] "Dynamic Substring Extraction Based on Length:"
# [1] "Examples"

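From the above code, substring(string, 8, 22) requests characters from the 8th to the 22nd position of the string SparkByExamples. Because the string has only 15 characters, the function returns the characters from position 8 to the end, which is Examples.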
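If you use the start-plus-length pattern often, it may help to wrap it in a function. The following is a minimal sketch; take_chars and its argument names are hypothetical and chosen only for this example.

```r
# Hypothetical helper: extract `len` characters beginning at position `start`
take_chars <- function(x, start, len) {
  # The end position is start + len - 1; substring() stops at the
  # end of the string if this value runs past it
  substring(x, start, start + len - 1)
}

string <- "SparkByExamples"

take_chars(string, 6, 2)
# [1] "By"

take_chars(string, 8, 15)  # end position 22 is past the string's 15 characters
# [1] "Examples"
```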

Extracting Substrings from a Vector of Strings

The substring() function is applied element-wise to each string in the vector. For each string, it extracts characters from the first position to the second position, effectively extracting the first two characters of each string.


# Extracting Substrings from a Vector of Strings
string_vec <- c("Spark", "By", "Examples")
print("Extracting substring from vector:")
substring(string_vec, 1, 2)
 
# Output:
# [1] "Extracting substring from vector:"
# [1] "Sp" "By" "Ex"
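Vectorization also applies to the first and last arguments, not only to the text. As far as I know from the base R documentation, substring() recycles its arguments to the length of the longest one, so a single string combined with vectors of positions yields several substrings. A short sketch, with the variable name word chosen for illustration:

```r
# One string, several first/last pairs: positions 1-3, 2-4 and 3-5
substring("SparkByExamples", 1:3, 3:5)
# [1] "Spa" "par" "ark"

# Split a string into its individual characters
word <- "Spark"
substring(word, 1:nchar(word), 1:nchar(word))
# [1] "S" "p" "a" "r" "k"
```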

Conclusion

In this article, I have explained that the substring() function in R is an essential tool for string manipulation, offering flexibility for extracting substrings based on specified ranges. The function gracefully handles cases where the end position is out of range by returning the substring from the starting position to the end of the string.

Happy Learning!!
