  • Post category:R Programming
  • Post last modified:September 12, 2024
Explain sample() in R with Examples

The sample() function in R draws random samples from a given dataset or vector. It takes a sample of a specified size from the data, either with or without replacement, and returns it as a vector. In this article, I will explain how the sample() function works, including its syntax and parameters, and demonstrate how to use it to create random samples from a vector or dataset, both with and without replacement.


Related: You can generate random numbers from a normal distribution using the rnorm() function in R.

Key Points

  • The sample() function is used to randomly select a subset of elements from a vector or to generate random numbers.
  • The function supports both random sampling with replacement and without replacement, as well as weighted sampling.
  • By default (replace = FALSE), the sample() function draws each element of the input at most once, so no element is selected twice.
  • Setting the replace parameter to TRUE allows elements to be selected more than once in the sample.
  • The function returns a vector of random samples. If size is specified, the result has that many elements; otherwise it has as many elements as the population being sampled.
  • An error occurs if the sample size is greater than the vector length and replace = FALSE, with the message “cannot take a sample larger than the population when replace = FALSE.”
  • The prob parameter allows for weighted sampling, where elements have different probabilities of being selected.
  • Using set.seed() ensures that the random samples generated by the sample() function are consistent across different runs of the code.

R sample() Function

The sample() function in R is used to randomly select a subset of elements from a vector or to generate random numbers. This versatile function supports random sampling with and without replacement, as well as weighted sampling. By default, it samples without replacement, so each element of the input is drawn at most once. However, if the replace parameter is set to TRUE, the same element can be selected more than once.

Syntax of sample()

Following is the syntax of the sample() function.


# Syntax of sample()
sample(x, size, replace = FALSE, prob = NULL)

Parameters

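  • x: The vector from which samples are selected. If x is a single number n of at least 1, samples are drawn from 1:n.
  • size: The number of samples to be drawn. If omitted, it defaults to the size of the population, which shuffles the whole input.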
  • replace: Specifies whether sampling should be with replacement (TRUE) or without (FALSE).
  • prob: An optional vector that defines the probability of each element being selected.
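
The single-number behavior of x is easy to trip over, so here is a minimal sketch of it. It also shows one possible workaround that samples positions with sample.int() and then indexes into the vector; the names pool and safe are illustrative only and do not come from the article.

```r
# A single number n >= 1 is treated as the population 1:n
set.seed(1)
draw <- sample(10, 3)
print(all(draw %in% 1:10))   # TRUE: values come from 1:10

# Hypothetical workaround for a pool that may contain only one element:
# sample positions with sample.int(), then index into the pool
pool <- c(10)
safe <- pool[sample.int(length(pool), 1)]
print(safe)                  # 10, the only element of pool
```

Indexing by position keeps the behavior the same whether the pool holds one element or many.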

Return Value

It returns a vector of random samples. If a size is specified, it returns the specified number of random samples.

Get Samples from a Range of Numbers in R

To generate random samples from a sequence, use the sample() function. Create a sequence of numbers, pass it to the function, and it will shuffle the numbers and return them in a random order.


# Get samples from the range
# Create a range
data <- 1:10
print("Given range:")
print(data)
samp <- sample(data)
print("Get random samples from a range:")
print(samp)

The sample() function shuffles the numbers 1 to 10 and returns them in a random order.
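
Because the output changes from run to run, it can help to check properties of the result instead of exact values. The sketch below, which reuses the names from the example above, confirms that a shuffle is a permutation: every element appears exactly once and only the order differs.

```r
data <- 1:10
samp <- sample(data)

# A shuffle keeps every element exactly once; only the order changes
print(length(samp) == length(data))   # TRUE
print(identical(sort(samp), data))    # TRUE
```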


Get R Samples by Specified Size

To draw a fixed number of elements, pass the desired sample size as the second argument. You can create a sequence of numbers and pass it to sample() along with the size, and the function returns that many elements chosen at random, without repeats by default.


# Get specified samples from the range
# Create a vector
data <- 1:10
print("Given vector:")
print(data)
samp <- sample(data, 3)
print("Get random samples by specified size:")
print(samp)

It selects 3 random elements from a sequence of numbers 1 to 10.
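
As a rough check of what sampling without replacement guarantees, the following sketch verifies that the three drawn values are distinct and come from the original sequence. The name rest is an illustrative addition for the elements that were not drawn.

```r
data <- 1:10
samp <- sample(data, 3)

print(anyDuplicated(samp) == 0)   # TRUE: no element is drawn twice
print(all(samp %in% data))        # TRUE: every value comes from data

# Elements that were not selected
rest <- setdiff(data, samp)
print(length(rest))               # 7
```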


Get Samples from Vector With Replacement

Alternatively, you can select a specified number of random elements from a vector with replacement (meaning that an element can be selected more than once in the sample). To do this, set the size parameter to the desired number and the replace parameter to TRUE in the sample() function. Because each draw is made from the full vector, the sample size can exceed the length of the vector.


# Get samples from vector with replacement
# Create vector
vec <- letters[1:4]
print("Given vector:")
print(vec)
samp <- sample(vec, 5, replace = TRUE)
print("Get specified samples from vector with replacement:")
print(samp)

This selects 5 elements from a vector of letters ‘a’ to ‘d’, allowing repetitions.
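
To see the repetitions explicitly, one option is to count how often each letter was drawn. The sketch below uses table() on a factor so that letters with zero draws are still listed; this counting step is an illustrative addition, not part of the original example.

```r
vec <- letters[1:4]
samp <- sample(vec, 5, replace = TRUE)

# Count draws per letter, including letters that were never drawn
counts <- table(factor(samp, levels = vec))
print(counts)

# Drawing 5 times from 4 letters must repeat at least one letter
print(anyDuplicated(samp) > 0)   # TRUE
```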


Size Greater Than the Length of Vector Without Replacement

Running the above example without replacement raises an error, because the requested sample size (5) is larger than the length of the vector (4). Passing the vector and this size to the sample() function with replace = FALSE produces the error message: "cannot take a sample larger than the population when 'replace = FALSE'."


# Get samples from vector without replacement
# Create vector
vec <- letters[1:4]
print("Given vector:")
print(vec)
samp <- sample(vec, 5, replace = FALSE)
print("Get specified samples from vector without replacement:")
print(samp)

# Output:
# Error in sample.int(length(x), size, replace, prob) : 
#   cannot take a sample larger than the population when 'replace = FALSE'
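
If the sample size is not known in advance, the error can be handled or avoided. The sketch below shows two possible approaches, neither of which is prescribed by the article: catching the error with tryCatch(), and capping the requested size at the length of the vector. The names msg and n are illustrative.

```r
vec <- letters[1:4]

# Option 1: catch the error and keep its message
msg <- tryCatch(
  {
    sample(vec, 5, replace = FALSE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Option 2: never request more elements than the vector holds
n <- min(5, length(vec))
samp <- sample(vec, n)
print(length(samp))   # 4
```

Capping the size silently returns fewer elements than requested, so it only suits cases where a smaller sample is acceptable.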

R Sample Probabilities with Weights

Similarly, you can draw random samples from a vector using probability weights. To do that, pass a vector of weights to the prob parameter, one weight per element; elements with larger weights are selected more often.


# Sample probabilities with weights
vec <- letters[1:4]
samp <- sample(vec, 5, replace = TRUE, prob = c(0.7, 0.4, 0.1, 0.6))
print("Get specified samples from Vector with replacement:")
print(samp)

# Output:
# [1] "Get specified samples from Vector with replacement:"
# [1] "d" "b" "d" "c" "b"

The above example assigns a weight to each letter. The weights sum to 1.8 rather than 1, so sample() normalizes them: ‘a’ is drawn with probability 0.7/1.8 ≈ 39%, ‘b’ with 0.4/1.8 ≈ 22%, ‘c’ with 0.1/1.8 ≈ 6%, and ‘d’ with 0.6/1.8 ≈ 33%. Since replace = TRUE, the same letter can appear multiple times in the sample.
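
The weights passed to prob do not need to sum to one; sample() rescales them. As a rough illustration, the sketch below computes the normalized probabilities for the weights used above and compares them with the observed frequencies in a large sample. The sample size of 10000 and the seed are arbitrary choices made for this illustration.

```r
weights <- c(a = 0.7, b = 0.4, c = 0.1, d = 0.6)

# Normalized selection probabilities
probs <- weights / sum(weights)
print(round(probs, 3))   # a 0.389, b 0.222, c 0.056, d 0.333

# Observed frequencies in a large weighted sample should be close to probs
set.seed(42)
draws <- sample(names(weights), 10000, replace = TRUE, prob = weights)
freq <- prop.table(table(factor(draws, levels = names(weights))))
print(round(freq, 3))
```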

Get Constant Random Samples

To generate consistent random samples, you can use the set.seed() function along with the sample() function. By setting a specific seed with set.seed() and then using sample(), you will obtain the same random numbers each time the code is executed.


# Get reproducible samples using set.seed()
# Create vector
vec <- 1:10

# Set the seed
set.seed(5)

# Generate random samples
samp <- sample(vec, 4)
print("Get specified samples from vector with a fixed seed:")
print(samp)

# Output (first run):
# [1] "Get specified samples from vector with a fixed seed:"
# [1] 2 9 7 3

# Output (second run):
# [1] "Get specified samples from vector with a fixed seed:"
# [1] 2 9 7 3
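
Note that the seed fixes the state of the random number generator at the moment set.seed() is called, so it has to be set again before each draw that should repeat. A minimal sketch, with first, second, and third as illustrative names:

```r
vec <- 1:10

set.seed(5)
first <- sample(vec, 4)

set.seed(5)
second <- sample(vec, 4)

# Same seed, same call: identical results
print(identical(first, second))   # TRUE

# Without resetting the seed, the generator continues from its current
# state, so this draw is not tied to the earlier ones
third <- sample(vec, 4)
print(third)
```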

Sample Function in R DataFrame

You can also use the sample() function with data frames to randomly select rows. Create a data frame, then call sample() with the number of rows in the data frame and the number of rows you want to select; it returns random row indices. Pass these indices into the df[] notation, which returns a data frame with the randomly selected rows.


# Get random rows of data frame.
df <- data.frame(ID = 1:10, Value = rnorm(10))
print("Given data frame:")
print(df)
sample_df <- df[sample(nrow(df), 3), ]
print("Get random rows from data frame:")
print(sample_df)

# Output:
# [1] "Given data frame:"
#    ID       Value
# 1   1 -0.93930703
# 2   2  0.65365844
# 3   3 -1.46666334
# 4   4  1.36396660
# 5   5  0.32804221
# 6   6  0.03724202
# 7   7  1.90776910
# 8   8  0.33546662
# 9   9 -0.39697197
# 10 10  0.49556841

# [1] "Get random rows from data frame:"
#   ID      Value
# 10 10 0.49556841
# 6   6 0.03724202
# 5   5 0.32804221
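
The same row-index technique can be used to split a data frame into two non-overlapping parts. The sketch below assumes a 70/30 split; the proportion, the seed, and the names train_idx, train, and test are hypothetical choices for illustration.

```r
set.seed(123)
df <- data.frame(ID = 1:10, Value = rnorm(10))

# Draw 70% of the row indices without replacement
train_idx <- sample(nrow(df), size = round(0.7 * nrow(df)))

# Selected rows go to one part, the remaining rows to the other
train <- df[train_idx, ]
test  <- df[-train_idx, ]

print(nrow(train))                            # 7
print(nrow(test))                             # 3
print(length(intersect(train$ID, test$ID)))   # 0: the parts do not overlap
```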

Conclusion

In this article, I explained how the sample() function in R is a powerful tool for generating random samples from a dataset or vector. I also discussed how to use its various parameters to return random samples in different ways, including unique samples, sampling with replacement, and weighted sampling.

Happy Learning!!
