
The `sample()` function in R draws random samples from a vector or dataset. It takes a specified number of elements from the data, either with or without replacement (controlled by the `replace` parameter), and returns them as a random sample. In this article, I will explain how the `sample()` function works, including its syntax and parameters, and demonstrate how to use it to create random samples from a vector or data frame, both with and without replacement.

Related: You can generate random numbers from a normal distribution using the `rnorm()` function in R.

Key Points:

• The `sample()` function is used to randomly select a subset of elements from a vector or to generate random numbers.
• The function supports both random sampling with replacement and without replacement, as well as weighted sampling.
• By default, `replace = FALSE`, so the `sample()` function selects each element of the input at most once.
• Setting the `replace` parameter to `TRUE` allows elements to be selected more than once in the sample.
• The function returns a vector of random samples. If a size is specified, it returns that many elements; otherwise, it returns all elements in random order.
• An error occurs if the sample size is greater than the vector length and `replace = FALSE`, with the message “cannot take a sample larger than the population when replace = FALSE.”
• The `prob` parameter allows for weighted sampling, where elements have different probabilities of being selected.
• Using `set.seed()` ensures that the random samples generated by the `sample()` function are consistent across different runs of the code.

## R sample() Function

The `sample()` function in R is used to randomly select a subset of elements from a vector or to generate random numbers. This versatile function supports both random sampling with and without replacement, as well as weighted sampling. By default, it samples without replacement, so each element is selected at most once. However, if the `replace` parameter is set to `TRUE`, the same element can be selected more than once.

### Syntax of sample()

The following is the syntax of the `sample()` function.

``````
# Syntax of sample()
sample(x, size, replace = FALSE, prob = NULL)
``````

### Parameters

• `x`: The dataset or vector from which samples are selected. If `x` is a single number `n` (at least 1), sampling takes place from `1:n`.
• `size`: The number of samples to be drawn. If omitted, it defaults to the length of `x`.
• `replace`: Specifies whether sampling should be with replacement (`TRUE`) or without (`FALSE`).
• `prob`: An optional vector of probability weights, one for each element of `x`.

### Return Value

It returns a vector of random samples. If `size` is specified, the result contains that many elements; otherwise, it contains all elements of `x` in random order.
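As a quick check of the return value, the minimal sketch below compares the length of the result with and without `size`. The vector `x` and the variable names are illustrative assumptions, not part of the original examples.

```r
# Sketch: length of the result with and without `size`
x <- c(10, 20, 30, 40, 50)      # illustrative vector (an assumption)

all_shuffled <- sample(x)       # size omitted: all elements, random order
three_items  <- sample(x, 3)    # size given explicitly

print(length(all_shuffled))     # [1] 5
print(length(three_items))      # [1] 3

# Every returned value comes from x
print(all(three_items %in% x))  # [1] TRUE
```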

## Get Samples from a Range of Numbers in R

To generate random samples from a sequence, use the `sample()` function. Create a sequence of numbers, pass it to the function, and it will shuffle the numbers and return them in a random order.

``````
# Get samples from the range
# Create a range
data <- 1:10
print("Given range:")
print(data)
samp <- sample(data)
print("Get random samples from a range:")
print(samp)
``````

The sample() function shuffles the numbers 1 to 10 and returns them in a random order.


## Get R Samples by Specified Size

To draw a fixed number of random elements, pass the desired sample size as the second argument to the `sample()` function. Create a sequence of numbers and pass it to the function along with the size; the function returns that many elements, chosen at random and, by default, without replacement.

``````
# Get specified samples from the range
# Create a vector
data <- 1:10
print("Given vector:")
print(data)
samp <- sample(data, 3)
print("Get random samples by specified size:")
print(samp)
``````

It selects 3 random elements from a sequence of numbers 1 to 10.
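R's documentation for `sample()` also describes a shorthand: when `x` is a single number `n` (at least 1), sampling takes place from `1:n`. The sketch below, which uses an arbitrary seed and illustrative variable names, suggests that both forms give the same draw.

```r
# Sketch: a single number n is treated as the range 1:n
set.seed(42)                      # arbitrary seed so both draws are comparable
from_range <- sample(1:10, 3)

set.seed(42)                      # reset to the same seed
from_number <- sample(10, 3)

print(identical(from_range, from_number))   # [1] TRUE
```

One consequence worth keeping in mind: a numeric vector that happens to have length 1, such as `c(7)`, is sampled as `1:7` rather than as the single value 7.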


## Get Samples from Vector With Replacement

Alternatively, you can select a specified number of random elements from a vector with replacement (meaning that an element can be selected more than once in the sample). To do this, set the `size` parameter to the desired number and the `replace` parameter to `TRUE` in the `sample()` function. With replacement, the sample size can be larger than the length of the vector.

``````
# Get samples from vector with Replacement
# Create vector
vec <- letters[1:4]
print("Given vector:")
print(vec)
samp <- sample(vec, 5, replace = TRUE)
print("Get specified samples from Vector with replacement:")
print(samp)
``````

This selects 5 elements from a vector of letters ‘a’ to ‘d’, allowing repetitions.
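Because 5 elements are drawn from only 4 distinct letters, at least one letter has to repeat. The following minimal sketch makes the repetition visible with `table()` and `anyDuplicated()`; the variable names mirror the example above.

```r
# Sketch: drawing 5 items from 4 letters with replacement must repeat a letter
vec <- letters[1:4]
samp <- sample(vec, 5, replace = TRUE)

# table() counts how many times each letter was drawn
print(table(samp))

# anyDuplicated() returns the index of the first repeated element (0 if none)
print(anyDuplicated(samp) > 0)   # [1] TRUE: 5 draws from 4 distinct values
```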


## Size Greater Than the Length of Vector Without Replacement

If you run the above example without replacement, R raises an error because the requested sample size (5) is greater than the length of the vector (4). Passing the vector and this size to the `sample()` function with `replace = FALSE` produces the error message: `cannot take a sample larger than the population when 'replace = FALSE'`.

``````
# Get samples from vector without replacement
# Create vector
vec <- letters[1:4]
print("Given vector:")
print(vec)
samp <- sample(vec, 5, replace = FALSE)
print("Get specified samples from Vector without replacement:")
print(samp)

# Output:
# Error in sample.int(length(x), size, replace, prob) :
#   cannot take a sample larger than the population when 'replace = FALSE'
``````
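If the requested size may exceed the vector length, one option is to guard the call. The sketch below shows two possible approaches, capping the size with `min()` or catching the error with `tryCatch()`; the variable `requested` is a hypothetical name used only for illustration.

```r
# Sketch: two ways to avoid the error when size may exceed length(vec)
vec <- letters[1:4]
requested <- 5   # hypothetical requested sample size

# Option 1: cap the size at the population size
safe_size <- min(requested, length(vec))
samp <- sample(vec, safe_size, replace = FALSE)
print(samp)

# Option 2: catch the error and report it instead of stopping the script
result <- tryCatch(
  sample(vec, requested, replace = FALSE),
  error = function(e) {
    message("Sampling failed: ", conditionMessage(e))
    NULL   # return NULL so the caller can check for failure
  }
)
print(is.null(result))   # [1] TRUE
```

Which option fits depends on whether a smaller sample is acceptable or the failure should be surfaced to the caller.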

## R Sample Probabilities with Weights

Similarly, you can draw random samples from a vector using probability weights. To do that, pass a vector of weights, one for each element, to the `prob` parameter; elements with larger weights are more likely to be selected.

``````
# Sample probabilities with weights
vec <- letters[1:4]
samp <- sample(vec, 5, replace = TRUE, prob = c(0.7, 0.4, 0.1, 0.6))
print("Get specified samples from Vector with replacement:")
print(samp)

# Output:
# [1] "Get specified samples from Vector with replacement:"
# [1] "d" "b" "d" "c" "b"
``````

The above example specifies a probability weight for each letter. The weights do not need to sum to 1; `sample()` treats them as relative weights. Here they sum to 1.8, so `"a"` is selected with probability 0.7 / 1.8 (about 39%), `"b"` with 0.4 / 1.8 (about 22%), `"c"` with 0.1 / 1.8 (about 6%), and `"d"` with 0.6 / 1.8 (about 33%). Since `replace = TRUE`, the same letter can appear multiple times in the sample.
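R's documentation notes that the `prob` weights need not sum to one. As a rough check, the sketch below normalizes the weights from the example and compares them with the frequencies observed in a large sample. The seed and the sample size of 100,000 are arbitrary assumptions.

```r
# Sketch: weights are normalized, so only their ratios matter
vec <- letters[1:4]
weights <- c(0.7, 0.4, 0.1, 0.6)

# Probabilities implied by the weights
probs <- weights / sum(weights)
names(probs) <- vec
print(round(probs, 3))

# Empirical check with a large sample (seed and sample size are arbitrary)
set.seed(123)
big_samp <- sample(vec, 100000, replace = TRUE, prob = weights)
freq <- table(factor(big_samp, levels = vec)) / length(big_samp)
print(round(freq, 3))
```

The observed frequencies should land close to the normalized probabilities, with small differences due to random variation.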

## Get Constant Random Samples

To generate reproducible random samples, call the `set.seed()` function before the `sample()` function. Setting the same seed with `set.seed()` and then calling `sample()` produces the same random numbers each time the code is executed.

``````
# Get constant sample using set.seed()
# Create vector
vec <- 1:10

# Set the seed
set.seed(5)

# Generate random samples
samp <- sample(vec, 4)
print("Get reproducible samples from Vector:")
print(samp)

# Output (first run):
# [1] "Get reproducible samples from Vector:"
# [1] 2 9 7 3

# Output (second run):
# [1] "Get reproducible samples from Vector:"
# [1] 2 9 7 3
``````
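Reproducibility can also be checked within a single script by resetting the seed before each draw. This is a minimal sketch; the names `first_draw`, `second_draw`, and `third_draw` are illustrative.

```r
# Sketch: the same seed reproduces the same sample
vec <- 1:10

set.seed(5)
first_draw <- sample(vec, 4)

set.seed(5)                      # reset the seed before drawing again
second_draw <- sample(vec, 4)

print(identical(first_draw, second_draw))   # [1] TRUE

# Without resetting the seed, the next draw continues the random stream,
# so it will generally differ from the first two
third_draw <- sample(vec, 4)
print(third_draw)
```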

## Sample Function in R DataFrame

You can also use the `sample()` function with data frames to randomly select rows. Create a data frame, use the `sample()` function to pick the desired number of random row indices from `1:nrow(df)`, and pass those indices into the `df[]` notation, which will return a data frame with the randomly selected rows.

``````
# Get random rows of data frame.
df <- data.frame(ID = 1:10, Value = rnorm(10))
print("Given data frame:")
print(df)
sample_df <- df[sample(nrow(df), 3), ]
print("Get random rows from data frame:")
print(sample_df)

# Output:
# [1] "Given data frame:"
#    ID       Value
# 1   1 -0.93930703
# 2   2  0.65365844
# 3   3 -1.46666334
# 4   4  1.36396660
# 5   5  0.32804221
# 6   6  0.03724202
# 7   7  1.90776910
# 8   8  0.33546662
# 9   9 -0.39697197
# 10 10  0.49556841

# [1] "Get random rows from data frame:"
#    ID      Value
# 10 10 0.49556841
# 6   6 0.03724202
# 5   5 0.32804221
``````
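The same indexing pattern can select a fraction of the rows rather than a fixed count. The sketch below is one possible way to do it; the fraction `frac`, the seed, and the variable names are assumptions made for illustration.

```r
# Sketch: sample a fraction of rows instead of a fixed count
set.seed(1)   # arbitrary seed so the example is repeatable
df <- data.frame(ID = 1:10, Value = rnorm(10))

frac <- 0.3                               # hypothetical fraction of rows to keep
n_rows <- round(frac * nrow(df))          # 3 rows out of 10
row_idx <- sample(nrow(df), n_rows)       # random row indices, no repeats

sample_df <- df[row_idx, ]
print(sample_df)

# The remaining rows can be selected with negative indexing
rest_df <- df[-row_idx, ]
print(nrow(rest_df))   # [1] 7
```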

## Conclusion

In this article, I explained how the `sample()` function in R is a powerful tool for generating random samples from a dataset or vector. I also discussed how to use its various parameters to return random samples in different ways, including unique samples, sampling with replacement, and weighted sampling.

Happy Learning!!