  • Post category:Pandas
  • Post last modified:June 22, 2024

In Pandas, the sample() function is used to obtain a random sample of items from a Pandas Series. It allows you to specify the number of items you want to sample and provides options for setting a random seed for reproducibility.


In this article, I will explain the syntax, parameters, and usage of the sample() function, and show how to return a random sample of items from a Series. It provides a convenient way to select a subset of data for analysis or further processing, and its random_state parameter makes the results reproducible.

Key Points –

  • The sample() function is used to obtain a random sample of items from a Pandas Series. It provides a convenient way to select a subset of data for analysis or further processing.
  • It offers various parameters such as n (number of items), frac (fraction of items), and replace (whether to sample with replacement) to tailor the sampling process according to specific requirements.
  • The random_state parameter allows setting a seed for the random number generator, ensuring reproducibility of results. This is useful for obtaining the same sample across multiple code runs.
  • A Series has a single axis, so sample() always samples along the index. The axis parameter exists mainly for consistency with DataFrame.sample(), where it selects between rows and columns.

Syntax of Series sample() Function

Let’s look at the syntax of the Pandas Series sample() function.


# Syntax of Pandas Series sample()
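Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

Parameters of the Series sample() Function

Following are the parameters of the sample() function.

  • n – Number of items to return. Cannot be used together with frac; defaults to 1 when frac is also None.
  • frac – Fraction of items to return. Cannot be used together with n.
  • replace – Whether to sample with replacement, so that the same element can be picked more than once. Default is False.
  • weights – An optional array-like of weights, the same length as the Series. Weights that do not sum to 1 are normalized. Default is None, which gives every element an equal probability.
  • random_state – Seed for the random number generator (for example, an int), used for reproducibility.
  • axis – The axis to sample. A Series has only one axis (0 or 'index'), so this parameter is rarely needed.
  • ignore_index – If True, the result is relabeled 0, 1, ..., n-1 instead of keeping the original index labels. Default is False. Available in pandas 1.3 and later.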
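
The relationship between n and frac described above can be checked with a short sketch. The variable names here (single, both_rejected) are only illustrative, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# With neither n nor frac given, sample() returns a single element
single = series.sample(random_state=0)
print(len(single))  # 1

# n and frac are mutually exclusive: passing both raises a ValueError
try:
    series.sample(n=2, frac=0.5)
    both_rejected = False
except ValueError as err:
    both_rejected = True
    print("ValueError:", err)
```

In practice, pick whichever of the two matches how you think about the sample size: an absolute count (n) or a proportion of the data (frac).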

Return Value

It returns a new Pandas Series containing a random sample of elements from the original Series.
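
As a minimal sketch of what the returned Series looks like, the example below checks that sampled values keep their original index labels. It also shows the ignore_index option, which assumes pandas 1.3 or later; the variable names are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

sampled = series.sample(n=3, random_state=42)
# Each sampled value keeps the index label it had in the original Series
labels_match = all(series[label] == value for label, value in sampled.items())
print(labels_match)  # True

# ignore_index=True (pandas 1.3+) relabels the result 0, 1, ..., n-1
relabeled = series.sample(n=3, random_state=42, ignore_index=True)
print(list(relabeled.index))  # [0, 1, 2]
```

Keeping the original labels is useful when you need to trace sampled values back to their source rows; use ignore_index=True when you only care about the values.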

Basic Random Sampling Using Series.sample()

You can use the Series.sample() function for basic random sampling, which selects a specified number of random elements from a Pandas Series.

First, let’s create a Pandas Series from a Python list.


import pandas as pd

# Creating a Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Original Series:\n",series)

Yields the below output.

# Output:
# Original Series:
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64

Next, call sample() directly on series with n=3 to obtain a random sample of 3 elements. The random_state parameter sets the seed for random number generation, so the same elements are returned on every run.


# Sample 3 random elements from the Series
result = series.sample(n=3, random_state=42)
print("Random Sample:\n",result)

This returns a Series of 3 randomly chosen elements, each keeping the index label it had in the original Series.

Fractional Sampling Using Series.sample()

Alternatively, fractional sampling draws a random sample containing a specified fraction of the elements of a Pandas Series.


# Draw a random sample of 50% of the values from the Series
result = series.sample(frac=0.5, random_state=42)
print("Fractional Sample:\n",result)

# Output:
# Fractional Sample:
# 1    20
# 4    50
# dtype: int64

In the above example, sample() is called on the Series with frac=0.5 to select 50% of the values at random. Because 50% of 5 elements is 2.5, pandas rounds the sample size to a whole number, which is why 2 values are returned here. The random_state=42 parameter makes the sample reproducible.
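
Two other common uses of frac can be sketched as follows: frac=1 shuffles the whole Series, and a value above 1 oversamples it, which pandas allows only together with replace=True. The variable names are illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# frac=1 returns every element exactly once, in random order
shuffled = series.sample(frac=1, random_state=42)
print(sorted(shuffled) == sorted(series))  # True

# frac > 1 is only allowed together with replace=True
oversampled = series.sample(frac=2, replace=True, random_state=42)
print(len(oversampled))  # 10
```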

Sampling with Replacement

Sampling with replacement means that each element selected from the Series is returned to the pool of elements before the next selection. This means that the same element can be chosen multiple times.


# Draw a random sample of 3 values from the Series with replacement
result = series.sample(n=3, replace=True, random_state=42)
print("Sample with Replacement:\n",result)

# Output:
# Sample with Replacement:
# 3    40
# 4    50
# 2    30
# dtype: int64

In the above example, sample() is called on the series object with n=3 to select 3 random values, and replace=True indicates that sampling is done with replacement. No value happens to repeat in this particular draw, but repeats are possible whenever replace=True. The random_state=42 parameter makes the sample reproducible.
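
The effect of replacement is easiest to see when the requested sample is larger than the Series itself. The sketch below uses illustrative variable names, and the exact error message text may vary by version.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Asking for more items than the Series holds fails without replacement
try:
    series.sample(n=8, random_state=42)
    too_large_rejected = False
except ValueError as err:
    too_large_rejected = True
    print("ValueError:", err)

# With replace=True the same request succeeds, and some values must repeat
big = series.sample(n=8, replace=True, random_state=42)
print(len(big), big.nunique() < len(big))  # 8 True
```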

Weighted Sampling

Similarly, weighted sampling allows you to specify weights for each element in the Series, influencing the probability of selection. Elements with higher weights have a higher chance of being selected.


# Draw a random sample of 3 values from the Series with weights
weights = [0.1, 0.2, 0.3, 0.2, 0.2]
result = series.sample(n=3, weights=weights, random_state=42)
print("Weighted Sample:\n",result)

# Output:
# Weighted Sample:
# 2    30
# 4    50
# 3    40
# dtype: int64

In the above example, the weights list assigns one weight to each element of the Series, and a higher weight means a higher probability of selection. The sample() function is called on the series object with n=3 to select 3 random values, and the weights parameter is set to that list. The random_state=42 parameter makes the sample reproducible.
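
As a small sketch of how weights behave at the extremes, the example below uses weights that do not sum to 1 (pandas rescales them) and gives two elements a weight of 0. The weight values are made up for illustration.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Weights need not sum to 1; pandas rescales them before sampling
weights = [0, 0, 5, 3, 2]

result = series.sample(n=3, weights=weights, random_state=42)
# The zero-weight elements (10 and 20) can never be picked
print(sorted(result))  # [30, 40, 50]
```

Setting a weight to 0 is a simple way to exclude specific elements from a sample without filtering the Series first.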

FAQ on Pandas Series sample() Function

What does the sample() function do in Pandas Series?

The sample() function in Pandas Series is used to draw a random sample of elements from the Series. It allows for random sampling with or without replacement and also supports weighted sampling.

How can I ensure reproducibility of my random sample?

You can set the random_state parameter to a specific value. This ensures that the same random sample is obtained every time the code is executed with the same seed value.
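
A quick way to confirm this is to draw the same sample twice with the same seed and compare the results. The seed value 7 below is arbitrary.

```python
import pandas as pd

series = pd.Series(range(100))

# Two calls with the same seed return identical samples
first = series.sample(n=5, random_state=7)
second = series.sample(n=5, random_state=7)
print(first.equals(second))  # True
```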

Can I perform weighted sampling with the sample() function?

You can perform weighted sampling by specifying the weights parameter. This allows you to assign different probabilities to each element in the Series.

Does the sample() function modify the original Series?

The sample() function returns a new Series containing the randomly sampled elements. It does not modify the original Series.
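
This can be verified with a short sketch that compares the Series before and after sampling. The variable names are illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])
before = series.copy()

sampled = series.sample(n=2, random_state=1)

# The original Series is unchanged, and the result is a separate object
print(series.equals(before))  # True
print(sampled is series)  # False
```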

Conclusion

In this article, I have explained the syntax, parameters, and usage of the sample() function, and shown how it returns a new Series containing randomly sampled elements from the original Series.

Happy Learning!!
