In Pandas, the sample() function is used to obtain a random sample of items from a Pandas Series. It lets you specify how many items (or what fraction of items) you want to sample, and it provides an option for setting a random seed for reproducibility.
In this article, I will explain the sample() function, its syntax, parameters, and usage, and show how it returns a random sample of items from a Series. It is a convenient way to select a subset of data for analysis or further processing. Additionally, the random_state parameter can be used to make the results reproducible.
Key Points –
- The sample() function is used to obtain a random sample of items from a Pandas Series. It provides a convenient way to select a subset of data for analysis or further processing.
- It offers parameters such as n (number of items), frac (fraction of items), replace (whether to sample with replacement), and weights (selection probabilities) to tailor the sampling process to specific requirements.
- The random_state parameter sets a seed for the random number generator, so the same sample is obtained across multiple code runs.
- A Series has only one axis, so sample() always samples its elements (rows). The axis parameter exists for consistency with DataFrame.sample(), where axis=1 samples columns; for a Series, only axis=0 (or 'index') is valid.
Syntax of Series sample() Function
Following is the syntax of the Pandas Series sample() function.
# Syntax of Pandas Series sample()
# (ignore_index is available in pandas 1.3 and later)
Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Parameters of the Series sample() Function
Following are the parameters of the sample() function.
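- n – Number of items to return. Cannot be combined with frac. If neither n nor frac is given, one item is returned.
- frac – Fraction of items to return, for example 0.5 for half of the Series. Cannot be combined with n. Values greater than 1 require replace=True.
- replace – Whether to sample with replacement, i.e. whether the same item may be picked more than once. Default is False.
- weights – An optional array-like of weights, the same length as the axis being sampled. Weights that do not sum to 1 are normalized; missing values are treated as zero.
- random_state – An integer seed or NumPy random state object for the random number generator, used for reproducibility.
- axis – The axis to sample. For a Series, only 0 (or 'index') is valid, which is also the default.
- ignore_index – If True, the resulting index is relabeled 0 to n - 1. Default is False (pandas 1.3 and later).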
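To see how these parameters interact, here is a minimal sketch, assuming a recent pandas (1.x or 2.x) and the same five-element Series used later in this article: with neither n nor frac, one item is returned; passing both raises ValueError; and because a Series has a single axis, axis=1 is rejected.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# With neither n nor frac given, sample() returns a single item
one_item = series.sample(random_state=0)
print(len(one_item))  # 1

# n and frac are mutually exclusive, so passing both raises ValueError
try:
    series.sample(n=2, frac=0.5)
except ValueError as exc:
    print("ValueError:", exc)

# A Series has only one axis, so axis=1 is rejected with ValueError
try:
    series.sample(n=2, axis=1)
except ValueError as exc:
    print("ValueError:", exc)
```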
Return Value
It returns a new Pandas Series containing a random sample of elements from the original Series.
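A quick check of the return value, as a sketch assuming pandas 1.3 or later (needed for ignore_index): the original Series stays unchanged, the sampled items keep their original index labels, and ignore_index=True relabels the result from 0.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])
result = series.sample(n=3, random_state=42)

# The original Series is not modified by sample()
print(series.tolist())  # [10, 20, 30, 40, 50]

# The sampled items keep their original index labels
print(result.index.isin(series.index).all())  # True

# ignore_index=True (pandas 1.3+) relabels the sample as 0, 1, 2
renumbered = series.sample(n=3, random_state=42, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

Since the same seed is used, renumbered holds the same values as result; only the index labels differ.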
Basic Random Sampling Using Series.sample()
You can use the Series.sample() function for basic random sampling, which selects a specified number of random elements from a Pandas Series.
First, let’s create a Pandas Series from a Python list.
import pandas as pd
# Creating a Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Original Series:\n",series)
# Output:
# Original Series:
# 0 10
# 1 20
# 2 30
# 3 40
# 4 50
# dtype: int64
Here, sample() is called directly on the Pandas Series series to obtain a random sample of 3 elements. The random_state parameter sets the seed for random number generation, ensuring reproducibility of results.
# Sample 3 random elements from the Series
result = series.sample(n=3, random_state=42)
print("Random Sample:\n",result)
# Output:
# Random Sample:
# 1 20
# 4 50
# 2 30
# dtype: int64
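To make the reproducibility claim concrete, the sketch below compares two calls with the same seed. It also assumes you may want to pass a NumPy RandomState object instead of an integer: pandas then draws from that object directly, so consecutive calls generally differ, while recreating it with the same seed replays the same sequence.

```python
import numpy as np
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# The same integer seed produces the same sample on every call
first = series.sample(n=3, random_state=42)
second = series.sample(n=3, random_state=42)
print(first.equals(second))  # True

# Without random_state, NumPy's global generator is used,
# so repeated calls can return different samples
unseeded = series.sample(n=3)
print(len(unseeded))  # 3

# A RandomState object is consumed in place: consecutive calls generally
# differ, but recreating it with the same seed replays the whole sequence
rng = np.random.RandomState(42)
draw_1 = series.sample(n=3, random_state=rng)
draw_2 = series.sample(n=3, random_state=rng)

rng_again = np.random.RandomState(42)
replay_1 = series.sample(n=3, random_state=rng_again)
replay_2 = series.sample(n=3, random_state=rng_again)
print(draw_1.equals(replay_1) and draw_2.equals(replay_2))  # True
```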
Use Series.sample() Function for Fractional Sampling
Alternatively, fractional sampling involves drawing a random sample of a specified fraction of the elements from a Pandas Series.
# Draw a random sample of 50% of the values from the Series
result = series.sample(frac=0.5, random_state=42)
print("Fractional Sample:\n",result)
# Output:
# Fractional Sample:
# 1 20
# 4 50
# dtype: int64
In the above example, the sample() function is called on the Series with frac=0.5 to select 50% of the values at random. Because the Series has 5 elements, 0.5 * 5 = 2.5, which pandas rounds to 2 elements. The random_state=42 parameter ensures the reproducibility of the random sample, and finally the sampled values are printed.
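Two follow-ups on fractional sampling, sketched with the same five-element Series: the sample size is round(frac * len(series)), and frac=1 is a common idiom for shuffling a Series. The reset_index(drop=True) step is an optional extra, shown for when a fresh positional index is wanted.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# The sample size is round(frac * len(series)): 0.5 * 5 = 2.5, rounded to 2
half = series.sample(frac=0.5, random_state=1)
print(len(half))  # 2

# frac=1 returns every element exactly once in random order (a shuffle)
shuffled = series.sample(frac=1, random_state=7)
print(shuffled)

# Drop the old labels if a fresh 0..n-1 index is wanted after shuffling
shuffled_reset = shuffled.reset_index(drop=True)
print(shuffled_reset.index.tolist())  # [0, 1, 2, 3, 4]
```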
Sampling with Replacement
Sampling with replacement means that each element selected from the Series is returned to the pool of elements before the next selection. This means that the same element can be chosen multiple times.
# Draw a random sample of 3 values from the Series with replacement
result = series.sample(n=3, replace=True, random_state=42)
print("Sample with Replacement:\n",result)
# Output:
# Sample with Replacement:
# 3 40
# 4 50
# 2 30
# dtype: int64
In the above example, the sample() function is called on the series object with n=3 to select 3 random values. The replace=True parameter means sampling is done with replacement, so the same element may appear more than once in the result, and the random_state=42 parameter ensures the reproducibility of the random sample.
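Sampling with replacement is also what allows a sample to be larger than the Series itself. A minimal sketch with the same five-element Series: with replace=True, n can exceed the length and frac can exceed 1, while the same request without replacement raises ValueError.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# With replace=True, n may exceed the length of the Series
oversampled = series.sample(n=8, replace=True, random_state=42)
print(len(oversampled))  # 8
# 8 draws from only 5 labels must repeat at least one label
print(oversampled.index.is_unique)  # False

# frac > 1 (upsampling) also requires replace=True
doubled = series.sample(frac=2, replace=True, random_state=42)
print(len(doubled))  # 10

# Without replacement, asking for more items than exist raises ValueError
try:
    series.sample(n=8)
except ValueError as exc:
    print("ValueError:", exc)
```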
Weighted Sampling
Similarly, weighted sampling allows you to specify weights for each element in the Series, influencing the probability of selection. Elements with higher weights have a higher chance of being selected.
# Draw a random sample of 3 values from the Series with weights
weights = [0.1, 0.2, 0.3, 0.2, 0.2]
result = series.sample(n=3, weights=weights, random_state=42)
print("Weighted Sample:\n",result)
# Output:
# Weighted Sample:
# 2 30
# 4 50
# 3 40
# dtype: int64
In the above example, you define a weight for each element in the weights list; a higher weight means a higher probability of selection. The sample() function is then called on the series object with n=3 to select 3 random values, with the weights parameter set to the list defined earlier. The random_state=42 parameter ensures the reproducibility of the random sample.
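A sketch of two further properties of weights, using the same five-element Series: weights are divided by their sum, so they need not add up to 1, and items with weight 0 are never picked. The 10,000-draw run is just an illustrative way to show that observed frequencies approach the normalized weights.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Weights are normalized by their sum, so they need not add up to 1.
# Items with weight 0 are never selected.
zero_weighted = series.sample(n=3, weights=[0, 0, 1, 1, 1], random_state=0)
print(sorted(zero_weighted.tolist()))  # [30, 40, 50]

# With replacement and many draws, the observed frequencies
# approach the normalized weights: 0.1, 0.2, 0.3, 0.2, 0.2
many = series.sample(n=10_000, replace=True, weights=[1, 2, 3, 2, 2], random_state=0)
freq = many.value_counts(normalize=True).sort_index()
print(freq)
```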
FAQ on Pandas Series sample() Function
What is the sample() function used for in a Pandas Series?
The sample() function draws a random sample of elements from the Series. It supports random sampling with or without replacement, as well as weighted sampling.
How can I make the random sample reproducible?
Set the random_state parameter to a specific value. The same random sample is then obtained every time the code is executed with the same seed value.
How do I perform weighted sampling?
Specify the weights parameter, which assigns a different selection probability to each element in the Series.
Does sample() modify the original Series?
No. The sample() function returns a new Series containing the randomly sampled elements; the original Series is left unchanged.
Conclusion
In this article, I have explained the sample() function, its syntax, parameters, and usage, and shown how it returns a Series of randomly sampled elements from the original Series, covering basic, fractional, with-replacement, and weighted sampling.
Happy Learning!!