In Polars, the sample() method on a Series is used to randomly select a subset of elements from that Series. It works much like the sampling methods in other data libraries such as Pandas, but runs on Polars' Rust-based engine. With sample(), you can specify either an exact number of items or a fraction of the Series to select at random. You can also enable sampling with replacement, which allows the same element to be selected more than once.
In this article, I will explain the Polars Series sample() method, covering its syntax, parameters, and usage, and show how it returns a new Series containing randomly selected elements from the original one.
Key Points –
- The sample() function is used to randomly select elements from a Polars Series.
- You can sample by specifying either a fixed number (n) or a fraction (fraction) of elements, but not both at once.
- Sampling is done without replacement by default, meaning each element can be selected at most once.
- You can enable sampling with replacement using with_replacement=True.
- Use shuffle=True to return the sampled elements in a randomly shuffled order.
- The seed parameter allows you to set a fixed random seed for reproducibility.
- If neither n nor fraction is provided, it defaults to selecting one random element.
- The method returns a new Polars Series containing the sampled values.
Polars Series sample() Introduction
Let’s look at the syntax of the Series sample() method.
# Syntax of sample()
Series.sample(
n: int | None = None,
*,
fraction: float | None = None,
with_replacement: bool = False,
shuffle: bool = False,
seed: int | None = None,
) → Series
Parameters of the Series sample()
The following are the parameters of the Series sample() method.
- n – Number of elements to sample (optional). Cannot be combined with fraction; if neither is given, n defaults to 1.
- fraction – Fraction of elements to sample (optional). Cannot be combined with n.
- with_replacement – Whether to sample with replacement (default False).
- shuffle – Whether to shuffle the order of the sampled result (default False).
- seed – Random seed for reproducibility (optional). If it is None, a random seed is generated for each call.
Return Value
This function returns a new Polars Series containing the randomly sampled elements from the original Series.
Usage of Polars Series sample() Function
The sample() function in a Polars Series is used to randomly extract a subset of elements, either by specifying a fixed number or a fraction of the total. While similar to sampling features in other libraries like Pandas, this method is optimized specifically for Polars.
First, let’s create a Polars Series.
import polars as pl
ser = pl.Series("numbers", [10, 20, 30, 40, 50, 60])
print("Original Series:\n", ser)
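Yields below output.
# Output:
# Original Series:
# shape: (6,)
# Series: 'numbers' [i64]
# [
# 10
# 20
# 30
# 40
# 50
# 60
# ]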
To randomly select one element from a Polars Series using the default behavior of sample(), simply call the method without any arguments. When neither n nor fraction is provided, sample() returns a single random element.
# Sample 1 random element (default)
ser2 = ser.sample()
print("Sampled 1 random element:\n", ser2)
Here,
- Because neither n nor fraction is specified, sample() falls back to n=1 and picks one random element.
- By default, with_replacement=False, so the sample is drawn without replacement.
- The sampled element is returned as a new Series of length 1, not as a scalar value.
Sample 3 Random Elements (Without Replacement)
To randomly select 3 elements without replacement from your Polars Series, set n=3 in the sample() function. This will return 3 unique random elements from the Series.
# Sample 3 random elements without replacement (default behavior)
ser2 = ser.sample(n=3)
print("Sampled 3 random elements:\n", ser2)
# Output:
# Sampled 3 random elements:
# shape: (3,)
# Series: 'numbers' [i64]
# [
# 10
# 60
# 50
# ]
Here,
- n=3 means pick 3 elements.
- By default, with_replacement=False, so each element can be picked at most once.
- Because no seed is set, the selected elements change each time you run the code.
- It returns a new Series with 3 randomly selected unique elements.
Sample 60% of the Series using Fraction
To sample 60% of a Polars Series using a fraction, you can use the fraction parameter in the sample() method.
# Sample 60% of the Series
ser2 = ser.sample(fraction=0.6)
print("Sampled 60% of the Series:\n", ser2)
# Output:
# Sampled 60% of the Series:
# shape: (3,)
# Series: 'numbers' [i64]
# [
# 20
# 60
# 30
# ]
Here,
- fraction=0.6 tells Polars to sample 60% of the values.
- Since the Series has 6 elements, 6 * 0.6 = 3.6, and Polars drops the fractional part, so 3 elements are sampled.
- Sampling is without replacement by default, so there are no duplicates unless with_replacement=True.
Sample 2 Elements with a Fixed Seed for Reproducibility
To sample 2 elements with a fixed seed for reproducibility, use the seed parameter in the sample() function.
# Sample 2 elements with a fixed seed
ser2 = ser.sample(n=2, seed=42)
print("Sampled 2 elements with seed=42:\n", ser2)
# Output:
# Sampled 2 elements with seed=42:
# shape: (2,)
# Series: 'numbers' [i64]
# [
# 20
# 50
# ]
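Here,
- n=2 selects 2 elements.
- seed=42 ensures you get the same result every time you run the code (the exact values produced by a given seed may differ between Polars versions).
- The default is sampling without replacement, so there are no duplicates in the output.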
Sample 4 Elements and Shuffle the Result
To sample 4 elements and shuffle the result in a Polars Series, use the sample() method with the shuffle=True parameter. With shuffle=True, Polars explicitly randomizes the order of the sampled elements.
# Sample 4 elements and shuffle the result
ser2 = ser.sample(n=4, shuffle=True)
print("Sampled 4 elements and shuffled:\n", ser2)
# Output:
# Sampled 4 elements and shuffled:
# shape: (4,)
# Series: 'numbers' [i64]
# [
# 40
# 20
# 60
# 30
# ]
Here,
- n=4 picks 4 random elements.
- shuffle=True randomizes the order of those 4 elements. Without it, Polars does not guarantee that the order of the sample is random.
Sample 80% of the Series, with Replacement, and Shuffled
To sample 80% of the Series with replacement and shuffle the result, you can use the fraction, with_replacement=True, and shuffle=True parameters in the sample() method.
# Sample 80% with replacement and shuffle
ser2 = ser.sample(fraction=0.8, with_replacement=True, shuffle=True)
print("Sampled 80% with replacement and shuffled:\n", ser2)
# Output:
# Sampled 80% with replacement and shuffled:
# shape: (4,)
# Series: 'numbers' [i64]
# [
# 10
# 30
# 30
# 40
# ]
Here,
- fraction=0.8: Sample 80% of the Series length (6 * 0.8 = 4.8, truncated to 4 elements).
- with_replacement=True: Allows the same element to be picked multiple times, which is why 30 appears twice in the output above.
- shuffle=True: Randomly shuffles the sampled elements.
Conclusion
In conclusion, the sample() method in Polars Series is a powerful and flexible tool for randomly selecting data. Whether you need a fixed number of elements, a percentage of the Series, or want to sample with replacement and shuffle the output, sample() makes it easy and efficient.
Happy Learning!!