In Polars, the clone() method creates a copy of a DataFrame that behaves independently of the original: modifications made to the cloned DataFrame do not affect the original DataFrame, and vice versa. Note that, according to the Polars documentation, clone() is a cheap operation that does not copy the underlying data, so it is not a deep copy in the memory sense.
In this article, I will explain the Polars DataFrame clone() function, covering its syntax, parameters, and usage. Through detailed examples, I will demonstrate how to create a new DataFrame containing the same data as the original and how to work on it while leaving the original intact.
Key Points –
- The clone() method creates a copy of a Polars DataFrame, ensuring the original remains unchanged.
- df.clone() returns a new DataFrame with the same structure and data as the original.
- The cloned DataFrame is independent of the original, meaning modifications to one do not affect the other.
- It is useful when performing transformations (like filtering or adding columns) without modifying the original DataFrame.
- Unlike a plain assignment such as df2 = df, which only adds a second name for the same object, clone() ensures that changes in the cloned DataFrame are not reflected in the original.
- Since Polars is optimized for immutability, clone() helps maintain data integrity in complex operations.
- clone() is cheap: per the Polars documentation it does not copy the underlying data, which makes it fast compared with deep copies in other libraries.
- In multi-step operations, clone() ensures the original dataset remains unchanged, helping maintain reproducibility.
Polars DataFrame clone() Introduction
Let's look at the syntax of the clone() function. It takes no parameters.
# Syntax of polars clone()
DataFrame.clone() → DataFrame
Return Value
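This function returns a new DataFrame with the same schema and data as the original. The returned DataFrame is a separate object, so later changes to it do not show up in the original; per the Polars documentation, the call itself is cheap and does not copy the underlying data.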
Usage of Polars DataFrame clone() Method
The clone() method in Polars creates an independent copy of a DataFrame, ensuring that modifications to the cloned DataFrame do not affect the original one.
First, let’s create a Polars DataFrame.
import polars as pl
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '35days', '40days', '55days'],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
This yields a DataFrame with 5 rows and 4 columns: Courses (str), Fees (i64), Duration (str), and Discount (i64).
To clone it, call clone() on the DataFrame. The result has the same columns, data types, and values as df.
# Clone the DataFrame
df2 = df.clone()
print("Cloned DataFrame:\n", df2)
Here,
- clone() returns a new DataFrame with the same data as df.
- Any modifications to df2 will not affect df.
- Useful when performing transformations while keeping the original data intact.
Modifying the Cloned DataFrame
When you modify a cloned DataFrame, the original DataFrame remains unchanged because clone() returns a separate DataFrame object. The example below raises Fees by 10% on the clone.
# Clone the DataFrame
df_clone = df.clone()
# Modify the cloned DataFrame (increase Fees by 10%)
df2 = df_clone.with_columns((df_clone["Fees"] * 1.10).alias("Fees"))
print("Modified Cloned DataFrame:\n", df2)
# Output:
# Modified Cloned DataFrame:
# shape: (5, 4)
┌─────────┬─────────┬──────────┬──────────┐
│ Courses ┆ Fees    ┆ Duration ┆ Discount │
│ ---     ┆ ---     ┆ ---      ┆ ---      │
│ str     ┆ f64     ┆ str      ┆ i64      │
╞═════════╪═════════╪══════════╪══════════╡
│ Spark   ┆ 24200.0 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 27500.0 ┆ 50days   ┆ 2300     │
│ Hadoop  ┆ 25300.0 ┆ 35days   ┆ 1000     │
│ Python  ┆ 26400.0 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 28600.0 ┆ 55days   ┆ 2500     │
└─────────┴─────────┴──────────┴──────────┘
The same holds when a new column is derived from existing ones. Here, Net_Fees is computed from the clone's Fees and Discount columns, and the original DataFrame remains unchanged.
# Clone the DataFrame
df_clone = df.clone()
# Modifying the cloned DataFrame
df2 = df_clone.with_columns((df_clone["Fees"] - df_clone["Discount"]).alias("Net_Fees"))
print("Modified Cloned DataFrame:\n", df2)
# Output:
# Modified Cloned DataFrame:
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Net_Fees │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 21000    │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ 22700    │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     ┆ 22000    │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     ┆ 22800    │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     ┆ 23500    │
└─────────┴───────┴──────────┴──────────┴──────────┘
Here,
- The clone() method creates an independent copy of the DataFrame.
- Modifications made through df_clone do not affect df.
- with_columns() returns a new DataFrame, so the Net_Fees column exists only in df2, while df_clone and df remain unchanged.
Checking if the Clone is Independent
To confirm that the cloned DataFrame is independent, we can modify the cloned DataFrame and check whether the original DataFrame remains unchanged.
# Cloning the DataFrame
df_clone = df.clone()
# Modifying the cloned DataFrame
df2 = df_clone.with_columns(pl.lit("Updated").alias("Status"))
print("Modified Cloned DataFrame:\n", df2)
# Output:
# Modified Cloned DataFrame:
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬─────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Status  │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---     │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ str     │
╞═════════╪═══════╪══════════╪══════════╪═════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ Updated │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ Updated │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     ┆ Updated │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     ┆ Updated │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     ┆ Updated │
└─────────┴───────┴──────────┴──────────┴─────────┘
Here,
- The clone() method creates an independent copy.
- The Status column exists only in df2, the result of calling with_columns() on df_clone; df is not affected.
- Useful when performing temporary transformations without modifying the original dataset.
Adding a Column to the Cloned DataFrame
Adding a column to the cloned DataFrame does not impact the original, as clone() creates an independent copy. You can modify the cloned DataFrame freely without affecting the original.
# Cloning the DataFrame
df_clone = df.clone()
# Adding a new column to the cloned DataFrame
df2 = df_clone.with_columns(pl.Series("Category", ["BigData", "BigData", "BigData", "Programming", "DataScience"]))
print("Modified Cloned DataFrame:\n", df2)
# Output:
# Modified Cloned DataFrame:
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬─────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Category    │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---         │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ str         │
╞═════════╪═══════╪══════════╪══════════╪═════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ BigData     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ BigData     │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     ┆ BigData     │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     ┆ Programming │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     ┆ DataScience │
└─────────┴───────┴──────────┴──────────┴─────────────┘
Here,
- clone() ensures that modifications made through df_clone don't affect df.
- The new Category column is only added to df2.
- Great for performing transformations without modifying the original DataFrame.
Applying Filters to the Cloned DataFrame
Filtering the cloned DataFrame does not alter the original, as clone() creates an independent copy. You can apply filters to the cloned DataFrame without impacting the original.
# Cloning the DataFrame
df_clone = df.clone()
# Applying a filter: Select rows where Fees > 23000
df2 = df_clone.filter(df_clone["Fees"] > 23000)
print("Filtered Cloned DataFrame:\n", df2)
# Output:
# Filtered Cloned DataFrame:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     │
└─────────┴───────┴──────────┴──────────┘
Here,
- The filter() method allows selective row extraction.
- The original df remains unchanged, while df2 contains only rows where Fees > 23000.
- Useful when performing exploratory data analysis (EDA) without modifying the main dataset.
Conclusion
In conclusion, using clone() ensures that any modifications made to the cloned DataFrame do not affect the original. Although clone() is a cheap operation that does not copy the underlying data, the clone is a separate DataFrame object, so you can work on the cloned version independently.
Happy Learning!!