  • Post category:Polars
  • Post last modified:March 31, 2025
  • Reading time:12 mins read
Polars DataFrame clone() – Explained by Examples

In Polars, the clone() method creates a copy of a DataFrame that is independent of the original. Any modification made to the cloned DataFrame does not affect the original DataFrame, and vice versa. The copy is also cheap: the Polars documentation describes clone() as an operation that does not copy the underlying data, so it is not a deep copy in the memory sense.


In this article, I will explain the Polars DataFrame clone() method, covering its syntax, return value, and usage. Through detailed examples, I will demonstrate how to create a new DataFrame containing the same data as the original and show that changes to the copy leave the original untouched.

Key Points –

  • The clone() method creates an independent copy of a Polars DataFrame, so the original remains unchanged while you work on the copy.
  • df.clone() returns a new DataFrame with the same structure and data as the original.
  • The cloned DataFrame is independent of the original, meaning modifications to one do not affect the other.
  • It is useful when performing transformations (like filtering or adding columns) without modifying the original DataFrame.
  • clone() does not duplicate the column data; the new DataFrame shares the underlying memory buffers, yet changes made through the clone still do not show up in the original.
  • Most Polars methods, such as with_columns() and filter(), already return a new DataFrame; clone() matters most before in-place methods such as insert_column() or drop_in_place().
  • Because no data is copied, clone() is fast and memory-friendly even for large DataFrames, unlike a full deep copy.
  • In multi-step operations, clone() ensures the original dataset remains unchanged, helping maintain reproducibility.

Polars DataFrame clone() Introduction

Let’s look at the syntax of the clone() method.


# Syntax of polars clone() 
DataFrame.clone() → DataFrame

Return Value

The method takes no parameters and returns a new DataFrame with the same schema and data as the original. The returned DataFrame is independent of the original, but the column data itself is not duplicated in memory, which makes clone() a cheap operation.

Usage of Polars DataFrame clone() Method

The clone() method in Polars creates an independent copy of a DataFrame, ensuring that modifications to the cloned DataFrame do not affect the original one.

First, let’s create a Polars DataFrame.


import polars as pl

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '35days', '40days', '55days'],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

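Yields below output.

# Original DataFrame:
# shape: (5, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     │
└─────────┴───────┴──────────┴──────────┘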

Now call clone() on df to create a copy named df2 and print it.


# Clone the DataFrame
df2 = df.clone()
print("Cloned DataFrame:\n", df2)

Here,

  • clone() creates an independent copy of the DataFrame without duplicating the underlying data.
  • Any modifications to df2 will not affect df.
  • Useful when performing transformations while keeping the original data intact.

Modifying the Cloned DataFrame

When you modify a cloned DataFrame, the original DataFrame remains unchanged because clone() returns a separate, independent DataFrame.


# Clone the DataFrame
df_clone = df.clone()

# Modify the cloned DataFrame (increase Fees by 10%)
df2 = df_clone.with_columns((df_clone["Fees"] * 1.10).alias("Fees"))
print("Modified Cloned DataFrame:\n", df2)

# Output:
# Modified Cloned DataFrame:
# shape: (5, 4)
┌─────────┬─────────┬──────────┬──────────┐
│ Courses ┆ Fees    ┆ Duration ┆ Discount │
│ ---     ┆ ---     ┆ ---      ┆ ---      │
│ str     ┆ f64     ┆ str      ┆ i64      │
╞═════════╪═════════╪══════════╪══════════╡
│ Spark   ┆ 24200.0 ┆ 30days   ┆ 1000     │
│ PySpark ┆ 27500.0 ┆ 50days   ┆ 2300     │
│ Hadoop  ┆ 25300.0 ┆ 35days   ┆ 1000     │
│ Python  ┆ 26400.0 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 28600.0 ┆ 55days   ┆ 2500     │
└─────────┴─────────┴──────────┴──────────┘

Any modifications made to the cloned DataFrame will not affect the original one. clone() returns a separate DataFrame object, and with_columns() in turn returns a new DataFrame instead of changing the one it is called on, so the original df stays unchanged.


# Clone the DataFrame
df_clone = df.clone()

# Modifying the cloned DataFrame
df2 = df_clone.with_columns((df_clone["Fees"] - df_clone["Discount"]).alias("Net_Fees"))
print("Modified Cloned DataFrame:\n", df2)

# Output:
# Modified Cloned DataFrame:
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Net_Fees │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ 21000    │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ 22700    │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     ┆ 22000    │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     ┆ 22800    │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     ┆ 23500    │
└─────────┴───────┴──────────┴──────────┴──────────┘

Here,

  • The clone() method creates an independent copy of the DataFrame.
  • Modifications in df_clone do not affect df.
  • The Net_Fees column exists only in df2, the DataFrame returned by with_columns(); neither df_clone nor df is changed.

Checking if the Clone is Independent

To confirm that the cloned DataFrame is independent, we can modify the cloned DataFrame and check whether the original DataFrame remains unchanged.


# Cloning the DataFrame
df_clone = df.clone()

# Modifying the cloned DataFrame
df2 = df_clone.with_columns(pl.lit("Updated").alias("Status"))
print("Modified Cloned DataFrame:\n", df2)

# Output:
# Modified Cloned DataFrame:
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬─────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Status  │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---     │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ str     │
╞═════════╪═══════╪══════════╪══════════╪═════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ Updated │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ Updated │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     ┆ Updated │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     ┆ Updated │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     ┆ Updated │
└─────────┴───────┴──────────┴──────────┴─────────┘

Here,

  • The clone() method creates a fully independent copy.
  • The Status column exists only in df2, the new DataFrame returned by with_columns(); df (and df_clone) stay unchanged.
  • Useful when performing temporary transformations without modifying the original dataset.

Adding a Column to the Cloned DataFrame

Adding a column to the cloned DataFrame does not impact the original, as clone() creates an independent copy. You can modify the cloned DataFrame freely without affecting the original.


# Cloning the DataFrame
df_clone = df.clone()

# Adding a new column to the cloned DataFrame
df2 = df_clone.with_columns(pl.Series("Category", ["BigData", "BigData", "BigData", "Programming", "DataScience"]))
print("Modified Cloned DataFrame:\n", df2)

# Output:
# Modified Cloned DataFrame:
# shape: (5, 5)
┌─────────┬───────┬──────────┬──────────┬─────────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount ┆ Category    │
│ ---     ┆ ---   ┆ ---      ┆ ---      ┆ ---         │
│ str     ┆ i64   ┆ str      ┆ i64      ┆ str         │
╞═════════╪═══════╪══════════╪══════════╪═════════════╡
│ Spark   ┆ 22000 ┆ 30days   ┆ 1000     ┆ BigData     │
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     ┆ BigData     │
│ Hadoop  ┆ 23000 ┆ 35days   ┆ 1000     ┆ BigData     │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     ┆ Programming │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     ┆ DataScience │
└─────────┴───────┴──────────┴──────────┴─────────────┘

Here,

  • clone() ensures that modifications in df_clone don’t affect df.
  • The new Category column is only added to df2.
  • Great for performing transformations without modifying the original DataFrame.

Applying Filters to the Cloned DataFrame

Filtering the cloned DataFrame does not alter the original, as clone() creates an independent copy. You can apply filters to the cloned DataFrame without impacting the original.


# Cloning the DataFrame
df_clone = df.clone()

# Applying a filter: Select rows where Fees > 23000
df2 = df_clone.filter(df_clone["Fees"] > 23000)
print("Filtered Cloned DataFrame:\n", df2)

# Output:
# Filtered Cloned DataFrame:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees  ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ PySpark ┆ 25000 ┆ 50days   ┆ 2300     │
│ Python  ┆ 24000 ┆ 40days   ┆ 1200     │
│ Pandas  ┆ 26000 ┆ 55days   ┆ 2500     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • The filter() method allows selective row extraction.
  • The original df remains unchanged, while df2 contains only rows where Fees > 23000.
  • Useful when performing exploratory data analysis (EDA) without modifying the main dataset.

Conclusion

In conclusion, clone() gives you a DataFrame that is independent of the original, so any modifications made to the cloned DataFrame do not affect it. Because clone() does not copy the underlying data, it is also cheap, which lets you work on the cloned version freely even with large datasets.

Happy Learning!!
