  • Post category:Pandas
  • Post last modified:June 26, 2024

In pandas, the copy() method creates a deep or shallow copy of a DataFrame. By default, it creates a deep copy: the data and index are duplicated, so changes to the original DataFrame do not affect the copy and vice versa. This is useful when you need to work on a duplicate while keeping the original unaltered.


In this article, I will explain the pandas DataFrame copy() method, covering its syntax, parameters, and usage, and how it returns a new DataFrame or Series that is a copy of the original.

Key Points –

  • The copy() function is used to create a duplicate of a DataFrame, which can be a deep or shallow copy depending on the deep parameter.
  • Creating a deep copy is useful for preserving the original data, ensuring that any manipulations on the copied DataFrame do not impact the original.
  • The function takes a single parameter, deep, a boolean that defaults to True (deep copy); pass deep=False to get a shallow copy.
  • By default (deep=True), it creates a deep copy where the data is completely independent of the original DataFrame. Changes in the original DataFrame do not affect the copy.
  • When deep=False, a shallow copy is created: only references to the data and index are copied, not the data itself. Without Copy-on-Write, changes to the original DataFrame are reflected in the shallow copy; with Copy-on-Write enabled (the default from pandas 3.0), they no longer are.

Pandas DataFrame copy() Introduction

Following is the syntax of the pandas DataFrame copy() method:


# Syntax of Pandas DataFrame copy()
DataFrame.copy(deep=True)

Parameters of the DataFrame copy()

Following are the parameters of the DataFrame copy() function.

  • deep – bool, default True.
    • When True (default), a deep copy is made, meaning that data is copied and changes to the data in the original DataFrame will not be reflected in the copy.
    • When False, a shallow copy is made, meaning that the original data is not copied and, unless Copy-on-Write is enabled, changes to the data in the original DataFrame will be reflected in the copy.

Return Value

It returns a deep or shallow copy of the DataFrame, depending on the deep parameter.
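
As a quick illustration of the return value, the sketch below checks that copy() hands back a new object of the same type as the caller, for both a DataFrame and a Series. The small DataFrame and the variable names are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8], 'Column2': [4, 12, 8]})

# copy() on a DataFrame returns a new DataFrame object
df_copy = df.copy()
print(type(df_copy).__name__)   # DataFrame
print(df_copy is df)            # False: a different object
print(df_copy.equals(df))       # True: same values

# copy() on a Series returns a new Series object
ser_copy = df['Column1'].copy()
print(type(ser_copy).__name__)  # Series
```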

Usage of Pandas DataFrame copy() Function

The pandas DataFrame.copy() function is used to create a duplicate of an existing DataFrame. This can be helpful in a variety of situations where you need to work with a duplicate DataFrame independently of the original.

Now, let’s create a pandas DataFrame using data from a Python dictionary, where the columns are Column1 and Column2.


# Create DataFrame
import pandas as pd

data = {'Column1': [5, 15, 8, 20, 25],
        'Column2': [4, 12, 8, 17, 25]}
df = pd.DataFrame(data)
print("Create DataFrame:\n", df)

Yields below output.

# Output:
# Create DataFrame:
#     Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25

Deep Copy

A deep copy means creating a completely independent copy of the DataFrame. This means that any changes made to the original DataFrame after the copy will not affect the copied DataFrame and vice versa. The data is fully duplicated, ensuring the two DataFrames are independent of each other.


# Create a deep copy of the DataFrame
df_deep_copy = df.copy()

# Modify the original DataFrame
df.iloc[0, 0] = 99
print("Modified Original DataFrame:\n", df)
print("Deep Copied DataFrame (Unchanged):\n", df_deep_copy)

Here,

  • The original DataFrame df is created with the provided data.
  • df.copy() is used to create a deep copy, df_deep_copy, of the original DataFrame.
  • The original DataFrame is modified by setting the first element of Column1 to 99.
  • The deep copied DataFrame remains unchanged, demonstrating that it is independent of the original DataFrame.
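
One way to confirm this independence, sketched here under the assumption that NumPy is available alongside pandas, is to check whether the two DataFrames share memory and then modify each one separately. The use of np.shares_memory is an illustrative choice and is not part of the copy() API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 8, 17, 25]})
df_deep_copy = df.copy()

# The deep copy holds its own data buffer, so no memory is shared
shares = np.shares_memory(df['Column1'].to_numpy(),
                          df_deep_copy['Column1'].to_numpy())
print(shares)  # False

# Changes in either direction stay local to the object being modified
df.iloc[0, 0] = 99
df_deep_copy.iloc[1, 1] = -1
print(df.iloc[1, 1], df_deep_copy.iloc[0, 0])  # 12 5
```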

Shallow Copy

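A shallow copy in pandas is a new DataFrame that references the same data as the original DataFrame. In pandas versions without Copy-on-Write enabled, changes made to the data of the original DataFrame are reflected in the shallow copy and vice versa, because they share the same underlying data. With Copy-on-Write enabled (the default starting with pandas 3.0), a shallow copy still avoids an eager copy of the data, but such changes are no longer reflected. The output below assumes Copy-on-Write is not enabled.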


# Create a shallow copy of the DataFrame
df_shallow_copy = df.copy(deep=False)

# Modify the original DataFrame
df.iloc[0, 0] = 99
print("Modified Original DataFrame:\n", df)
print("Shallow Copied DataFrame (Changed):\n", df_shallow_copy)

# Output:
# Modified Original DataFrame:
#     Column1  Column2
# 0       99        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25
# Shallow Copied DataFrame (Changed):
#     Column1  Column2
# 0       99        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25

Here,

  • The original DataFrame df is created with the provided data.
  • df.copy(deep=False) is used to create a shallow copy, df_shallow_copy, of the original DataFrame.
  • The original DataFrame is modified by setting the first element of Column1 to 99.
  • The shallow copied DataFrame reflects the changes made to the original DataFrame, demonstrating that they share the same underlying data.
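
Because the result depends on whether Copy-on-Write is active, it can help to check what your own installation does rather than rely on a fixed expected output. The following is a minimal sketch; the change_reflected variable is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 8, 17, 25]})
df_shallow_copy = df.copy(deep=False)

# Modify the original after taking the shallow copy
df.iloc[0, 0] = 99

# True when the data is still shared (no Copy-on-Write),
# False when Copy-on-Write has separated the two objects
change_reflected = bool(df_shallow_copy.iloc[0, 0] == 99)
print("pandas", pd.__version__)
print("Change reflected in shallow copy:", change_reflected)
```

If the second line prints False, your environment is running with Copy-on-Write, and a shallow copy behaves like an independent DataFrame as far as modifications are concerned.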

Copying a DataFrame with Changes

In Pandas, you might want to create a copy of a DataFrame and make changes to the copied DataFrame while keeping the original DataFrame intact. This can be useful for various data manipulation tasks, such as testing different transformations or preparing subsets of data for analysis.


# Create a deep copy of the original DataFrame
# (re-create df first if you ran the earlier examples, which set df.iloc[0, 0] = 99)
df_copy = df.copy()

# Make changes to the copied DataFrame
df_copy['Column1'] = df_copy['Column1'] * 10 
df_copy['Column3'] = df_copy['Column1'] + df_copy['Column2'] 
print("Original DataFrame (Unchanged):\n", df)
print("Modified Copied DataFrame:\n", df_copy)

# Output:
# Original DataFrame (Unchanged):
#     Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25
# Modified Copied DataFrame:
#     Column1  Column2  Column3
# 0       50        4       54
# 1      150       12      162
# 2       80        8       88
# 3      200       17      217
# 4      250       25      275

Here,

  • A deep copy of the DataFrame is created using df.copy() and assigned to df_copy.
  • The Column1 values in df_copy are scaled by a factor of 10.
  • A new column Column3 is added, which is the sum of Column1 and Column2.
  • Both the original and modified copied DataFrames are printed to show that the original DataFrame remains unchanged, while the copied DataFrame reflects the changes.
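
For contrast, plain assignment does not create a copy at all. The sketch below, using a shortened DataFrame and the illustrative name alias, shows that a second variable bound with = still points at the same object, whereas copy() produces an independent one.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8], 'Column2': [4, 12, 8]})

alias = df            # no copy: a second name for the same object
df_copy = df.copy()   # independent deep copy

# Adding a column through the alias also changes df
alias['Column3'] = alias['Column1'] + alias['Column2']

print(alias is df)            # True
print(list(df.columns))       # ['Column1', 'Column2', 'Column3']
print(list(df_copy.columns))  # ['Column1', 'Column2']
```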

Copying a DataFrame and Renaming Columns

When working with pandas DataFrames, you might want to create a copy of a DataFrame and rename its columns for clarity or to match specific requirements. This can be done easily using the copy() method and the rename() method or by directly assigning new column names.


# Create a deep copy of the original DataFrame
# (re-create df first if you ran the earlier examples, which set df.iloc[0, 0] = 99)
df_copy = df.copy()

# Rename columns in the copied DataFrame
df_copy.rename(columns={'Column1': 'Renamed1', 'Column2': 'Renamed2'}, inplace=True)

print("Original DataFrame (Unchanged):\n", df)
print("Copied DataFrame with Renamed Columns:\n", df_copy)

# Output:
# Original DataFrame (Unchanged):
#     Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25

# Copied DataFrame with Renamed Columns:
#     Renamed1  Renamed2
# 0         5         4
# 1        15        12
# 2         8         8
# 3        20        17
# 4        25        25

Here,

  • A deep copy of the DataFrame is created using df.copy() and assigned to df_copy.
  • The rename() method is used to rename the columns in the copied DataFrame. The inplace=True parameter ensures that the changes are made directly to df_copy.
  • Alternatively, you can directly assign new column names to df_copy.columns.
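
The direct-assignment alternative mentioned in the last point could look like the following sketch. It assumes the same two-column DataFrame as above; the list of new names must have exactly as many entries as there are columns.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 8, 17, 25]})
df_copy = df.copy()

# Labels are assigned by position, one per existing column
df_copy.columns = ['Renamed1', 'Renamed2']

print(list(df.columns))       # ['Column1', 'Column2']
print(list(df_copy.columns))  # ['Renamed1', 'Renamed2']
```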

Frequently Asked Questions on Pandas DataFrame copy() Function

What is the purpose of the copy() function in pandas?

The copy() function is used to create a duplicate of a DataFrame. This can be useful for preserving the original data while making changes to the duplicate, thereby ensuring that the original DataFrame remains unaffected by any modifications.

How do I create a deep copy of a DataFrame?

To create a deep copy of a DataFrame in pandas, you use the copy() method with its default parameter deep=True. This ensures that the new DataFrame is a completely independent copy of the original, meaning that changes made to the original DataFrame do not affect the copied DataFrame and vice versa.

When should I use a deep copy?

Use a deep copy when you need to make independent changes to the copied DataFrame without affecting the original DataFrame. This is useful for experiments, backups, or situations where data integrity is crucial.
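
One caveat worth knowing when data integrity matters: per the pandas documentation, a deep copy duplicates the data, but Python objects stored inside it (for example lists in an object-dtype column) are not copied recursively. The sketch below illustrates this with a hypothetical tags column that is not part of the examples above.

```python
import pandas as pd

# Hypothetical column holding Python lists (object dtype)
df = pd.DataFrame({'tags': [['a', 'b'], ['c']]})
df_deep_copy = df.copy()   # deep=True by default

# Mutating a list in place goes through the shared reference
df.at[0, 'tags'].append('z')

print(df_deep_copy.at[0, 'tags'])                      # ['a', 'b', 'z']
print(df.at[0, 'tags'] is df_deep_copy.at[0, 'tags'])  # True
```

For columns of plain numbers or strings this does not come up; it only affects mutable objects stored as cell values.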

When should I use a shallow copy?

Use a shallow copy when you want to create a different view of the same data. This can be useful when you want to manipulate the DataFrame structure (like renaming columns or changing the index) without duplicating the data.

Can I copy and rename columns in a DataFrame at the same time?

You can copy a DataFrame and then rename its columns. However, these are separate operations. First, create a copy, and then rename the columns.

Conclusion

In this article, you have learned the Pandas DataFrame copy() function by using its syntax, parameters, usage, and how we can return a new DataFrame (or Series) that is a copy of the original. This new DataFrame can be either a deep copy or a shallow copy, depending on the parameter passed to the copy() method.

Happy Learning!!
