In pandas, the copy() method creates a deep or shallow copy of a DataFrame. By default, it creates a deep copy, which means that changes to the data of the original DataFrame do not affect the copy, and vice versa. This is particularly useful when you need to work on a duplicate while ensuring the original remains unaltered.
In this article, I will explain the Pandas DataFrame copy() method, covering its syntax, parameters, and usage, and how it returns a new DataFrame or Series that is a copy of the original.
Key Points –
- The copy() method creates a duplicate of a DataFrame, which can be a deep or shallow copy depending on the deep parameter.
- Creating a deep copy is useful for preserving the original data, ensuring that any manipulations on the copied DataFrame do not impact the original.
- The method takes a single boolean parameter, deep, which defaults to True (deep copy); pass False for a shallow copy.
- By default (deep=True), the copy's data is independent of the original DataFrame, so changes in the original do not affect the copy.
- When deep=False, a shallow copy is created: only references to the data are copied, not the data itself, so changes to the original's data are reflected in the copy. With Copy-on-Write enabled (the default starting with pandas 3.0), this sharing is no longer visible, because the data is copied lazily on the first modification.
Pandas DataFrame copy() Introduction
Following is the syntax of the Pandas DataFrame copy() method.
# Syntax of Pandas DataFrame copy()
DataFrame.copy(deep=True)
Parameters of the DataFrame copy()
Following are the parameters of the DataFrame copy() method.
- deep – bool, default True.
  - When True (default), a deep copy is made: the data is copied, so changes to the data in the original DataFrame are not reflected in the copy.
  - When False, a shallow copy is made: the original data is not copied, so changes to the data in the original DataFrame are reflected in the copy.
Return Value
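It returns a deep or shallow copy of the DataFrame, depending on the deep parameter. The returned object has the same type as the caller, so calling copy() on a Series returns a Series.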
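One way to see the difference between the two return values is to check whether the copy's data shares memory with the original. The following is a minimal sketch, assuming a purely numeric DataFrame; the frame df_demo and its column names are made up for illustration, and np.shares_memory is NumPy's helper for testing whether two arrays overlap in memory.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame (not the article's running example)
df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

deep = df_demo.copy()               # deep=True is the default
shallow = df_demo.copy(deep=False)

# Compare the underlying NumPy buffers of column "a"
deep_shares = np.shares_memory(df_demo["a"].to_numpy(), deep["a"].to_numpy())
shallow_shares = np.shares_memory(df_demo["a"].to_numpy(), shallow["a"].to_numpy())

print("deep copy shares memory:   ", deep_shares)
print("shallow copy shares memory:", shallow_shares)
```

For a numeric frame like this one, the deep copy is expected to report that no memory is shared, while the shallow copy still points at the original buffer.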
Usage of Pandas DataFrame copy() Function
The pandas DataFrame.copy() method creates a duplicate of an existing DataFrame. This is helpful whenever you need to work with a duplicate independently of the original.
Now, let's create a Pandas DataFrame from a Python dictionary, with the columns Column1 and Column2.
# Create DataFrame
import pandas as pd

data = {'Column1': [5, 15, 8, 20, 25],
        'Column2': [4, 12, 8, 17, 25]}
df = pd.DataFrame(data)
print("Create DataFrame:\n", df)
This prints the DataFrame with its five rows and two columns.
Deep Copy
A deep copy means creating a completely independent copy of the DataFrame. This means that any changes made to the original DataFrame after the copy will not affect the copied DataFrame and vice versa. The data is fully duplicated, ensuring the two DataFrames are independent of each other.
# Create a deep copy of the DataFrame
df_deep_copy = df.copy()
# Modify the original DataFrame
df.iloc[0, 0] = 99
print("Modified Original DataFrame:\n", df)
print("Deep Copied DataFrame (Unchanged):\n", df_deep_copy)
Here,
- The original DataFrame df is created with the provided data.
- df.copy() is used to create a deep copy, df_deep_copy, of the original DataFrame.
- The original DataFrame is modified by setting the first element of Column1 to 99.
- The deep copied DataFrame remains unchanged, demonstrating that it is independent of the original DataFrame.
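One caveat is worth knowing: according to the pandas documentation, a deep copy copies the data but does not recursively copy Python objects stored inside it, only the references to them. The sketch below illustrates this with a hypothetical object-dtype column named tags that holds lists; applying copy.deepcopy to each element is one possible workaround, not the only one.

```python
import copy
import pandas as pd

# Hypothetical frame whose column stores mutable Python lists
df_obj = pd.DataFrame({"tags": [["a"], ["b"]]})

deep = df_obj.copy()  # copies the array of references, not the lists themselves

# Build a frame whose lists are themselves duplicated
independent = df_obj.copy()
independent["tags"] = independent["tags"].map(copy.deepcopy)

# Mutate a list in place through the original frame
df_obj.loc[0, "tags"].append("x")

print(deep.loc[0, "tags"])         # same list object as in df_obj
print(independent.loc[0, "tags"])  # a separate list
```

For columns of plain numbers or strings this distinction does not arise; it matters only when cells hold mutable objects such as lists or dictionaries.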
Shallow Copy
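A shallow copy in pandas creates a new DataFrame that references the same data as the original DataFrame. Without Copy-on-Write, changes made to the data of the original DataFrame are reflected in the shallow copy and vice versa, because they share the same underlying data. With Copy-on-Write enabled (the default starting with pandas 3.0), the data is still shared at first but is copied lazily on the first modification, so such changes no longer propagate.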
# Create a shallow copy of the DataFrame
df_shallow_copy = df.copy(deep=False)
# Modify the original DataFrame
df.iloc[0, 0] = 99
print("Modified Original DataFrame:\n", df)
print("Shallow Copied DataFrame (Changed):\n", df_shallow_copy)
# Output (without Copy-on-Write, for example pandas 2.x defaults):
# Modified Original DataFrame:
#    Column1  Column2
# 0       99        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25
# Shallow Copied DataFrame (Changed):
#    Column1  Column2
# 0       99        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25
Here,
- The original DataFrame df is created with the provided data.
- df.copy(deep=False) is used to create a shallow copy, df_shallow_copy, of the original DataFrame.
- The original DataFrame is modified by setting the first element of Column1 to 99.
- The shallow copied DataFrame reflects the changes made to the original DataFrame, demonstrating that they share the same underlying data.
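Whether a change to the original shows up in a shallow copy depends on the pandas version and on whether Copy-on-Write is active. The following sketch simply observes the behaviour in the running environment rather than assuming one outcome; the frame df_cow and the variable propagated are illustrative names.

```python
import pandas as pd

df_cow = pd.DataFrame({"Column1": [5, 15, 8], "Column2": [4, 12, 8]})
shallow = df_cow.copy(deep=False)

# Modify the original after taking the shallow copy
df_cow.iloc[0, 0] = 99

# True  -> the data is shared eagerly (behaviour without Copy-on-Write)
# False -> the copy was detached lazily on write (Copy-on-Write behaviour)
propagated = bool(shallow.iloc[0, 0] == 99)
print("pandas", pd.__version__, "- change visible in shallow copy:", propagated)
```

If code relies on the two objects staying in sync, a check like this is a quick way to find out whether that assumption holds for the installed version.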
Copying a DataFrame with Changes
In Pandas, you might want to create a copy of a DataFrame and make changes to the copied DataFrame while keeping the original DataFrame intact. This can be useful for various data manipulation tasks, such as testing different transformations or preparing subsets of data for analysis.
# Create a deep copy of the DataFrame
df_copy = df.copy()
# Make changes to the copied DataFrame
df_copy['Column1'] = df_copy['Column1'] * 10
df_copy['Column3'] = df_copy['Column1'] + df_copy['Column2']
print("Original DataFrame (Unchanged):\n", df)
print("Modified Copied DataFrame:\n", df_copy)
# Output:
# Original DataFrame (Unchanged):
#    Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25
# Modified Copied DataFrame:
#    Column1  Column2  Column3
# 0       50        4       54
# 1      150       12      162
# 2       80        8       88
# 3      200       17      217
# 4      250       25      275
Here,
- A deep copy of the DataFrame is created using df.copy() and assigned to df_copy.
- The Column1 values in df_copy are scaled by a factor of 10.
- A new column Column3 is added, which is the sum of Column1 and Column2.
- Both the original and modified copied DataFrames are printed to show that the original DataFrame remains unchanged, while the copied DataFrame reflects the changes.
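The same pattern applies when preparing a subset of the data. As a sketch, assuming you want to add a column to a filtered selection, calling copy() on the selection makes it explicit that the result is a separate object; the threshold 10 and the name subset are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 8, 17, 25]})

# Select rows and copy them so later edits clearly target a new object
subset = df[df['Column1'] > 10].copy()
subset['Column3'] = subset['Column1'] + subset['Column2']

print("Original DataFrame (Unchanged):\n", df)
print("Subset with new column:\n", subset)
```

In pandas versions without Copy-on-Write, the explicit copy() also avoids the SettingWithCopyWarning that can appear when assigning to a filtered selection.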
Copying a DataFrame and Renaming Columns
When working with pandas DataFrames, you might want to create a copy of a DataFrame and rename its columns for clarity or to match specific requirements. This can be done easily using the copy() method together with the rename() method, or by directly assigning new column names.
# Create a deep copy of the DataFrame
df_copy = df.copy()
# Rename columns in the copied DataFrame
df_copy.rename(columns={'Column1': 'Renamed1', 'Column2': 'Renamed2'}, inplace=True)
print("Original DataFrame (Unchanged):\n", df)
print("Copied DataFrame with Renamed Columns:\n", df_copy)
# Output:
# Original DataFrame (Unchanged):
#    Column1  Column2
# 0        5        4
# 1       15       12
# 2        8        8
# 3       20       17
# 4       25       25
# Copied DataFrame with Renamed Columns:
#    Renamed1  Renamed2
# 0         5         4
# 1        15        12
# 2         8         8
# 3        20        17
# 4        25        25
Here,
- A deep copy of the DataFrame is created using df.copy() and assigned to df_copy.
- The rename() method is used to rename the columns in the copied DataFrame. The inplace=True parameter ensures that the changes are made directly to df_copy.
- Alternatively, you can directly assign new column names to df_copy.columns.
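The alternative mentioned in the last point can be sketched as follows, using the same illustrative names Renamed1 and Renamed2. The snippet also shows rename() without inplace=True, which returns a new DataFrame instead of modifying an existing one; which variant to prefer is largely a matter of style.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 8, 17, 25]})

# Option 1: copy, then assign the new labels directly
df_copy = df.copy()
df_copy.columns = ['Renamed1', 'Renamed2']

# Option 2: rename() without inplace returns a new, renamed DataFrame
df_renamed = df.rename(columns={'Column1': 'Renamed1', 'Column2': 'Renamed2'})

print(df.columns.tolist())
print(df_copy.columns.tolist())
print(df_renamed.columns.tolist())
```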
Frequently Asked Questions on Pandas DataFrame copy() Function
The copy() method is used to create a duplicate of a DataFrame. This can be useful for preserving the original data while making changes to the duplicate, thereby ensuring that the original DataFrame remains unaffected by any modifications.
To create a deep copy of a DataFrame in pandas, use the copy() method with its default parameter deep=True. This ensures that the new DataFrame's data is independent of the original, meaning that changes made to the original DataFrame do not affect the copied DataFrame and vice versa.
Use a deep copy when you need to make independent changes to the copied DataFrame without affecting the original DataFrame. This is useful for experiments, backups, or situations where data integrity is crucial.
Use a shallow copy when you want to create a different view of the same data. This can be useful when you want to manipulate the DataFrame structure (like renaming columns or changing the index) without duplicating the data.
You can copy a DataFrame and then rename its columns. However, these are separate operations: first create a copy, and then rename the columns.
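As a sketch of the shallow-copy use case described above, the snippet below relabels the columns of a shallow copy. The labels A and B and the name df_view are arbitrary; the point is that the shallow copy gets its own column labels while the original keeps its own.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [5, 15, 8, 20, 25],
                   'Column2': [4, 12, 8, 17, 25]})

# Shallow copy: a new DataFrame object, data not duplicated up front
df_view = df.copy(deep=False)
df_view.columns = ['A', 'B']

print(df.columns.tolist())       # original labels are untouched
print(df_view.columns.tolist())  # only the shallow copy is relabelled
```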
Conclusion
In this article, you have learned the Pandas DataFrame copy() method, including its syntax, parameters, and usage, and how it returns a new DataFrame (or Series) that is a copy of the original. This new object can be either a deep copy or a shallow copy, depending on the deep parameter passed to the copy() method.
Happy Learning!!