In pandas, the pivot()
method is used to reshape a DataFrame by pivoting it around a specific index and creating a new DataFrame. It can transform or summarize data by reorganizing it according to specified criteria.
In this article, I will describe the Pandas DataFrame pivot()
method, covering its syntax, parameters, and usage. I will also demonstrate how to return a new DataFrame that is pivoted based on specified index, columns, and values. This new DataFrame will have the specified index as its row index, the specified columns as its column headers, and the specified values populating the cells.
Key Points –
- The
pivot()
method reshapes data by pivoting a DataFrame around a specified index, creating a new DataFrame with restructured columns and values. - The method can raise a
ValueError
if there are duplicate entries for the specified index/columns pair. Usepivot_table()
with an aggregation function to handle such cases. - Unlike
pivot()
, thepivot_table()
method can handle duplicate values and allows for aggregation, making it suitable for more complex data reshaping tasks. - The result is a new DataFrame that represents a pivoted or transformed view of the original data, facilitating easier analysis and visualization of grouped data.
Pandas DataFrame pivot() Introduction
Following is the syntax of the pandas DataFrame pivot() function.
# Syntax of Pandas dataframe pivot()
DataFrame.pivot(index=None, columns=None, values=None)
Parameters of the DataFrame pivot()
Following are the parameters of the DataFrame pivot() function.
index
– (string or object, optional) Column(s) to set as the index (rows) of the pivoted DataFrame.columns
– (string or object, optional) Column(s) to set as the new columns of the pivoted DataFrame.values
– (string, object or a list of the previous, optional) Column(s) to use for populating new frame’s values.
Return Value
It returns a pivoted DataFrame based on the specified index, columns, and values.
Usage of Pandas DataFrame pivot() Method
The pivot()
method is used to reshape data in a DataFrame by reorganizing it based on specified index, columns, and values. It creates a new DataFrame with a new index and columns derived from the original data, facilitating easier data analysis and visualization.
Now, let’s create a Pandas DataFrame using data from a dictionary.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Student Names' : ['Jenny', 'Singh', 'Charles', 'Richard', 'Veena'],
'Category' : ['Online', 'Offline', 'Offline', 'Offline', 'Online'],
'Gender' : ['Female', 'Male', 'Male', 'Male', 'Female'],
'Courses': ['Java', 'Spark', 'PySpark','Hadoop','C'],
'Fee': [15000, 17000, 27000, 29000, 12000],
'Discount': [1100, 800, 1000, 1600, 600]})
print("Original DataFrame:\n",df)
Yields below output.
Let’s perform a basic pivot operation on the DataFrame. We’ll pivot the DataFrame to show the Fee
for each Student Names
based on the Category
.
# Perform the pivot operation
df2 = df.pivot(index='Student Names', columns='Category', values='Fee')
print("Pivoted DataFrame:\n", df2)
This program reshapes the original DataFrame to show the fee for each student categorized by whether they are in the Online
or Offline
category. Each student’s name is used as the index, and the Category
values (Online
and Offline
) become the columns. The Fee
values populate the corresponding cells. If a student doesn’t belong to a particular category, the corresponding cell will contain NaN
.
Pivot with Multiple Values
Alternatively, to perform a pivot with multiple values, you can specify a list of columns in the values
parameter. Let’s pivot the DataFrame to show both the Fee
and Discount
for each Student Names
based on the Category
.
# Perform the pivot operation with multiple values
df2 = df.pivot(index='Student Names', columns='Category', values=['Fee', 'Discount'])
print("Pivoted DataFrame with Multiple Values:\n", df2)
# Output:
# Pivoted DataFrame with Multiple Values:
# Fee Discount
# Category Offline Online Offline Online
# Student Names
# Charles 27000.0 NaN 1000.0 NaN
# Jenny NaN 15000.0 NaN 1100.0
# Richard 29000.0 NaN 1600.0 NaN
# Singh 17000.0 NaN 800.0 NaN
# Veena NaN 12000.0 NaN 600.0
Here,
- The rows are indexed by
Student Names
. - The columns are multi-indexed by the
Category
(withOffline
andOnline
categories) and thevalues
columns (Fee
andDiscount
). - Each cell shows the
Fee
andDiscount
for the corresponding student and category combination. If a student doesn’t belong to a particular category, the corresponding cell will containNaN
.
Pivot with a Single Column as Value
To perform a pivot with a single column as the value, you can specify one column in the values
parameter. Here’s an example where we pivot the DataFrame to show the Fee
for each Student Names
based on their Gender
# Perform the pivot operation with a single column as value
df2 = df.pivot(index='Student Names', columns='Gender', values='Fee')
print("Pivoted DataFrame with Single Column as Value:\n", df2)
# Output:
# Pivoted DataFrame with Single Column as Value:
# Gender Female Male
# Student Names
# Charles NaN 27000.0
# Jenny 15000.0 NaN
# Richard NaN 29000.0
# Singh NaN 17000.0
# Veena 12000.0 NaN
Here,
- The rows are indexed by
Student Names
. - The columns are based on
Gender
(Female
andMale
). - The values are from the
Fee
column. - Each cell shows the fee for the corresponding student and gender combination. If a student doesn’t belong to a particular gender category, the corresponding cell will contain
NaN
.
Pivot with Missing Data
To demonstrate pivoting with missing data, we’ll pivot the DataFrame to show Courses
for each Student Names
based on Category
. This will introduce NaN
where data is missing because some students belong to either the Online
or Offline
category, but not both.
# Pivoting to show Courses for each Student Names based on Category
df2 = df.pivot(index='Student Names', columns='Category', values='Courses')
print("Pivoted DataFrame with Missing Data:\n", df2)
# Outut:
# Pivoted DataFrame with Missing Data:
# Category Offline Online
# Student Names
# Charles PySpark NaN
# Jenny NaN Java
# Richard Hadoop NaN
# Singh Spark NaN
# Veena NaN C
Here,
- The rows are indexed by
Student Names
. - The columns are based on
Category
(Offline
andOnline
). - The values are from the
Courses
column. - Each cell shows the course for the corresponding student and category combination.
- If a student doesn’t belong to a particular category (either
Offline
orOnline
), the corresponding cell will containNaN
, indicating missing data.
Pivot with Non-Unique Entries (Handling Error)
Similarly, when pivoting a DataFrame using pivot()
in pandas, if there are non-unique entries for the specified index and columns combination, a ValueError
will be raised.
import pandas as pd
# Sample DataFrame with non-unique entries
df = pd.DataFrame({
'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-01'],
'city': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
'temperature': [55, 60, 53, 59, 56]
})
# Attempting to pivot with non-unique entries will raise a ValueError
try:
df2 = df.pivot(index='date', columns='city', values='temperature')
print("Pivoted DataFrame:\n", df2)
except ValueError as e:
print("Error:", e)
# Output:
# Error: Index contains duplicate entries, cannot reshape
Here,
- The original DataFrame
df
has duplicate entries for the combination ofdate
andcity
. - When we attempt to pivot the DataFrame using
pivot()
withindex='date'
,columns='city'
, andvalues='temperature'
, pandas raise aValueError
becausepivot()
requires unique index and column pairs to reshape the data. - To handle this situation, you would typically use pivot_table() with an aggregation function (
mean
,sum
,max
,min
, etc.) to aggregate values for the duplicate entries, as shown in the previous example.
Frequently Asked Questions on Pandas DataFrame pivot() Method
The pivot()
method in pandas is used to reshape a DataFrame by transforming or reorganizing its data based on specific criteria.
If there are duplicate entries for the specified combination of index
and columns
, the pivot()
method will raise a ValueError
.
The pivot()
method returns a new DataFrame that has been reshaped according to the specified index
, columns
, and values
.
You can pivot a DataFrame using multiple columns for values by passing a list of column names to the values
parameter. This will create a multi-indexed DataFrame.
Missing values (NaN
) in the pivoted DataFrame indicate that the specified combination of index
and columns
did not have corresponding data in the original DataFrame. You can handle missing values by filling them with a specific value or using methods like fillna()
to replace NaN
values.
Conclusion
In this article, I have explained the Pandas DataFrame pivot()
method, including its syntax, parameters, and usage. I have also explained how to reshape a DataFrame based on a specified index, columns, and values to create a new DataFrame.
Happy Learning!!
Related Articles
- Pandas DataFrame sample() Function
- Pandas DataFrame describe() Method
- Pandas DataFrame equals() Method
- Pandas DataFrame sum() Method
- Pandas DataFrame shift() Function
- Pandas DataFrame info() Function
- Pandas DataFrame head() Method
- Pandas DataFrame explode() Method
- Pandas DataFrame nunique() Method
- Pandas Select All Columns Except One Column
- Pandas Get DataFrame Columns by Data Type
- How to Compare Two Columns Using Pandas?