• Post author:
  • Post category:Pandas
  • Post last modified:December 12, 2024
  • Reading time:18 mins read
You are currently viewing Pandas DataFrame pivot() Method

In pandas, the pivot() method is used to reshape a DataFrame by pivoting it around a specific index and creating a new DataFrame. It can transform or summarize data by reorganizing it according to specified criteria.

Advertisements

In this article, I will describe the Pandas DataFrame pivot() method, covering its syntax, parameters, and usage. I will also demonstrate how to return a new DataFrame that is pivoted based on specified index, columns, and values. This new DataFrame will have the specified index as its row index, the specified columns as its column headers, and the specified values populating the cells.

Key Points –

  • The pivot() method reshapes data by pivoting a DataFrame around a specified index, creating a new DataFrame with restructured columns and values.
  • The method can raise a ValueError if there are duplicate entries for the specified index/columns pair. Use pivot_table() with an aggregation function to handle such cases.
  • Unlike pivot(), the pivot_table() method can handle duplicate values and allows for aggregation, making it suitable for more complex data reshaping tasks.
  • The result is a new DataFrame that represents a pivoted or transformed view of the original data, facilitating easier analysis and visualization of grouped data.

Pandas DataFrame pivot() Introduction

Following is the syntax of the pandas DataFrame pivot() function.


# Syntax of Pandas dataframe pivot()
DataFrame.pivot(index=None, columns=None, values=None)

Parameters of the DataFrame pivot()

Following are the parameters of the DataFrame pivot() function.

  • index – (string or object, optional) Column(s) to set as the index (rows) of the pivoted DataFrame.
  • columns – (string or object, optional) Column(s) to set as the new columns of the pivoted DataFrame.
  • values – (string, object or a list of the previous, optional) Column(s) to use for populating new frame’s values.

Return Value

It returns a pivoted DataFrame based on the specified index, columns, and values.

Usage of Pandas DataFrame pivot() Method

The pivot() method is used to reshape data in a DataFrame by reorganizing it based on specified index, columns, and values. It creates a new DataFrame with a new index and columns derived from the original data, facilitating easier data analysis and visualization.

Now, let’s create a Pandas DataFrame using data from a dictionary.


import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Student Names' : ['Jenny', 'Singh', 'Charles', 'Richard', 'Veena'],
                   'Category' : ['Online', 'Offline', 'Offline', 'Offline', 'Online'],
                   'Gender' : ['Female', 'Male', 'Male', 'Male', 'Female'],
                  'Courses': ['Java', 'Spark', 'PySpark','Hadoop','C'],
                   'Fee': [15000, 17000, 27000, 29000, 12000],
                   'Discount': [1100, 800, 1000, 1600, 600]})
print("Original DataFrame:\n",df)

Yields below output.

pandas dataframe pivot

Let’s perform a basic pivot operation on the DataFrame. We’ll pivot the DataFrame to show the Fee for each Student Names based on the Category.


# Perform the pivot operation
df2 = df.pivot(index='Student Names', columns='Category', values='Fee')
print("Pivoted DataFrame:\n", df2)

This program reshapes the original DataFrame to show the fee for each student categorized by whether they are in the Online or Offline category. Each student’s name is used as the index, and the Category values (Online and Offline) become the columns. The Fee values populate the corresponding cells. If a student doesn’t belong to a particular category, the corresponding cell will contain NaN.

pandas dataframe pivot

Pivot with Multiple Values

Alternatively, to perform a pivot with multiple values, you can specify a list of columns in the values parameter. Let’s pivot the DataFrame to show both the Fee and Discount for each Student Names based on the Category.


# Perform the pivot operation with multiple values
df2 = df.pivot(index='Student Names', columns='Category', values=['Fee', 'Discount'])
print("Pivoted DataFrame with Multiple Values:\n", df2)

# Output:
# Pivoted DataFrame with Multiple Values:
#                     Fee          Discount        
# Category       Offline   Online  Offline  Online
# Student Names                                   
# Charles        27000.0      NaN   1000.0     NaN
# Jenny              NaN  15000.0      NaN  1100.0
# Richard        29000.0      NaN   1600.0     NaN
# Singh          17000.0      NaN    800.0     NaN
# Veena              NaN  12000.0      NaN   600.0

Here,

  • The rows are indexed by Student Names.
  • The columns are multi-indexed by the Category (with Offline and Online categories) and the values columns (Fee and Discount).
  • Each cell shows the Fee and Discount for the corresponding student and category combination. If a student doesn’t belong to a particular category, the corresponding cell will contain NaN.

Pivot with a Single Column as Value

To perform a pivot with a single column as the value, you can specify one column in the values parameter. Here’s an example where we pivot the DataFrame to show the Fee for each Student Names based on their Gender


# Perform the pivot operation with a single column as value
df2 = df.pivot(index='Student Names', columns='Gender', values='Fee')
print("Pivoted DataFrame with Single Column as Value:\n", df2)

# Output:
# Pivoted DataFrame with Single Column as Value:
#  Gender          Female     Male
# Student Names                  
# Charles            NaN  27000.0
# Jenny          15000.0      NaN
# Richard            NaN  29000.0
# Singh              NaN  17000.0
# Veena          12000.0      NaN

Here,

  • The rows are indexed by Student Names.
  • The columns are based on Gender (Female and Male).
  • The values are from the Fee column.
  • Each cell shows the fee for the corresponding student and gender combination. If a student doesn’t belong to a particular gender category, the corresponding cell will contain NaN.

Pivot with Missing Data

To demonstrate pivoting with missing data, we’ll pivot the DataFrame to show Courses for each Student Names based on Category. This will introduce NaN where data is missing because some students belong to either the Online or Offline category, but not both.


# Pivoting to show Courses for each Student Names based on Category
df2 = df.pivot(index='Student Names', columns='Category', values='Courses')
print("Pivoted DataFrame with Missing Data:\n", df2)

# Outut:
# Pivoted DataFrame with Missing Data:
#  Category       Offline Online
# Student Names                
# Charles        PySpark    NaN
# Jenny              NaN   Java
# Richard         Hadoop    NaN
# Singh            Spark    NaN
# Veena              NaN      C

Here,

  • The rows are indexed by Student Names.
  • The columns are based on Category (Offline and Online).
  • The values are from the Courses column.
  • Each cell shows the course for the corresponding student and category combination.
  • If a student doesn’t belong to a particular category (either Offline or Online), the corresponding cell will contain NaN, indicating missing data.

Pivot with Non-Unique Entries (Handling Error)

Similarly, when pivoting a DataFrame using pivot() in pandas, if there are non-unique entries for the specified index and columns combination, a ValueError will be raised.


import pandas as pd

# Sample DataFrame with non-unique entries
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-01'],
    'city': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'temperature': [55, 60, 53, 59, 56]
})

# Attempting to pivot with non-unique entries will raise a ValueError
try:
    df2 = df.pivot(index='date', columns='city', values='temperature')
    print("Pivoted DataFrame:\n", df2)
except ValueError as e:
    print("Error:", e)

# Output:
# Error: Index contains duplicate entries, cannot reshape

Here,

  • The original DataFrame df has duplicate entries for the combination of date and city.
  • When we attempt to pivot the DataFrame using pivot() with index='date', columns='city', and values='temperature', pandas raise a ValueError because pivot() requires unique index and column pairs to reshape the data.
  • To handle this situation, you would typically use pivot_table() with an aggregation function (mean, sum, max, min, etc.) to aggregate values for the duplicate entries, as shown in the previous example.

Frequently Asked Questions on Pandas DataFrame pivot() Method

What is the purpose of the pivot() method in pandas?

The pivot() method in pandas is used to reshape a DataFrame by transforming or reorganizing its data based on specific criteria.

What happens if there are duplicate entries in the data when using pivot()?

If there are duplicate entries for the specified combination of index and columns, the pivot() method will raise a ValueError.

What does the pivot() method return?

The pivot() method returns a new DataFrame that has been reshaped according to the specified index, columns, and values.

Can I pivot a DataFrame using multiple columns for values?

You can pivot a DataFrame using multiple columns for values by passing a list of column names to the values parameter. This will create a multi-indexed DataFrame.

What should I do if I encounter missing values (NaN) in the pivoted DataFrame?

Missing values (NaN) in the pivoted DataFrame indicate that the specified combination of index and columns did not have corresponding data in the original DataFrame. You can handle missing values by filling them with a specific value or using methods like fillna() to replace NaN values.

Conclusion

In this article, I have explained the Pandas DataFrame pivot() method, including its syntax, parameters, and usage. I have also explained how to reshape a DataFrame based on a specified index, columns, and values to create a new DataFrame.

Happy Learning!!

Reference