How to Create Pandas Pivot Multiple Columns

  • Post author:
  • Post category:Pandas
  • Post last modified:January 10, 2024
  • Reading time:14 mins read

We can create a Pandas pivot table with multiple columns and return reshaped DataFrame. By manipulating given index or column values we can reshape the data based on column values. Use the pandas.pivot_table to create a spreadsheet-style pivot table in pandas DataFrame. This function does not support data aggregation, multiple values will result in a Multi-Index in the columns. In this article, I will explain how to create a pivot table with multiple columns.

1. Quick Examples of Pandas Pivot Table with Multiple Columns

If you are in a hurry, below are some quick examples of how to create pandas pivot tables with multiple columns.


# Quick examples of pandas pivot table with multiple columns

# Example 1: Create a pivot table with a single index
p_table = pd.pivot_table(df, index = ['Gender'])

# Example 2: Create a pivot table with multiple columns
p_table = pd.pivot_table(df, index = ['Gender', 'Courses', 'Fee'])

2. Syntax of Pivot Table

Following is the syntax of the pivot_table().


# Syntax of pivot table
pivot_table(data, index=None, columns=None, values=None)

# Another Syntax
DataFrame.pivot(index=None, columns=None, values=None)

2.1 Parameters of Pivot Table

Below are the parameters of the pivot table.

  • data : The DataFrame to pivot.
  • values : Are the numeric data in a given DataFrame, that are to be aggregated.
  • index : Defines the rows of the pivot table
  • columns : Defines the columns of the pivot table

3. Create Pandas DataFrame

Python pandas is widely used for data science/data analysis and machine learning applications. It is built on top of another popular package named NumPy, which provides scientific computing in Python. pandas DataFrame is a 2-dimensional labeled data structure with rows and columns (columns of potentially different types like integers, strings, float, None, Python objects e.t.c). You can think of it as an excel spreadsheet or SQL table.

We can create DataFrame in many ways here, I will create Pandas DataFrame using Python Dictionary.


# Create DataFrame
import pandas as pd

df = pd.DataFrame({'Gender' : ['Female', 'Male', 'Male', 'Male', 'Female'],
                  'Courses': ['Java', 'Spark', 'PySpark', 'C', 'Pandas'],
                   'Fee': [15000, 17000, 27000, 29000, 12000],
                   'Discount': [1100, 800, 1000, 1600, 600]})
print("Create DataFrame:\n",df)

Yields below output.

Pandas pivot multiple columns

4. Create the Pivot Table with Multiple Columns

Using the Pandas pivot_table() function we can reshape the DataFrame on multiple columns in the form of an Excel pivot table. To group the data in a pivot table we will need to pass a DataFrame into this function and the multiple columns you wanted to group as an index.

Here, I will take a list of values as an index and pass it into pivot_table(), it will return these values as columns of the pivot table. For example,


# Create pivot table with multiple columns
p_table = pd.pivot_table(df, index = ['Gender', 'Courses', 'Fee'])
print("Create pivot table with multiple columns:\n",p_table)

Yields below output.

Pandas pivot multiple columns

5. Pivot Table with Multiple Aggregation Functions

Aggregate function aggfunc param takes np.mean() function as a default function for grouping the data while creating pivot table. The values in the pivot table are the result of the summarization that aggfunc applies to the feature data.

We can use different aggregate functions, as I mentioned above if it is set to the dictionary, where the keys are aggregated to columns and values are functions or lists of functions. For example,

Here, I have taken np.mean() and np.sum() as a aggfunc functions.


# Create pivot table using multiple aggfunc
p_table = pd.pivot_table(df, index=['Gender'], aggfunc= {'Fee': 'mean', 'Discount': 'sum'})
print(p_table)

Yields below output.


# Output:
         Discount           Fee
Gender                        
Female      1700  13500.000000
Male        3400  24333.333333

Frequently Asked Questions on Create Pandas Pivot Multiple Columns

How do I create a pivot table with multiple columns in Pandas?

To create a pivot table with multiple columns in Pandas, you can use the pivot_table function and specify multiple columns in the columns parameter.

Can I specify multiple aggregation functions for different columns in the pivot table?

You can specify multiple aggregation functions for different columns in a Pandas pivot table using the aggfunc parameter. The aggfunc parameter allows you to define a dictionary that maps column names to the aggregation functions you want to apply.

How can I handle missing values when creating a pivot table?

When creating a pivot table in Pandas, you can handle missing values using the fill_value parameter. The fill_value parameter allows you to specify a value that will be used to fill any missing (NaN) values in the resulting pivot table.

How can I reset the index of the pivot table?

You can reset the index of a Pandas pivot table using the reset_index method. This method is used to reset the index of the DataFrame, and it returns a new DataFrame with the index reset.

How can I filter the rows or columns in the pivot table?

You can filter rows or columns in a Pandas pivot table by using boolean indexing. Boolean indexing allows you to select rows or columns based on a specified condition.

Is it possible to rename the columns of the pivot table?

It is possible to rename the columns of a Pandas pivot table using the rename_axis method. This method allows you to rename the levels of the columns and index of the DataFrame.

Conclusion

In this article, I have explained how to create a Panda pivot table with multiple columns involves using the pivot_table function with the appropriate parameters. You can customize the pivot table by specifying the columns to be used as the index and columns, selecting the values to aggregate, and defining the aggregation functions with examples.

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ @ LinkedIn

Leave a Reply