  • Post last modified:May 21, 2024
Pandas melt() DataFrame Example

In pandas, the melt() function reshapes a DataFrame from wide format to long format. It unpivots the selected columns into rows: you can specify which columns stay as identifiers (id_vars), which columns are unpivoted (value_vars), and what the resulting variable and value columns are called (var_name, value_name).

Melting typically reduces the number of columns and increases the number of rows. In the resulting long-format DataFrame, each row holds one identifier, one variable name, and one value, which is often easier to analyze and plot. One or more columns act as identifiers, and the remaining columns are stacked into variable/value pairs. In this article, I will explain the melt() function, its syntax, its parameters, and how to use it to change a DataFrame from wide to long format.
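
To make the wide-to-long change concrete, here is a minimal sketch that only compares shapes before and after melting. The two-row frame is illustrative data whose column names mirror the example used later in this article.

```python
import pandas as pd

# Small illustrative frame: 1 identifier column and 3 measure columns
wide = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': [22000, 25000],
    'Duration': ['30days', '50days'],
    'Discount': [1000, 2000],
})

# Keep Courses as the identifier and unpivot every other column
long = pd.melt(wide, id_vars=['Courses'])

print(wide.shape)  # (rows, columns) before melting
print(long.shape)  # (rows, columns) after melting
```

Each of the two input rows contributes one output row per melted column, so the row count grows while the column count shrinks to the identifier plus the variable and value columns.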

Quick Examples of DataFrame melt()

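If you are in a hurry, below are some quick examples of how to use the pandas melt() function. They assume a DataFrame df with Courses, Fee, Duration, and Discount columns, which is created later in this article.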


# Quick examples of DataFrame melt()

# Example 1: Use pandas.melt() function
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])

# Example 2: Melt several columns using id_vars & value_vars
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])

# Example 3: Rename the output columns using var_name & value_name
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')

# Example 4: Keep the original index using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)

# Example 5: Assign multi-index (two-level) columns
df.columns = [list('ABCD'), list('EFGH')]

Syntax of melt()

Following is the syntax of the melt() function.


# Syntax of melt()
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)

Parameters of the melt() Function

Following are the parameters of the melt() function.

  • frame – The DataFrame to melt.
  • id_vars – (tuple, list, or ndarray, optional) One or more columns to keep as identifier columns in the new format.
  • value_vars – (tuple, list, or ndarray, optional) The columns to unpivot. If this parameter is not provided, pandas uses all columns that are not listed in id_vars.
  • var_name – (scalar, default None) The name of the variable column. If None, pandas uses frame.columns.name or 'variable'.
  • value_name – (scalar, default 'value') The name of the value column.
  • col_level – (int or str, optional) If the columns are a multi-index, the level to melt on.
  • ignore_index – (bool, default True) If True, the original index is ignored. If False, the original index is retained, and index labels are repeated as necessary.
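
The default behaviour of value_vars and id_vars described in the list above can be sketched as follows. The small frame is illustrative data, not the DataFrame used in the rest of this article.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': [22000, 25000],
    'Discount': [1000, 2000],
})

# value_vars omitted: every column not listed in id_vars is unpivoted
melted = pd.melt(df, id_vars=['Courses'])
print(melted['variable'].unique())

# id_vars omitted as well: all columns, including Courses, are unpivoted
melted_all = pd.melt(df)
print(melted_all.shape)
```

Leaving out both parameters is rarely what you want for analysis, because the identifier column is stacked together with the measures.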

Return Value of melt()

It returns a new, reshaped DataFrame. The original DataFrame is not modified.
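
A quick way to check this, sketched with a small illustrative frame, is to compare the input against a copy taken before melting:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ['Spark', 'PySpark'], 'Fee': [22000, 25000]})
before = df.copy()  # snapshot of the input for comparison

melted = pd.melt(df, id_vars=['Courses'])

# The input frame still has its original columns and values
print(df.equals(before))

# The result is a separate object, not the input itself
print(melted is df)
```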

To run the examples of the pandas melt() function, let’s create a pandas DataFrame from a dictionary.


# Create DataFrame
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500]
}
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2000
2   Hadoop  30000   40days      2500
3   Pandas  35000   35days      1500

Pandas melt() Usage with Example

The pandas melt() function changes the shape of a DataFrame from wide to long format. It is particularly useful when variables are spread across different columns and you want to reorganize them for analysis or visualization. In the example below, Courses is kept as the identifier and the Fee column is unpivoted. The output columns 'variable' and 'value' come from the default values of the var_name and value_name parameters, respectively.


# Use pandas.melt() function
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])
print(df2)

This yields the output below.


# Output:
   Courses variable  value
0    Spark      Fee  22000
1  PySpark      Fee  25000
2   Hadoop      Fee  30000
3   Pandas      Fee  35000
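
pandas also exposes the same operation as a DataFrame method, DataFrame.melt(), which takes the same keyword arguments without the frame argument. The following sketch, using a reduced version of the example data, checks that both spellings give the same result:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Hadoop', 'Pandas'],
    'Fee': [22000, 25000, 30000, 35000],
})

# Function form and method form of the same reshape
via_function = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])
via_method = df.melt(id_vars=['Courses'], value_vars=['Fee'])

print(via_function.equals(via_method))
```

The method form is convenient when melt is one step in a longer method chain.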

Using id_vars & value_vars to melt() a Pandas DataFrame

To unpivot more than one column, pass the identifier columns to id_vars and the list of columns to unpivot to value_vars. melt() stacks the selected columns one after another, so each identifier appears once per melted column. As before, 'variable' and 'value' are the default values of the var_name and value_name parameters, respectively.


# Using id_vars & value_vars to melt() a pandas DataFrame
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])
print(df2)

This yields the output below.


# Output:
   Courses  variable  value
0    Spark       Fee  22000
1  PySpark       Fee  25000
2   Hadoop       Fee  30000
3   Pandas       Fee  35000
4    Spark  Discount   1000
5  PySpark  Discount   2000
6   Hadoop  Discount   2500
7   Pandas  Discount   1500
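
One reason the long format is convenient for analysis is that the former column names become ordinary values you can group by. As a minimal sketch (the groupby step is an illustration and not part of melt() itself), the melted result can be summarized per variable:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Hadoop', 'Pandas'],
    'Fee': [22000, 25000, 30000, 35000],
    'Discount': [1000, 2000, 2500, 1500],
})
long_df = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])

# One group per former column name, summing the stacked values
totals = long_df.groupby('variable')['value'].sum()
print(totals)
```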

Using var_name & value_name to melt() a Pandas DataFrame

You can also customize the names of the variable and value columns. Pass the var_name and value_name parameters to melt(), and the reshaped DataFrame uses those names instead of the defaults.


# Using var_name & value_name to melt() a pandas DataFrame
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')
print(df2)

This yields the output below.


# Output:
   Courses Courses Fees  Courses Fee
0    Spark          Fee        22000
1  PySpark          Fee        25000
2   Hadoop          Fee        30000
3   Pandas          Fee        35000
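
When var_name is not passed, pandas falls back to the name of the column index if one is set, and to 'variable' otherwise. The sketch below illustrates that fallback; the label 'Attribute' is a made-up name chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': [22000, 25000],
    'Discount': [1000, 2000],
})

# Name the column index (hypothetical label for this sketch)
df.columns.name = 'Attribute'

# No var_name passed, so the column index name is used instead of 'variable'
melted = pd.melt(df, id_vars=['Courses'])
print(melted.columns.tolist())
```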

Use ignore_index

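Set ignore_index=False to keep the original index in the reshaped DataFrame. Because each original row produces one output row per melted column, the original index labels are repeated. The default value of ignore_index is True, which replaces the index with a new 0-based integer index.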


# Using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)
print(df2)

This yields the output below.


# Output:
   Courses  variable   value
0    Spark       Fee   22000
1  PySpark       Fee   25000
2   Hadoop       Fee   30000
3   Pandas       Fee   35000
0    Spark  Duration  30days
1  PySpark  Duration  50days
2   Hadoop  Duration  40days
3   Pandas  Duration  35days
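
Keeping the index is most useful when the row labels carry meaning. The following sketch uses made-up labels 'r1' and 'r2' to compare both settings and to show that the retained labels let you pull all melted rows that came from one source row:

```python
import pandas as pd

df = pd.DataFrame(
    {'Courses': ['Spark', 'PySpark'], 'Fee': [22000, 25000], 'Discount': [1000, 2000]},
    index=['r1', 'r2'],  # hypothetical row labels
)

kept = pd.melt(df, id_vars=['Courses'], ignore_index=False)
reset = pd.melt(df, id_vars=['Courses'])  # default: ignore_index=True

print(kept.index.tolist())   # original labels, repeated per melted column
print(reset.index.tolist())  # fresh 0-based integer index

# All melted rows that originated from source row 'r1'
print(kept.loc['r1'])
```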

Use Multi-Index Columns

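Assigning a list of two label lists, as in df.columns = [list('ABCD'), list('EFGH')], gives the DataFrame multi-level (multi-index) columns. Such columns can then be melted by passing col_level or by referring to columns with tuples.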


# Use multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
print(df)

This yields the output below.


# Output:
         A      B       C     D
         E      F       G     H
0    Spark  22000  30days  1000
1  PySpark  25000  50days  2000
2   Hadoop  30000  40days  2500
3   Pandas  35000  35days  1500
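
The step above only relabels the columns. As a minimal sketch of actually melting such a frame (using a two-row version of the example data), you can either melt on a single level with col_level or address columns by their full tuple labels:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': [22000, 25000],
    'Duration': ['30days', '50days'],
    'Discount': [1000, 2000],
})
df.columns = [list('ABCD'), list('EFGH')]

# Melt on the top level only: columns are addressed by their level-0 labels
top = pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
print(top)

# Melt on both levels: columns are addressed by (level-0, level-1) tuples
both = pd.melt(df, id_vars=[('A', 'E')], value_vars=[('B', 'F')])
print(both)
```

In the second call the result gets one variable column per column level, so the level labels of each melted column stay available separately.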

Complete Example of Pandas DataFrame melt()


import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500]
}
df = pd.DataFrame(technologies)
print(df)

# Use pandas.melt() function
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])
print(df2)

# Using id_vars & value_vars to melt() a pandas DataFrame
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])
print(df2)

# Using var_name & value_name to melt() a pandas DataFrame
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')
print(df2)

# Using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)
print(df2)

# Use multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
print(df)

Conclusion

In this article, I have explained the pandas melt() function, its syntax and parameters, and how to use it to change a DataFrame from wide to long format, with examples.

Happy Learning !!
