Pandas melt() DataFrame Example

  • Post category:Pandas / Python
  • Post last modified:December 9, 2022

The pandas melt() function changes a DataFrame from wide format to long format. It reduces the number of columns and increases the number of rows: one or more columns are kept as identifiers, and the other columns are unpivoted into a pair of variable and value columns.

In this article, I will explain the syntax and parameters of the pandas melt() function and show, with examples, how to use it to change a DataFrame from wide to long format.

1. Quick Examples of DataFrame melt() Function

If you are in a hurry, below are some quick examples of how to use the pandas melt() function.


# Example 1: Use the pandas.melt() function
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])

# Example 2: Use id_vars & value_vars
# to melt several columns
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])

# Example 3: Use var_name & value_name
# to rename the variable and value columns
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')

# Example 4: Use ignore_index=False to keep the original index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)

# Example 5: Set multi-index columns
df.columns = [list('ABCD'), list('EFGH')]

2. Syntax of melt()

Following is the syntax of the melt() function.


pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)

2.1 Parameters of the melt() Function

Following are the parameters of the melt() function.

  • frame – The DataFrame to reshape.
  • id_vars – (tuple, list, or ndarray, optional) One or more columns to use as identifiers in the long format.
  • value_vars – (tuple, list, or ndarray, optional) The columns to unpivot. If this parameter is not provided, pandas uses all columns that are not listed in id_vars.
  • var_name – (scalar, optional) The name of the variable column. If None, pandas uses frame.columns.name or 'variable'.
  • value_name – (scalar, default 'value') The name of the value column.
  • col_level – (int or str, optional) If the columns are a MultiIndex, the level to melt.
  • ignore_index – (bool, default True) If True, the original index is ignored. If False, the original index is retained, and index labels are repeated as necessary.

2.2 Return Value of melt()

It returns a reshaped DataFrame. The original DataFrame is not changed.

Now, let’s create a pandas DataFrame from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.


import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500]
}
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2000
2   Hadoop  30000   40days      2500
3   Pandas  35000   35days      1500

3. Pandas melt() Usage with Example

The pandas melt() function changes the shape of a DataFrame from wide to long format: one or more columns are used as identifiers, and the columns listed in value_vars are unpivoted into rows. Here, 'variable' and 'value' are the default values of the var_name and value_name parameters, respectively.


# Use pandas.melt() function
df2 = pd.melt(df, id_vars =['Courses'], value_vars =['Fee'])
print(df2)

Yields below output.


# Output
   Courses variable  value
0    Spark      Fee  22000
1  PySpark      Fee  25000
2   Hadoop      Fee  30000
3   Pandas      Fee  35000
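
The same reshape can also be written as a DataFrame method, df.melt(), which takes the same keyword arguments. The sketch below rebuilds the article's DataFrame so that it runs on its own; the names via_function and via_method are only illustrative. It compares the two forms and checks that the original DataFrame is left in wide format.

```python
import pandas as pd

# Same data as the article's DataFrame, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500],
})

# Function form, as used throughout this article
via_function = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])

# Method form, called on the DataFrame itself
via_method = df.melt(id_vars=['Courses'], value_vars=['Fee'])

# Both forms give the same long-format result
print(via_function.equals(via_method))  # True

# melt() returns a new object; the original is still 4 rows x 4 columns
print(df.shape)  # (4, 4)
```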

4. Using id_vars & value_vars to melt() of a Pandas DataFrame

You can also unpivot more than one column at a time by passing several column names to value_vars along with id_vars. The result is a long-format DataFrame with one row per original row for each melted column. As before, 'variable' and 'value' are the default values of the var_name and value_name parameters, respectively.


# Use id_vars & value_vars to melt several columns
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])
print(df2)

Yields below output.


# Output
   Courses  variable  value
0    Spark       Fee  22000
1  PySpark       Fee  25000
2   Hadoop       Fee  30000
3   Pandas       Fee  35000
4    Spark  Discount   1000
5  PySpark  Discount   2000
6   Hadoop  Discount   2500
7   Pandas  Discount   1500
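
As the value_vars description in the parameter list says, leaving value_vars out makes pandas melt every column that is not listed in id_vars. The following is a minimal sketch of that behavior on the same data; the name long_df is only illustrative. The row count is the number of original rows multiplied by the number of melted columns.

```python
import pandas as pd

# Same data as the article's DataFrame, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500],
})

# value_vars is omitted, so Fee, Duration, and Discount are all melted
long_df = pd.melt(df, id_vars=['Courses'])

# The melted column names end up in the 'variable' column
print(list(long_df['variable'].unique()))  # ['Fee', 'Duration', 'Discount']

# 4 original rows x 3 melted columns = 12 rows
print(len(long_df))  # 12
```

Because Fee and Discount hold numbers while Duration holds strings, the single value column has to store mixed types here, which is one reason to list value_vars explicitly when only some columns are needed.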

5. Using var_name & value_name to melt() of a Pandas DataFrame

You can also customize the names of the variable and value columns. Pass the var_name and value_name parameters to the melt() function, and the reshaped DataFrame uses those names instead of the defaults.


# Use var_name & value_name to rename the variable and value columns
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')
print(df2)

Yields below output.


# Output
   Courses Courses Fees  Courses Fee
0    Spark          Fee        22000
1  PySpark          Fee        25000
2   Hadoop          Fee        30000
3   Pandas          Fee        35000

6. Using ignore_index

Pass ignore_index=False to the melt() function to return the reshaped DataFrame with the original index retained. The default value of ignore_index is True.


# Using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)
print(df2)

Yields below output.


# Output
   Courses  variable   value
0    Spark       Fee   22000
1  PySpark       Fee   25000
2   Hadoop       Fee   30000
3   Pandas       Fee   35000
0    Spark  Duration  30days
1  PySpark  Duration  50days
2   Hadoop  Duration  40days
3   Pandas  Duration  35days
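
One possible use of the retained index is to collect every long-format row that came from a single original row. The sketch below assumes the same data as above; the names spark_rows and default are only illustrative. It selects index label 0 and contrasts the result with the default behavior, where the index is renumbered.

```python
import pandas as pd

# Same data as the article's DataFrame, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500],
})

# Keep the original index, so each label appears once per melted column
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'],
              ignore_index=False)
print(list(df2.index))  # [0, 1, 2, 3, 0, 1, 2, 3]

# All long-format rows that originated from original row 0 (Spark)
spark_rows = df2.loc[0]
print(spark_rows)

# With the default ignore_index=True, the index is renumbered from 0 to 7
default = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'])
print(list(default.index))  # [0, 1, 2, 3, 4, 5, 6, 7]
```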

7. Use Multi-Index Columns

Assigning two lists of labels, as in df.columns = [list('ABCD'), list('EFGH')], gives the DataFrame multi-level (MultiIndex) columns.


# Use multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
print(df)

Yields below output.


# Output
         A      B       C     D
         E      F       G     H
0    Spark  22000  30days  1000
1  PySpark  25000  50days  2000
2   Hadoop  30000  40days  2500
3   Pandas  35000  35days  1500
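
Once the columns are multi-level, the col_level parameter tells melt() which level the names in id_vars and value_vars refer to. The following is a minimal sketch that follows the pattern in the pandas documentation, applied to this article's data; choosing level 0 and the labels 'A' and 'B' is just one option.

```python
import pandas as pd

# Same data as the article's DataFrame, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500],
})

# Replace the flat column names with two levels: A-D on top, E-H below
df.columns = [list('ABCD'), list('EFGH')]

# Melt using the labels of level 0: 'A' identifies rows, 'B' is unpivoted
df2 = pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
print(df2)
```

The result has the flat columns A, variable, and value, with B repeated in the variable column and the original Fee amounts in the value column.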

8. Complete Example of Pandas DataFrame melt() Function


import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500]
}
df = pd.DataFrame(technologies)
print(df)

# Use the pandas.melt() function
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])
print(df2)

# Use id_vars & value_vars to melt several columns
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])
print(df2)

# Use var_name & value_name to rename the variable and value columns
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')
print(df2)

# Use ignore_index=False to keep the original index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)
print(df2)

# Set multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
print(df)

9. Conclusion

In this article, I have explained the syntax and parameters of the pandas melt() function and shown, with examples, how to use it to change a DataFrame from wide to long format.

Happy Learning !!
