In pandas, the melt() function reshapes a DataFrame by unpivoting it from wide format to long format. You can optionally specify the identifier columns (id_vars) and the names of the resulting variable and value columns (var_name, value_name).
Melting reduces the number of columns and increases the number of rows. In the long-format result, each row holds one combination of identifier, variable, and value, which is often easier to analyze and interpret. One or more columns act as identifiers, and the remaining columns are folded into rows. In this article, I will explain the melt() function, its syntax, its parameters, and how to use it to change a DataFrame from wide to long format.
Quick Examples of DataFrame melt()
If you are in a hurry, below are some quick examples of how to use the pandas melt() function. They assume pandas is imported as pd and df is the sample DataFrame created later in this article.
# Quick examples of DataFrame melt()

# Example 1: Use pandas.melt() function
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'])

# Example 2: Using id_vars & value_vars
# to melt a pandas DataFrame
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Discount'])

# Example 3: Using var_name & value_name
# to melt a pandas DataFrame
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')

# Example 4: Using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)

# Example 5: Use multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
Syntax of the melt()
Following is the syntax of the melt() function.
# Syntax of melt()
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
Parameters of the melt() Function
Following are the parameters of the melt() function.
- frame: The DataFrame to unpivot.
- id_vars: (tuple, list, or ndarray, optional) One or more columns to use as identifiers in the new format.
- value_vars: (tuple, list, or ndarray, optional) The columns to unpivot. If this parameter is omitted, pandas uses all remaining columns, that is, every column not listed in id_vars.
- var_name: The name of the variable column. When the columns have no name, it defaults to 'variable'.
- value_name: (scalar, default 'value') The name of the value column.
- col_level: (int or str, optional) If the columns are a MultiIndex, the level to melt.
- ignore_index: (bool, default True) If True, the original index is ignored. If False, the original index is retained, and index labels are repeated as necessary.
Return Value of melt()
It returns a reshaped DataFrame. The original DataFrame is not modified.
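As a minimal sketch of two points from the parameter list, the snippet below omits value_vars so that every column not named in id_vars is unpivoted, and then checks that the source frame still has its original columns. The two-row frame and the name long_df are illustrative assumptions, not part of the article's dataset.

```python
import pandas as pd

# Small illustrative frame (assumed data, a subset of the article's columns)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Discount": [1000, 2000],
})

# value_vars is omitted, so every column not listed in id_vars is unpivoted
long_df = pd.melt(df, id_vars=["Courses"])

print(long_df)
print(list(df.columns))  # the original frame keeps its wide layout
```

Because melt() returns a new object, the result has to be assigned to a variable to be used later.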
To run the examples of the pandas melt() function, let's create a pandas DataFrame from a dictionary.
# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2000
2   Hadoop  30000   40days      2500
3   Pandas  35000   35days      1500
Pandas melt() Usage with Example
The pandas melt() function changes the shape of a DataFrame from wide to long format. It is particularly useful when variables are spread across several columns and you want to reorganize the data for analysis or visualization. In the example below, Courses is the identifier column and Fee is the column that is unpivoted. The two new columns are named 'variable' and 'value', which are the default values of the var_name and value_name parameters respectively.
# Use pandas.melt() function
df2 = pd.melt(df, id_vars =['Courses'], value_vars =['Fee'])
print(df2)
Yields below output.
# Output:
   Courses variable  value
0    Spark      Fee  22000
1  PySpark      Fee  25000
2   Hadoop      Fee  30000
3   Pandas      Fee  35000
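The article refers to melt() both as a function and as a DataFrame method. The sketch below, using an assumed two-row frame, suggests that pd.melt(df, ...) and df.melt(...) give the same result, so either spelling can be used.

```python
import pandas as pd

# Small illustrative frame (assumed data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
})

# Top-level function form
via_function = pd.melt(df, id_vars=["Courses"], value_vars=["Fee"])

# DataFrame method form with the same arguments
via_method = df.melt(id_vars=["Courses"], value_vars=["Fee"])

# Compare the two results element by element
print(via_function.equals(via_method))
```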
Using id_vars & value_vars to melt() of a Pandas DataFrame
You can also unpivot several columns at once by passing a list to value_vars along with id_vars. The values of each listed column are stacked into the value column, and the variable column records which source column each value came from. As before, 'variable' and 'value' are the default values of the var_name and value_name parameters respectively.
# Using id_vars & value_vars to melt() of a pandas dataframe
df2 = pd.melt(df, id_vars =['Courses'], value_vars =['Fee', 'Discount'])
print(df2)
Yields below output.
# Output:
   Courses  variable  value
0    Spark       Fee  22000
1  PySpark       Fee  25000
2   Hadoop       Fee  30000
3   Pandas       Fee  35000
4    Spark  Discount   1000
5  PySpark  Discount   2000
6   Hadoop  Discount   2500
7   Pandas  Discount   1500
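The output above has eight rows because each of the four source rows appears once per melted column. The following sketch rebuilds the sample data and checks that arithmetic; the names value_vars and long_df are chosen here only for illustration.

```python
import pandas as pd

# Same sample data as the article's DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Pandas"],
    "Fee": [22000, 25000, 30000, 35000],
    "Duration": ["30days", "50days", "40days", "35days"],
    "Discount": [1000, 2000, 2500, 1500],
})

value_vars = ["Fee", "Discount"]
long_df = pd.melt(df, id_vars=["Courses"], value_vars=value_vars)

print(df.shape)       # wide: 4 rows, 4 columns
print(long_df.shape)  # long: 8 rows, 3 columns

# Rows in the long frame = source rows * number of melted columns
print(len(long_df) == len(df) * len(value_vars))
```

Duration is dropped from the result because it appears in neither id_vars nor value_vars.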
Using var_name & value_name to melt() of a Pandas DataFrame
You can customize the names of the two new columns with the var_name and value_name parameters. Pass the names you want to melt(), and the reshaped DataFrame uses them in place of the defaults 'variable' and 'value'.
# Using var_name & value_name to melt() of a pandas dataframe
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')
print(df2)
Yields below output.
# Output:
   Courses Courses Fees  Courses Fee
0    Spark          Fee        22000
1  PySpark          Fee        25000
2   Hadoop          Fee        30000
3   Pandas          Fee        35000
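Descriptive column names can pay off in later steps. As a rough sketch, the snippet below melts two columns with the assumed names Metric and Amount (they are not from the article) and then totals the amounts per metric with groupby.

```python
import pandas as pd

# Same sample data as the article's DataFrame, without Duration
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Pandas"],
    "Fee": [22000, 25000, 30000, 35000],
    "Discount": [1000, 2000, 2500, 1500],
})

# "Metric" and "Amount" are illustrative names for the two new columns
long_df = pd.melt(df, id_vars=["Courses"], value_vars=["Fee", "Discount"],
                  var_name="Metric", value_name="Amount")

# Total per melted column, addressed by the custom names
totals = long_df.groupby("Metric")["Amount"].sum()
print(totals)
```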
Use ignore_index
Setting ignore_index=False in the melt() call keeps the original index in the reshaped DataFrame, so index labels are repeated as needed. The default value of ignore_index is True, which discards the original index and numbers the rows from 0.
# Using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)
print(df2)
Yields below output.
# Output:
   Courses  variable   value
0    Spark       Fee   22000
1  PySpark       Fee   25000
2   Hadoop       Fee   30000
3   Pandas       Fee   35000
0    Spark  Duration  30days
1  PySpark  Duration  50days
2   Hadoop  Duration  40days
3   Pandas  Duration  35days
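To make the effect of ignore_index easier to see, this sketch melts an assumed two-row frame both ways and prints the resulting index labels. It assumes a pandas version that supports the ignore_index parameter (1.1 or later, as far as I know); the names kept and reset are illustrative.

```python
import pandas as pd

# Small illustrative frame (assumed data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Duration": ["30days", "50days"],
})

# Keep the original row labels; they repeat once per melted column
kept = pd.melt(df, id_vars=["Courses"], value_vars=["Fee", "Duration"],
               ignore_index=False)

# Default behaviour: rows are renumbered from 0
reset = pd.melt(df, id_vars=["Courses"], value_vars=["Fee", "Duration"])

print(list(kept.index))
print(list(reset.index))

# With the index retained, all melted rows from source row 0 are one lookup away
print(kept.loc[0])
```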
Use Multi-Index Columns
Assigning df.columns = [list('ABCD'), list('EFGH')] gives the DataFrame two-level (MultiIndex) columns. When melting such a DataFrame, the col_level parameter selects which level to use.
# Use multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
print(df)
Yields below output.
# Output:
         A      B       C     D
         E      F       G     H
0    Spark  22000  30days  1000
1  PySpark  25000  50days  2000
2   Hadoop  30000  40days  2500
3   Pandas  35000  35days  1500
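The example above only builds the multi-level columns. As a hedged sketch of the next step, the snippet below passes col_level=0 so that melt() works with the top-level labels (A, B, ...); the two-row frame is an assumed subset of the article's data.

```python
import pandas as pd

# Small illustrative frame (assumed data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Duration": ["30days", "50days"],
    "Discount": [1000, 2000],
})

# Replace the flat column labels with two levels: A-D on top, E-H below
df.columns = [list("ABCD"), list("EFGH")]

# Melt using only the top level of the column MultiIndex
long_df = pd.melt(df, col_level=0, id_vars=["A"], value_vars=["B"])
print(long_df)
```

Here A plays the role of Courses and B the role of Fee, so the result mirrors the first melt() example with the top-level labels in place of the original names.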
Complete Example of Pandas DataFrame melt()
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': [22000, 25000, 30000, 35000],
    'Duration': ['30days', '50days', '40days', '35days'],
    'Discount': [1000, 2000, 2500, 1500]
}
df = pd.DataFrame(technologies)
print(df)
# Use pandas.melt() function
df2 = pd.melt(df, id_vars =['Courses'], value_vars =['Fee'])
print(df2)
# Using id_vars & value_vars to melt() of a pandas dataframe
df2 = pd.melt(df, id_vars =['Courses'], value_vars =['Fee', 'Discount'])
print(df2)
# Using var_name & value_name to melt() of a pandas dataframe
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee'],
              var_name='Courses Fees', value_name='Courses Fee')
print(df2)
# Using ignore_index
df2 = pd.melt(df, id_vars=['Courses'], value_vars=['Fee', 'Duration'], ignore_index=False)
print(df2)
# Use multi-index columns
df.columns = [list('ABCD'), list('EFGH')]
print(df)
Conclusion
In this article, I have explained the pandas melt() function, its syntax and parameters, and how to use it to change a DataFrame from wide to long format, with examples.
Happy Learning !!