In Polars, the melt() method reshapes a DataFrame from wide format to long format. It unpivots the selected columns into rows while keeping the other columns as identifier variables.
In this article, I will explain the Polars DataFrame melt() method, including its syntax, parameters, and how it returns a reshaped DataFrame in long format. The resulting DataFrame retains the identifier columns specified in id_vars and adds two new columns: one for the names of the original columns (specified in value_vars) and another for the corresponding values.
Key Points –
- The melt() method transforms a DataFrame from wide format to long format, unpivoting columns into rows.
- The id_vars parameter specifies the columns that will remain as unique identifiers and won’t be unpivoted.
- The value_vars parameter defines the columns to be melted into key-value pairs.
- The variable_name parameter allows you to rename the column that holds the original column names (variables).
- The value_name parameter lets you rename the column that stores the values corresponding to the melted variables.
- Both id_vars and value_vars can accept a single column name, a list of columns, or column selectors.
- If value_vars is not specified, all columns not listed in id_vars are melted by default.
- melt() is commonly used to prepare data for analysis or visualization, especially when working with libraries that require data in long format.
Polars DataFrame melt() Introduction
Following is the syntax of the Polars DataFrame melt() method.
# Syntax of melt()
DataFrame.melt(
    id_vars: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None = None,
    value_vars: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None = None,
    variable_name: str | None = None,
    value_name: str | None = None,
) → DataFrame
Parameters of the Polars DataFrame.melt()
Following are the parameters of the Polars DataFrame.melt() method.
- id_vars – The column(s) to keep as identifiers (these are not melted). If None, no identifier columns are kept and the result contains only the variable and value columns.
- value_vars – The column(s) to unpivot (melt). If None, all columns except those in id_vars are melted.
- variable_name – The name of the new column holding the column names from value_vars (the melted variables). If None, defaults to "variable".
- value_name – The name of the new column holding the values from value_vars. If None, defaults to "value".
Return Value
It returns the reshaped DataFrame in long format.
Usage of Polars DataFrame.melt() Method
The melt() method reshapes a DataFrame from wide format to long format. It unpivots selected columns into rows while retaining specified columns as identifiers.
To run some examples of the Polars DataFrame.melt() method, let’s create a Polars DataFrame.
import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 26000],
    'Duration': ['30days', '50days', '40days', '40days'],
    'Discount': [1000, 1500, 1200, 2500]
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
This yields a DataFrame with four rows and four columns: Courses, Fees, Duration, and Discount.
Melt with Single Column
To melt the Polars DataFrame with a single identifier column, use the melt() method, specifying the column you want to keep as the identifier (using id_vars) and the columns you want to melt (using value_vars).
# Melt DataFrame with 'Courses' as the identifier column
df2 = df.melt(id_vars=["Courses"], value_vars=["Fees", "Duration", "Discount"])
print("Melted DataFrame with a Single Column:\n", df2)
In the above example, the Courses column is retained as an identifier, while the Fees, Duration, and Discount columns are melted into two columns: variable (representing the original column names) and value (holding the corresponding values). Because Duration holds strings while Fees and Discount hold integers, the value column is cast to a common type, here str. This example yields the below output.
# Output:
# Melted DataFrame with a Single Column:
# shape: (12, 3)
┌─────────┬──────────┬────────┐
│ Courses ┆ variable ┆ value  │
│ ---     ┆ ---      ┆ ---    │
│ str     ┆ str      ┆ str    │
╞═════════╪══════════╪════════╡
│ Spark   ┆ Fees     ┆ 22000  │
│ Hadoop  ┆ Fees     ┆ 25000  │
│ Python  ┆ Fees     ┆ 20000  │
│ Pandas  ┆ Fees     ┆ 26000  │
│ Spark   ┆ Duration ┆ 30days │
│ Hadoop  ┆ Duration ┆ 50days │
│ Python  ┆ Duration ┆ 40days │
│ Pandas  ┆ Duration ┆ 40days │
│ Spark   ┆ Discount ┆ 1000   │
│ Hadoop  ┆ Discount ┆ 1500   │
│ Python  ┆ Discount ┆ 1200   │
│ Pandas  ┆ Discount ┆ 2500   │
└─────────┴──────────┴────────┘
Melt with Multiple Columns
Alternatively, to melt the Polars DataFrame with multiple identifier columns, pass a list of column names to the id_vars parameter. Those columns are retained as identifiers, while the columns named in value_vars are melted into two columns: one for the variable names and one for the corresponding values.
# Melt DataFrame with 'Courses' and 'Duration' as identifier columns
df2 = df.melt(id_vars=["Courses", "Duration"], value_vars=["Fees", "Discount"])
print("Melted DataFrame with Multiple Columns:\n", df2)
# Output:
# Melted DataFrame with Multiple Columns:
# shape: (8, 4)
┌─────────┬──────────┬──────────┬───────┐
│ Courses ┆ Duration ┆ variable ┆ value │
│ ---     ┆ ---      ┆ ---      ┆ ---   │
│ str     ┆ str      ┆ str      ┆ i64   │
╞═════════╪══════════╪══════════╪═══════╡
│ Spark   ┆ 30days   ┆ Fees     ┆ 22000 │
│ Hadoop  ┆ 50days   ┆ Fees     ┆ 25000 │
│ Python  ┆ 40days   ┆ Fees     ┆ 20000 │
│ Pandas  ┆ 40days   ┆ Fees     ┆ 26000 │
│ Spark   ┆ 30days   ┆ Discount ┆ 1000  │
│ Hadoop  ┆ 50days   ┆ Discount ┆ 1500  │
│ Python  ┆ 40days   ┆ Discount ┆ 1200  │
│ Pandas  ┆ 40days   ┆ Discount ┆ 2500  │
└─────────┴──────────┴──────────┴───────┘
Here,
- The Courses and Duration columns are kept as identifier columns (specified in id_vars).
- The Fees and Discount columns are melted into the variable and value columns. The variable column holds the original column names, and the value column holds the corresponding values.
Melt with Custom variable_name and value_name
To customize the column names generated by the melt() method, you can use the variable_name and value_name parameters. These allow you to rename the columns that hold the original column names and their corresponding values.
# Melt DataFrame with custom variable_name and value_name
df2 = df.melt(
    id_vars=["Courses"],
    value_vars=["Fees", "Discount"],
    variable_name="Attribute",
    value_name="Details"
)
print("Melted DataFrame with Custom variable_name and value_name:\n", df2)
# Output:
# Melted DataFrame with Custom variable_name and value_name:
# shape: (8, 3)
┌─────────┬───────────┬─────────┐
│ Courses ┆ Attribute ┆ Details │
│ ---     ┆ ---       ┆ ---     │
│ str     ┆ str       ┆ i64     │
╞═════════╪═══════════╪═════════╡
│ Spark   ┆ Fees      ┆ 22000   │
│ Hadoop  ┆ Fees      ┆ 25000   │
│ Python  ┆ Fees      ┆ 20000   │
│ Pandas  ┆ Fees      ┆ 26000   │
│ Spark   ┆ Discount  ┆ 1000    │
│ Hadoop  ┆ Discount  ┆ 1500    │
│ Python  ┆ Discount  ┆ 1200    │
│ Pandas  ┆ Discount  ┆ 2500    │
└─────────┴───────────┴─────────┘
Here,
- id_vars=["Courses"]: Keeps the Courses column as an identifier.
- value_vars=["Fees", "Discount"]: Specifies the columns to melt.
- variable_name="Attribute": Renames the default variable column to Attribute.
- value_name="Details": Renames the default value column to Details.
Melt Without Specifying value_vars
When you use the melt() method without specifying the value_vars parameter, Polars automatically considers all columns except those specified in id_vars to be melted into rows.
# Melt DataFrame without specifying value_vars
df2 = df.melt(id_vars=["Courses"])
print("Melted DataFrame Without Specifying value_vars:\n", df2)
# Output:
# Melted DataFrame Without Specifying value_vars:
# shape: (12, 3)
┌─────────┬──────────┬────────┐
│ Courses ┆ variable ┆ value  │
│ ---     ┆ ---      ┆ ---    │
│ str     ┆ str      ┆ str    │
╞═════════╪══════════╪════════╡
│ Spark   ┆ Fees     ┆ 22000  │
│ Hadoop  ┆ Fees     ┆ 25000  │
│ Python  ┆ Fees     ┆ 20000  │
│ Pandas  ┆ Fees     ┆ 26000  │
│ Spark   ┆ Duration ┆ 30days │
│ Hadoop  ┆ Duration ┆ 50days │
│ Python  ┆ Duration ┆ 40days │
│ Pandas  ┆ Duration ┆ 40days │
│ Spark   ┆ Discount ┆ 1000   │
│ Hadoop  ┆ Discount ┆ 1500   │
│ Python  ┆ Discount ┆ 1200   │
│ Pandas  ┆ Discount ┆ 2500   │
└─────────┴──────────┴────────┘
Here,
- id_vars=["Courses"]: Specifies that the Courses column remains as an identifier column.
- No value_vars: By default, all remaining columns (Fees, Duration, Discount) are treated as the columns to melt.
Conclusion
In conclusion, the melt() method in Polars is a powerful tool for reshaping DataFrames from a wide format to a long format. By specifying id_vars and value_vars, along with the optional variable_name and value_name parameters, you can customize the resulting structure to suit your analysis needs.
Happy Learning!!