  • Post category:Polars
  • Post last modified:December 23, 2024

In Polars, the melt() method reshapes a DataFrame from wide format to long format: it unpivots the selected columns into rows while keeping other columns as identifier variables. In Polars 1.0 and later, melt() is deprecated in favor of unpivot(), which takes index and on in place of id_vars and value_vars.


In this article, I will explain the Polars DataFrame melt() method, including its syntax, parameters, and how it returns a reshaped DataFrame in a long format. The resulting DataFrame retains the identifier columns specified in id_vars, and adds two new columns: one for the names of the original columns (specified in value_vars) and another for the corresponding values.

Key Points –

  • The .melt() method transforms a DataFrame from wide format to long format, unpivoting columns into rows.
  • The id_vars parameter specifies the columns that remain as identifiers and are not unpivoted.
  • The value_vars parameter defines the columns to be melted into key-value pairs.
  • The variable_name parameter allows you to rename the column that holds the original column names (variables).
  • The value_name parameter lets you rename the column that stores the values corresponding to the melted variables.
  • Both id_vars and value_vars can accept a single column name, a list of columns, or column selectors.
  • If value_vars is not specified, all columns not listed in id_vars are melted by default.
  • melt() is commonly used to prepare data for analysis or visualization, especially when working with libraries that require data in long format.

Polars DataFrame melt() Introduction

The syntax of the Polars DataFrame melt() method is as follows.


# Syntax of melt()
DataFrame.melt(
    id_vars: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None = None,
    value_vars: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None = None,
    variable_name: str | None = None,
    value_name: str | None = None,
) -> DataFrame

Parameters of the Polars DataFrame.melt()

The Polars DataFrame.melt() method takes the following parameters.

  • id_vars – The column(s) to keep as identifiers (these are not melted). If None, no identifier columns are kept, and the result contains only the variable and value columns.
  • value_vars – The column(s) to unpivot (melt). If None, all columns except id_vars will be melted.
  • variable_name – The name of the new column holding the column names from value_vars (the melted variables). If None, defaults to "variable".
  • value_name – The name of the new column holding the values from value_vars. If None, defaults to "value".

Return Value

It returns a new DataFrame in long format; the original DataFrame is not modified.

Usage of Polars DataFrame.melt() Method

The melt() method reshapes a DataFrame from wide format to long format. It unpivots selected columns into rows while retaining specified columns as identifiers.

To run some examples of the Polars DataFrame.melt() method, let’s create a Polars DataFrame.


import polars as pl

# Creating a new Polars DataFrame
technologies = {
    'Courses': ["Spark", "Hadoop", "Python", "Pandas"],
    'Fees': [22000, 25000, 20000, 26000],
    'Duration': ['30days', '50days', '40days', '40days'],
    'Discount': [1000, 1500, 1200, 2500]
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

This prints the original DataFrame, which has four rows and four columns: Courses, Fees, Duration, and Discount.

Melt with Single Column

To melt the DataFrame while keeping a single identifier column, call melt() with that column in id_vars and the columns you want to unpivot in value_vars.


# Melt DataFrame with 'Courses' as the identifier column
df2 = df.melt(id_vars=["Courses"], value_vars=["Fees", "Duration", "Discount"])
print("Melted DataFrame with a Single Column:\n", df2)

In the above example, the Courses column is retained as an identifier, while the Fees, Duration, and Discount columns are melted into two columns: variable (the original column names) and value (the corresponding values). The rows are stacked one melted column at a time, and because Fees and Discount are integers while Duration is a string, the shared value column is a string column. This example yields the below output.


# Output:
# Melted DataFrame with a Single Column:
# shape: (12, 3)
# (all 12 rows are listed here; by default Polars abbreviates long tables when printing)
┌─────────┬──────────┬────────┐
│ Courses ┆ variable ┆ value  │
│ ---     ┆ ---      ┆ ---    │
│ str     ┆ str      ┆ str    │
╞═════════╪══════════╪════════╡
│ Spark   ┆ Fees     ┆ 22000  │
│ Hadoop  ┆ Fees     ┆ 25000  │
│ Python  ┆ Fees     ┆ 20000  │
│ Pandas  ┆ Fees     ┆ 26000  │
│ Spark   ┆ Duration ┆ 30days │
│ Hadoop  ┆ Duration ┆ 50days │
│ Python  ┆ Duration ┆ 40days │
│ Pandas  ┆ Duration ┆ 40days │
│ Spark   ┆ Discount ┆ 1000   │
│ Hadoop  ┆ Discount ┆ 1500   │
│ Python  ┆ Discount ┆ 1200   │
│ Pandas  ┆ Discount ┆ 2500   │
└─────────┴──────────┴────────┘

Melt with Multiple Columns

To keep more than one identifier column, pass multiple column names to the id_vars parameter. These columns are retained as they are, while the columns in value_vars are melted into two columns: one for the variable names and one for the corresponding values.


# Melt DataFrame with 'Courses' and 'Duration' as identifier columns
df2 = df.melt(id_vars=["Courses", "Duration"], value_vars=["Fees", "Discount"])
print("Melted DataFrame with Multiple Columns:\n", df2)

# Output:
# Melted DataFrame with Multiple Columns:
# shape: (8, 4)
┌─────────┬──────────┬──────────┬───────┐
│ Courses ┆ Duration ┆ variable ┆ value │
│ ---     ┆ ---      ┆ ---      ┆ ---   │
│ str     ┆ str      ┆ str      ┆ i64   │
╞═════════╪══════════╪══════════╪═══════╡
│ Spark   ┆ 30days   ┆ Fees     ┆ 22000 │
│ Hadoop  ┆ 50days   ┆ Fees     ┆ 25000 │
│ Python  ┆ 40days   ┆ Fees     ┆ 20000 │
│ Pandas  ┆ 40days   ┆ Fees     ┆ 26000 │
│ Spark   ┆ 30days   ┆ Discount ┆ 1000  │
│ Hadoop  ┆ 50days   ┆ Discount ┆ 1500  │
│ Python  ┆ 40days   ┆ Discount ┆ 1200  │
│ Pandas  ┆ 40days   ┆ Discount ┆ 2500  │
└─────────┴──────────┴──────────┴───────┘

Here,

  • The Courses and Duration columns are kept as identifier columns (specified in id_vars).
  • The Fees and Discount columns are melted into the variable and value columns. The variable column holds the original column names, and the value column holds the corresponding values.

Melt with Custom variable_name and value_name

To customize the column names generated by the melt() method, you can use the variable_name and value_name parameters. These allow you to rename the columns that hold the original column names and their corresponding values.


# Melt DataFrame with custom variable_name and value_name
df2 = df.melt(
    id_vars=["Courses"],
    value_vars=["Fees", "Discount"],
    variable_name="Attribute",
    value_name="Details"
)
print("Melted DataFrame with Custom variable_name and value_name:\n", df2)

# Output:
# Melted DataFrame with Custom variable_name and value_name:
# shape: (8, 3)
┌─────────┬───────────┬─────────┐
│ Courses ┆ Attribute ┆ Details │
│ ---     ┆ ---       ┆ ---     │
│ str     ┆ str       ┆ i64     │
╞═════════╪═══════════╪═════════╡
│ Spark   ┆ Fees      ┆ 22000   │
│ Hadoop  ┆ Fees      ┆ 25000   │
│ Python  ┆ Fees      ┆ 20000   │
│ Pandas  ┆ Fees      ┆ 26000   │
│ Spark   ┆ Discount  ┆ 1000    │
│ Hadoop  ┆ Discount  ┆ 1500    │
│ Python  ┆ Discount  ┆ 1200    │
│ Pandas  ┆ Discount  ┆ 2500    │
└─────────┴───────────┴─────────┘

Here,

  • id_vars=["Courses"]: Keeps the Courses column as an identifier.
  • value_vars=["Fees", "Discount"]: Specifies the columns to melt.
  • variable_name="Attribute": Renames the default variable column to Attribute.
  • value_name="Details": Renames the default value column to Details.

Melt Without Specifying value_vars

When you call melt() without the value_vars parameter, Polars melts every column that is not listed in id_vars.


# Melt DataFrame without specifying value_vars
df2 = df.melt(id_vars=["Courses"])
print("Melted DataFrame Without Specifying value_vars:\n", df2)

# Output:
# Melted DataFrame Without Specifying value_vars:
# shape: (12, 3)
# (all 12 rows are listed here; by default Polars abbreviates long tables when printing)
┌─────────┬──────────┬────────┐
│ Courses ┆ variable ┆ value  │
│ ---     ┆ ---      ┆ ---    │
│ str     ┆ str      ┆ str    │
╞═════════╪══════════╪════════╡
│ Spark   ┆ Fees     ┆ 22000  │
│ Hadoop  ┆ Fees     ┆ 25000  │
│ Python  ┆ Fees     ┆ 20000  │
│ Pandas  ┆ Fees     ┆ 26000  │
│ Spark   ┆ Duration ┆ 30days │
│ Hadoop  ┆ Duration ┆ 50days │
│ Python  ┆ Duration ┆ 40days │
│ Pandas  ┆ Duration ┆ 40days │
│ Spark   ┆ Discount ┆ 1000   │
│ Hadoop  ┆ Discount ┆ 1500   │
│ Python  ┆ Discount ┆ 1200   │
│ Pandas  ┆ Discount ┆ 2500   │
└─────────┴──────────┴────────┘

Here,

  • id_vars=["Courses"]: Specifies that the Courses column remains as an identifier column.
  • No value_vars: By default, all remaining columns (Fees, Duration, Discount) are treated as the columns to melt.

Conclusion

In conclusion, the melt() method in Polars reshapes DataFrames from a wide format to a long format. By specifying id_vars and value_vars, along with the optional variable_name and value_name parameters, you can customize the resulting structure to suit your analysis needs.

Happy Learning!!
