In Polars, you can use the select() function to reorder columns in a specific order, allowing you to explicitly define the desired column sequence for your DataFrame. Alternatively, you can rearrange columns using df[column_order], where column_order is a list of column names in the desired order. In this article, I will explain how to reorder columns in a specific order using Polars.
Key Points –
- Polars allows reordering columns using the select() method, where you explicitly define the desired order.
- Column indexing (df[:, [columns]]) can also be used to manually rearrange columns in a specific order.
- Dynamic reordering can be achieved by extracting column names using df.columns and arranging them programmatically.
- Column names can be sorted alphabetically with sorted(df.columns) to keep a consistent order.
- Reordering based on data types is possible by grouping columns according to their types using df.schema.
- with_columns() adds or replaces columns but does not reorder them: existing columns stay in their current positions and new columns are appended at the end, so combine it with select() when a column must land at a specific position.
- Moving a specific column to the first position can be achieved by separating it from the rest and reconstructing the column order.
- Reordering does not modify the original DataFrame; it returns a new DataFrame with the specified column order.
Usage of Reordering Columns in a Specific Order
In Polars, you can reorder columns using the select() function by specifying the column names in the desired order. Simply provide a list of column names in the preferred order, giving you full control over the DataFrame's column arrangement.
To run some examples of reordering columns in a specific order using Polars, let's create a Polars DataFrame.
import polars as pl
technologies= {
'Courses':["Spark", "PySpark", "Hadoop", "Python"],
'Fees' :[22000, 25000, 23000, 24000],
'Discount':[1000, 2300, 1000, 1200],
'Duration':['35days', '60days', '30days', '45days']
}
df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)
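Yields below output.
# Output:
# Original DataFrame:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Discount ┆ Duration │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ str │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark ┆ 22000 ┆ 1000 ┆ 35days │
│ PySpark ┆ 25000 ┆ 2300 ┆ 60days │
│ Hadoop ┆ 23000 ┆ 1000 ┆ 30days │
│ Python ┆ 24000 ┆ 1200 ┆ 45days │
└─────────┴───────┴──────────┴──────────┘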
You can reorder columns in a specific order in Polars using the select() method. This lets you control the layout of your DataFrame, for example placing identifying columns first and related columns next to each other.
# Reorder columns using select()
df2 = df.select(["Courses", "Duration", "Fees", "Discount"])
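print("Reordered DataFrame:\n", df2)
# Output:
# Reordered DataFrame:
# shape: (4, 4)
┌─────────┬──────────┬───────┬──────────┐
│ Courses ┆ Duration ┆ Fees ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪═══════╪══════════╡
│ Spark ┆ 35days ┆ 22000 ┆ 1000 │
│ PySpark ┆ 60days ┆ 25000 ┆ 2300 │
│ Hadoop ┆ 30days ┆ 23000 ┆ 1000 │
│ Python ┆ 45days ┆ 24000 ┆ 1200 │
└─────────┴──────────┴───────┴──────────┘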
Here,
- Define the new column order, ["Courses", "Duration", "Fees", "Discount"].
- Use the select() method to reorder the columns and print the DataFrame to view the updated arrangement.
Reordering Columns Dynamically Using select() Method
When working with Polars, you may not always know the column names in advance. Instead of manually specifying the order, you can rearrange the columns dynamically according to whatever rule you need. This is useful when dealing with large datasets or unknown column structures.
# Define dynamic column order
first_column = "Courses"
remaining_columns = [col for col in df.columns if col != first_column]
# Reorder using select()
df2 = df.select([first_column] + remaining_columns)
print("Reordered DataFrame:\n", df2)
# Output:
# Reordered DataFrame:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fees ┆ Discount ┆ Duration │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ str │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark ┆ 22000 ┆ 1000 ┆ 35days │
│ PySpark ┆ 25000 ┆ 2300 ┆ 60days │
│ Hadoop ┆ 23000 ┆ 1000 ┆ 30days │
│ Python ┆ 24000 ┆ 1200 ┆ 45days │
└─────────┴───────┴──────────┴──────────┘
Here,
- df.columns fetches all column names.
- The remaining_columns list is created by excluding the "Courses" column using a list comprehension.
- select([first_column] + remaining_columns) ensures "Courses" is first. Because "Courses" is already the first column in this DataFrame, the output matches the original order.
- This approach works dynamically, even if the column names change.
Reordering Columns Using with_columns() Method
Unlike select(), which explicitly defines the new column order, the with_columns() method in Polars does not reorder columns. It is used to add or modify columns: a column whose name already exists is replaced in its current position, and a new column is appended at the end. Reassigning existing columns with with_columns() therefore leaves the order unchanged, so chain select() afterwards to apply the order you want.
# with_columns() replaces existing columns in place, so the order stays the same
df2 = df.with_columns([df["Courses"], df["Duration"], df["Fees"], df["Discount"]])
print(df2.columns)
# ['Courses', 'Fees', 'Discount', 'Duration']

# Chain select() to apply the new column order
df2 = df.with_columns([df["Courses"], df["Duration"], df["Fees"], df["Discount"]]).select(
    ["Courses", "Duration", "Fees", "Discount"]
)
print("Reordered DataFrame:\n", df2)
# Output:
# Reordered DataFrame:
# shape: (4, 4)
┌─────────┬──────────┬───────┬──────────┐
│ Courses ┆ Duration ┆ Fees ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪═══════╪══════════╡
│ Spark ┆ 35days ┆ 22000 ┆ 1000 │
│ PySpark ┆ 60days ┆ 25000 ┆ 2300 │
│ Hadoop ┆ 30days ┆ 23000 ┆ 1000 │
│ Python ┆ 45days ┆ 24000 ┆ 1200 │
└─────────┴──────────┴───────┴──────────┘
Here,
- with_columns() is typically used to modify or add columns; reassigning existing columns keeps them in their original positions.
- select() receives each column name explicitly, and that is what establishes the new order.
- The new DataFrame preserves the same data but with reordered columns.
Reordering Columns Using Column Indexing
You can also reorder columns by position in Polars, passing a list of column indices in the new order. This method is useful when you don't want to refer to column names explicitly.
# Reorder columns using column indexing
df2 = df[:, [0, 3, 1, 2]] # Selecting columns using index positions
print("Reordered DataFrame:\n", df2)
Here,
- df[:, [0, 3, 1, 2]] selects all rows, while [0, 3, 1, 2] specifies the column order by index positions. Index 0 corresponds to "Courses", 3 to "Duration", 1 to "Fees", and 2 to "Discount".
- The column order is modified without using column names.
You can also reorder columns by explicitly specifying the desired order in Polars using column names inside indexing (df[:, []]). This method allows you to rearrange the DataFrame efficiently.
# Reorder columns
df2 = df[:, ['Courses', 'Duration', 'Fees', 'Discount']]
print("Reordered DataFrame:\n", df2)
# Output:
# Reordered DataFrame:
# shape: (4, 4)
┌─────────┬──────────┬───────┬──────────┐
│ Courses ┆ Duration ┆ Fees ┆ Discount │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════╪══════════╪═══════╪══════════╡
│ Spark ┆ 35days ┆ 22000 ┆ 1000 │
│ PySpark ┆ 60days ┆ 25000 ┆ 2300 │
│ Hadoop ┆ 30days ┆ 23000 ┆ 1000 │
│ Python ┆ 45days ┆ 24000 ┆ 1200 │
└─────────┴──────────┴───────┴──────────┘
Sorting Columns Alphabetically
If you want to sort the column names alphabetically, you can achieve this dynamically using sorted(df.columns). This is useful when working with datasets where column names may not have a predefined order.
# Sort columns alphabetically
df2 = df.select(sorted(df.columns))
print("DataFrame with Sorted Columns:\n", df2)
# Output:
# DataFrame with Sorted Columns:
# shape: (4, 4)
┌─────────┬──────────┬──────────┬───────┐
│ Courses ┆ Discount ┆ Duration ┆ Fees │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪══════════╪══════════╪═══════╡
│ Spark ┆ 1000 ┆ 35days ┆ 22000 │
│ PySpark ┆ 2300 ┆ 60days ┆ 25000 │
│ Hadoop ┆ 1000 ┆ 30days ┆ 23000 │
│ Python ┆ 1200 ┆ 45days ┆ 24000 │
└─────────┴──────────┴──────────┴───────┘
Here,
- df[:, ['Courses', 'Duration', 'Fees', 'Discount']] selects and reorders the columns according to the specified sequence.
- df.select(sorted(df.columns)) rearranges the columns in alphabetical order.
- This works with any number of columns, making it useful for large datasets.
Conclusion
In conclusion, reordering columns in Polars is a straightforward yet powerful operation that helps in organizing data efficiently. Whether using select(), column indexing, or dynamic sorting, Polars provides multiple ways to achieve this. Sorting columns alphabetically, reordering based on data types, or prioritizing key columns improves readability, data processing, and compatibility with external systems.
Happy Learning!!