  • Post category:Polars
  • Post last modified:March 11, 2025
Append or Concatenate Two DataFrames in Polars

In Polars, you can use the pl.concat() function to concatenate two or more DataFrames along either rows or columns. When combining DataFrames along rows, pl.concat() returns a new DataFrame that contains all rows from the input DataFrames, effectively appending one to another. Through its how parameter, the same function either stacks rows vertically or places columns side by side.


In this article, I will explain the Polars pl.concat() function, covering its syntax, parameters, and usage for concatenating two DataFrames either by rows or by columns.

Key Points –

  • The primary function for concatenating two or more DataFrames in Polars is pl.concat().
  • Use pl.concat() to concatenate multiple DataFrames or Series in Polars.
  • By default, pl.concat() performs row-wise concatenation (how="vertical").
  • Use how="horizontal" to merge DataFrames side by side.
  • Use df.vstack(df2) as an alternative for row-wise concatenation.
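  • When concatenating row-wise with how="vertical", the DataFrames must have the same column names (in the same order) and data types; otherwise Polars raises an error. Use how="vertical_relaxed" or how="diagonal" when the schemas differ.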
  • You can pass a list of DataFrames to pl.concat() instead of calling it multiple times.

Polars DataFrame concat() Introduction

Let’s look at the syntax of the polars.concat() function. Note that concat() is a module-level function called as pl.concat(), not a DataFrame method.


# Syntax of polars concat()
polars.concat(
    items: Iterable[PolarsType],
    *,
    how: ConcatMethod = 'vertical',
    rechunk: bool = False,
    parallel: bool = True
) → PolarsType

Parameters of polars.concat()

Following are the parameters of the concat() function.

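  • items (Iterable[PolarsType]) – A list of DataFrames, LazyFrames, Series, or expressions to concatenate.
  • how (ConcatMethod, default='vertical') – Determines how to concatenate.
    • 'vertical' – Appends rows (default). All inputs must share the same schema.
    • 'horizontal' – Places columns side by side.
    • 'diagonal' – Takes the union of the column names and fills columns missing from an input with null.
    • 'vertical_relaxed' and 'diagonal_relaxed' – Behave like their strict counterparts but cast mismatched column types to a common supertype.
  • rechunk (bool, default=False) – If True, copies the result into contiguous memory, which can speed up later operations.
  • parallel (bool, default=True) – Only relevant for LazyFrames; allows the concatenated lazy computations to run in parallel.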

Return Value

This function returns a new object of the same type as its inputs: a DataFrame, LazyFrame, Series, or expression.

Usage of Concatenate Two DataFrames

You can concatenate two DataFrames in Polars using the pl.concat() function. This function allows you to join DataFrames vertically (row-wise) or horizontally (column-wise).

First, let’s create two Polars DataFrames with different data, and then use the pl.concat() function to combine them.


import polars as pl

df = pl.DataFrame({'Courses': ["Spark","PySpark","Python","pandas"],
                    'Fee' : [20000,25000,22000,24000]})

df1 = pl.DataFrame({'Courses': ["Pandas","Hadoop","Hyperion","Java"],
                    'Fee': [25000,25200,24500,24900]})
print("First DataFrame:\n", df)
print("Second DataFrame:\n", df1)

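Yields below output.

# Output:
# First DataFrame:
# shape: (4, 2)
┌─────────┬───────┐
│ Courses ┆ Fee   │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ Spark   ┆ 20000 │
│ PySpark ┆ 25000 │
│ Python  ┆ 22000 │
│ pandas  ┆ 24000 │
└─────────┴───────┘
# Second DataFrame:
# shape: (4, 2)
┌──────────┬───────┐
│ Courses  ┆ Fee   │
│ ---      ┆ ---   │
│ str      ┆ i64   │
╞══════════╪═══════╡
│ Pandas   ┆ 25000 │
│ Hadoop   ┆ 25200 │
│ Hyperion ┆ 24500 │
│ Java     ┆ 24900 │
└──────────┴───────┘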

You can use the pl.concat() function to concatenate two DataFrames row-wise, effectively appending them. By default, it behaves like UNION ALL in SQL: all rows from both DataFrames are kept, including duplicates, in a single DataFrame.


# Concatenate row-wise (default)
df2 = pl.concat([df, df1])
print("After concatenating the two DataFrames:\n", df2)

# Using pl.concat() with a list variable
# to concatenate two DataFrames
data = [df, df1]
df2 = pl.concat(data)
print("After concatenating the two DataFrames:\n", df2)

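Both calls yield the output below.

# Output:
# After concatenating the two DataFrames:
# shape: (8, 2)
┌──────────┬───────┐
│ Courses  ┆ Fee   │
│ ---      ┆ ---   │
│ str      ┆ i64   │
╞══════════╪═══════╡
│ Spark    ┆ 20000 │
│ PySpark  ┆ 25000 │
│ Python   ┆ 22000 │
│ pandas   ┆ 24000 │
│ Pandas   ┆ 25000 │
│ Hadoop   ┆ 25200 │
│ Hyperion ┆ 24500 │
│ Java     ┆ 24900 │
└──────────┴───────┘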

Optionally, you can pass rechunk=True to pl.concat() to copy the result into contiguous memory. Without it, the result keeps the separate chunks of the input DataFrames. A contiguous layout can speed up later operations, especially on large DataFrames, at the cost of one extra copy.


# Row-wise append with rechunking
df2 = pl.concat([df, df1], rechunk=True)
print("After concatenating the two DataFrames:\n", df2)

This example yields the above output.

Concatenating DataFrames Vertically (Row-wise)

When concatenating DataFrames vertically, rows from the second DataFrame are added below those of the first. This is similar to pd.concat([…], axis=0) in pandas. By default, pl.concat() performs row-wise concatenation (how="vertical").


# Concatenating DataFrames vertically (row-wise)
df2 = pl.concat([df, df1], how="vertical")
print(df2)

This example yields the above output.

Using vstack() Method for Appending Rows

You can use the vstack() method to append rows from one DataFrame to another. This method is an alternative to pl.concat() and is useful when you need to append one DataFrame at a time.


# Append df1 to df using vstack
df2 = df.vstack(df1)
print(df2)

This example yields the above output.

Append In-Place Using extend()

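The extend() method appends rows in-place, modifying the original DataFrame instead of creating a new one. Unlike vstack(), which only adds the chunks of the other DataFrame, extend() copies the new rows into the memory of the existing DataFrame, so the result stays contiguous and later queries can run faster. Prefer extend() when you query after a single append, and vstack() when you append many times before querying.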


# Append df1 to df in-place; extend() returns the same,
# now modified, DataFrame, so df3 and df hold the same rows
df3 = df.extend(df1)
print(df3)

This example yields the above output.

Concatenating Two DataFrames Horizontally (Column-wise)

Similarly, when concatenating two DataFrames horizontally, we align them side-by-side, adding columns from one DataFrame to another while keeping the row order intact. This method is useful when both DataFrames have the same number of rows and provide complementary information.


import polars as pl

df = pl.DataFrame({'Courses': ["Spark", "PySpark", "Python", "Pandas"],
                    'Fee' : ['20000', '25000', '22000', '24000']}) 
  
df1 = pl.DataFrame({'Duration':['30day','40days', '60days','55days'],
                    'Discount':[1000,2500,2000,3000]})
  
# Column-wise concatenation
df2 = pl.concat([df, df1], how="horizontal")
print("After concatenating column-wise:\n", df2)

# Output:
# After concatenating column-wise:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Duration ┆ Discount │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ str   ┆ str      ┆ i64      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 20000 ┆ 30day    ┆ 1000     │
│ PySpark ┆ 25000 ┆ 40days   ┆ 2500     │
│ Python  ┆ 22000 ┆ 60days   ┆ 2000     │
│ Pandas  ┆ 24000 ┆ 55days   ┆ 3000     │
└─────────┴───────┴──────────┴──────────┘

Here,

  • The DataFrames should normally have the same number of rows. In recent Polars versions, if the row counts differ, pl.concat() with how="horizontal" pads the shorter columns with null instead of raising an error.
  • The resulting DataFrame retains all columns from both input DataFrames.
  • Each column keeps its own data type. Column names must be unique across the inputs; duplicate names raise an error.

Concatenate Multiple DataFrames Using polars.concat()

You can concatenate multiple Polars DataFrames at once by passing them all to pl.concat() in a single list. This is useful when combining more than two DataFrames, either row-wise or column-wise.


import polars as pl

df = pl.DataFrame({'Courses': ["Spark", "PySpark"],
                    'Fee' : ['20000', '25000']}) 
  
df1 = pl.DataFrame({'Courses': ["Unix", "Hadoop"],
                    'Fee': ['25000', '24500']})
  
df2 = pl.DataFrame({'Courses': ["MongoDB", "Hyperion"],
                    'Fee': ['25000', '24500']})

# Appending multiple DataFrames
df3 = pl.concat([df, df1, df2])
print(df3)

# Output:
# shape: (6, 2)
┌──────────┬───────┐
│ Courses  ┆ Fee   │
│ ---      ┆ ---   │
│ str      ┆ str   │
╞══════════╪═══════╡
│ Spark    ┆ 20000 │
│ PySpark  ┆ 25000 │
│ Unix     ┆ 25000 │
│ Hadoop   ┆ 24500 │
│ MongoDB  ┆ 25000 │
│ Hyperion ┆ 24500 │
└──────────┴───────┘

Here,

  • In this case, df, df1, and df2 are combined row-wise (default behavior) to form a single DataFrame, df3. For successful concatenation, all DataFrames should have the same column structure.

Conclusion

In summary, Polars provides fast and efficient methods for appending and concatenating DataFrames. Whether using pl.concat() for flexible row-wise (how="vertical") or column-wise (how="horizontal") merging, or vstack() for strict row-wise appending, Polars enables seamless data manipulation, even for large datasets.

Happy Learning!!
