Polars DataFrame limit() Method

Last modified: February 20, 2025

In Polars, the limit() method returns the first n rows of a DataFrame. It works like SQL’s LIMIT clause, restricting how many rows come back from a query.


In this article, I will explain the Polars DataFrame limit() method, covering its syntax, parameters, and usage, and show how it returns a new Polars DataFrame containing only the first n rows of the original DataFrame.

Key Points –

  • limit(n) restricts the DataFrame to the first n rows.
  • If n is not specified, limit() returns the first 5 rows by default.
  • You can assign n dynamically based on runtime conditions.
  • Apply filter() first to remove unwanted rows before limiting the output.
  • Can be used after select() to return a limited subset of columns and rows.
  • It does not modify the original DataFrame but returns a new one with limited rows.
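  • Useful for previewing large datasets. On an eager DataFrame the data is already in memory, so limit() only reduces what is displayed or passed to later steps.
  • With the lazy API (for example, scan_csv() followed by limit()), Polars can stop reading early, which is where limit() helps avoid excessive memory consumption.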

Polars DataFrame limit() Introduction

Following is the syntax of DataFrame.limit(). It takes a single parameter, n.


# Syntax of limit()
DataFrame.limit(n: int = 5) → DataFrame

Parameters of the Polars DataFrame.limit()

It accepts only one parameter.

  • n (int, default = 5) – Number of rows to return. If n is negative, all rows except the last abs(n) are returned.

Return Value

This function returns a new Polars DataFrame with at most n rows.

Usage of Polars DataFrame limit() Method

The limit() method keeps the first n rows of a DataFrame in their existing order. It is useful for previewing data, reducing the work done by later steps, and, together with an offset-based method such as slice(), implementing pagination.

To run some examples of the Polars DataFrame limit() method, let’s create a Polars DataFrame.


import polars as pl

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "PySpark", "Java"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 30000, 35000],
    'Discount': [1000, 2300, 1000, 1200, 2500, 2000, 2200],
    'Duration': ['35days', '40days', '65days', '50days', '60days', '30days', '45days'],
}

df = pl.DataFrame(technologies)
print("Original DataFrame:\n", df)

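This yields the following output.

# Output:
# shape: (7, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Discount ┆ Duration │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 35days   │
│ PySpark ┆ 25000 ┆ 2300     ┆ 40days   │
│ Hadoop  ┆ 23000 ┆ 1000     ┆ 65days   │
│ Python  ┆ 24000 ┆ 1200     ┆ 50days   │
│ Pandas  ┆ 26000 ┆ 2500     ┆ 60days   │
│ PySpark ┆ 30000 ┆ 2000     ┆ 30days   │
│ Java    ┆ 35000 ┆ 2200     ┆ 45days   │
└─────────┴───────┴──────────┴──────────┘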

To retrieve the first 5 rows of a DataFrame using Polars’ limit() method (which defaults to 5), simply call limit() without any arguments.


# Using limit() without arguments (defaults to 5 rows)
df2 = df.limit()
print("First 5 rows of the DataFrame (default usage):\n", df2)

Here,

  • By default, limit() returns 5 rows if no parameter is provided.
  • If you want more or fewer rows, pass a specific number, df.limit(n).

Specifying the Number of Rows Using limit() Method

You can specify the exact number of rows you want to retrieve using the limit(n) method, where n is the number of rows.


# Specifying the number of rows explicitly
df2 = df.limit(4) 
print(df2)

# Output:
# shape: (4, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Discount ┆ Duration │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 35days   │
│ PySpark ┆ 25000 ┆ 2300     ┆ 40days   │
│ Hadoop  ┆ 23000 ┆ 1000     ┆ 65days   │
│ Python  ┆ 24000 ┆ 1200     ┆ 50days   │
└─────────┴───────┴──────────┴──────────┘

Here,

  • limit(4) retrieves exactly 4 rows from the start of the DataFrame.
  • If n is greater than the total number of rows, it returns the entire DataFrame.
  • limit() without arguments defaults to 5 rows (df.limit()).

Limiting Rows Dynamically

You can dynamically set the number of rows to return using a variable. This is useful when the limit value is determined at runtime.


# Dynamically setting the limit
n = 3  # Change this value to limit rows dynamically
df2 = df.limit(n)
print(df2)

# Output:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Discount ┆ Duration │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ Spark   ┆ 22000 ┆ 1000     ┆ 35days   │
│ PySpark ┆ 25000 ┆ 2300     ┆ 40days   │
│ Hadoop  ┆ 23000 ┆ 1000     ┆ 65days   │
└─────────┴───────┴──────────┴──────────┘

Here,

  • Use a variable (n) to dynamically set the number of rows.
  • Modify n at runtime to adjust the result dynamically.

Combining limit() with filter() Method

You can use filter() to select specific rows based on a condition and then apply limit(n) to return only a subset of those filtered rows.


# Applying filter to select rows where Fee > 24000, then limit to 3 rows
df2 = df.filter(pl.col("Fee") > 24000).limit(3)
print(df2)

# Output:
# shape: (3, 4)
┌─────────┬───────┬──────────┬──────────┐
│ Courses ┆ Fee   ┆ Discount ┆ Duration │
│ ---     ┆ ---   ┆ ---      ┆ ---      │
│ str     ┆ i64   ┆ i64      ┆ str      │
╞═════════╪═══════╪══════════╪══════════╡
│ PySpark ┆ 25000 ┆ 2300     ┆ 40days   │
│ Pandas  ┆ 26000 ┆ 2500     ┆ 60days   │
│ PySpark ┆ 30000 ┆ 2000     ┆ 30days   │
└─────────┴───────┴──────────┴──────────┘

Here,

  • filter() first removes rows that don’t match the condition.
  • limit(3) then restricts the output to 3 rows.

Using limit() with select() Method

You can use select() to choose specific columns from the DataFrame and then apply limit(n) to restrict the number of rows returned.


# Selecting specific columns and limiting rows
df2 = df.select(["Courses", "Fee"]).limit(3)
print(df2)

# Output:
# shape: (3, 2)
┌─────────┬───────┐
│ Courses ┆ Fee   │
│ ---     ┆ ---   │
│ str     ┆ i64   │
╞═════════╪═══════╡
│ Spark   ┆ 22000 │
│ PySpark ┆ 25000 │
│ Hadoop  ┆ 23000 │
└─────────┴───────┘

Here,

  • select(["Courses", "Fee"]) retrieves only the specified columns.
  • limit(3) restricts the number of rows.
  • This is useful when you need a subset of both rows and columns.

Conclusion

In conclusion, the limit() method in Polars is a simple way to restrict a DataFrame to its first n rows. It is useful for previewing data and for reducing the work done on large datasets, especially in lazy queries. Because it always takes rows from the top, it is not a random sample; use sample() when you need one.

Happy Learning!!
