• Post category: PySpark
• Post last modified: July 3, 2025

In PySpark, the dense_rank() window function assigns ranks to rows within a partition of a DataFrame based on specified order criteria. When multiple rows have the same value in the ordering column, they receive the same rank, and unlike rank() (https://sparkbyexamples.com/pyspark/pyspark-rank-function-with-examples/), it does not skip the subsequent rank. This makes it useful when you need a consistent, gap-free ranking within groups.


In this article, you’ll learn how to use the dense_rank() function with partitionBy() and orderBy() to group and rank data in a DataFrame.

Key Points

  • dense_rank() is a PySpark window function for ranking rows.
  • Duplicate values receive the same rank.
  • No skipped ranks for ties (gapless ranking).
  • Requires a window specification: Window.partitionBy().orderBy().
  • If partitionBy() is omitted, the entire DataFrame is treated as a single group.
  • Supports both ascending and descending order.
  • Suitable for analytics, leaderboard generation, and grouped ranking tasks.
  • Ideal for “top-N by group” scenarios without rank gaps.
  • dense_rank() vs rank(): dense_rank() does not skip ranks on ties.
  • row_number() assigns a unique sequential number to each row.

PySpark dense_rank()

The dense_rank() function ranks rows within a partition based on the specified order. Rows with the same order value receive the same rank, but the next rank is not skipped.

Syntax

The following is the syntax of the dense_rank() function.


# Syntax of the dense_rank()
from pyspark.sql.functions import dense_rank
pyspark.sql.functions.dense_rank()

Parameters

  • It takes no direct parameters.
  • It must be applied over a window specification using .over() (typically Window.partitionBy().orderBy()).

Return Value

Returns a column of type IntegerType, assigning ranks without skipping values for ties.

PySpark dense_rank Partition By

You can use the dense_rank() function to add a new ranking column over a specified window. Define the window with partitionBy() and orderBy(), then apply dense_rank() over it. Each row within a partition is ranked according to the ordering; rows with equal values share the same rank, and unlike rank(), dense_rank() does not skip the next number in the ranking.


# Add a new column using dense_rank() over the specified window
# Applying partitionBy() and orderBy()
from pyspark.sql import SparkSession
from pyspark.sql.functions import row_number, rank, dense_rank, col
from pyspark.sql.window import Window

# Create SparkSession
spark = SparkSession.builder.appName("Sparkbyexamples").getOrCreate()

# Sample data
data = [
    ("James", "Sales", 3000),
    ("Michael", "Sales", 4600),
    ("Robert", "Sales", 4100),
    ("Maria", "Finance", 3000),
    ("Scott", "Finance", 3300),
    ("Jen", "Finance", 3900),
    ("Jeff", "Marketing", 3000),
    ("Kumar", "Marketing", 2000),
    ("Saif", "Sales", 4100)
]

columns = ["employee_name", "department", "salary"]
df = spark.createDataFrame(data, columns)

df.show()

window_spec = Window.partitionBy("department").orderBy(col("salary"))
df.withColumn("dense_rank", dense_rank().over(window_spec)).show()

This yields the output below.

# Output:
# +-------------+----------+------+----------+
# |employee_name|department|salary|dense_rank|
# +-------------+----------+------+----------+
# |        Maria|   Finance|  3000|         1|
# |        Scott|   Finance|  3300|         2|
# |          Jen|   Finance|  3900|         3|
# |        Kumar| Marketing|  2000|         1|
# |         Jeff| Marketing|  3000|         2|
# |        James|     Sales|  3000|         1|
# |       Robert|     Sales|  4100|         2|
# |         Saif|     Sales|  4100|         2|
# |      Michael|     Sales|  4600|         3|
# +-------------+----------+------+----------+

PySpark dense_rank Without Partition

You can also use the dense_rank() function without partitioning by specifying only orderBy() in the window. The whole DataFrame is then treated as a single group, and ranks are assigned based on the global order of the specified column. Note that a window without partitionBy() moves all rows into a single partition, which can be expensive on large datasets.


# Add the rank to each row without partition
global_window = Window.orderBy(col("salary").desc())
df.withColumn("dense_rank", dense_rank().over(global_window)).show()

# Output:
# +-------------+----------+------+----------+
# |employee_name|department|salary|dense_rank|
# +-------------+----------+------+----------+
# |      Michael|     Sales|  4600|         1|
# |       Robert|     Sales|  4100|         2|
# |         Saif|     Sales|  4100|         2|
# |          Jen|   Finance|  3900|         3|
# |        Scott|   Finance|  3300|         4|
# |        James|     Sales|  3000|         5|
# |        Maria|   Finance|  3000|         5|
# |         Jeff| Marketing|  3000|         5|
# |        Kumar| Marketing|  2000|         6|
# +-------------+----------+------+----------+

PySpark dense_rank Order by Desc

To add ranks within each group based on descending order, you can use the dense_rank() function along with a window specification.


# Add the rank to each row within a partition by descending order 
window_spec = Window.partitionBy("department").orderBy(col("salary").desc())
df.withColumn("dense_rank", dense_rank().over(window_spec)).show()

# Output:
# +-------------+----------+------+----------+
# |employee_name|department|salary|dense_rank|
# +-------------+----------+------+----------+
# |          Jen|   Finance|  3900|         1|
# |        Scott|   Finance|  3300|         2|
# |        Maria|   Finance|  3000|         3|
# |         Jeff| Marketing|  3000|         1|
# |        Kumar| Marketing|  2000|         2|
# |      Michael|     Sales|  4600|         1|
# |       Robert|     Sales|  4100|         2|
# |         Saif|     Sales|  4100|         2|
# |        James|     Sales|  3000|         3|
# +-------------+----------+------+----------+

dense_rank() vs rank()

This comparison demonstrates how each function behaves when multiple rows have identical values.


# Difference between PySpark rank() and dense_rank()
window_spec = Window.partitionBy("department").orderBy(col("salary"))
result_df = df.withColumn("rank", rank().over(window_spec)) \
                  .withColumn("dense_rank", dense_rank().over(window_spec))
result_df.show()

# Output:
# +-------------+----------+------+----+----------+
# |employee_name|department|salary|rank|dense_rank|
# +-------------+----------+------+----+----------+
# |        Maria|   Finance|  3000|   1|         1|
# |        Scott|   Finance|  3300|   2|         2|
# |          Jen|   Finance|  3900|   3|         3|
# |        Kumar| Marketing|  2000|   1|         1|
# |         Jeff| Marketing|  3000|   2|         2|
# |        James|     Sales|  3000|   1|         1|
# |       Robert|     Sales|  4100|   2|         2|
# |         Saif|     Sales|  4100|   2|         2|
# |      Michael|     Sales|  4600|   4|         3|
# +-------------+----------+------+----+----------+

PySpark rank() vs dense_rank() vs row_number()

This example shows the differences between the rank(), dense_rank(), and row_number() functions in PySpark with a window partition. We’ll apply these functions to a DataFrame to add columns that represent row rankings based on the specified partition.


# Complete example of Difference between PySpark rank(), dense_rank(),and row_number()
# Applying partitionBy() and orderBy()
window_spec = Window.partitionBy("department").orderBy(col("salary"))
result_df = df.withColumn("row_number", row_number().over(window_spec))\
                  .withColumn("rank", rank().over(window_spec)) \
                  .withColumn("dense_rank", dense_rank().over(window_spec))


# Show the result
result_df.show()

# Output:
# +-------------+----------+------+----------+----+----------+
# |employee_name|department|salary|row_number|rank|dense_rank|
# +-------------+----------+------+----------+----+----------+
# |        Maria|   Finance|  3000|         1|   1|         1|
# |        Scott|   Finance|  3300|         2|   2|         2|
# |          Jen|   Finance|  3900|         3|   3|         3|
# |        Kumar| Marketing|  2000|         1|   1|         1|
# |         Jeff| Marketing|  3000|         2|   2|         2|
# |        James|     Sales|  3000|         1|   1|         1|
# |       Robert|     Sales|  4100|         2|   2|         2|
# |         Saif|     Sales|  4100|         3|   2|         2|
# |      Michael|     Sales|  4600|         4|   4|         3|
# +-------------+----------+------+----------+----+----------+

Frequently Asked Questions about the PySpark dense_rank() Function

What does the dense_rank() function do in PySpark?

The dense_rank() function assigns ranks to rows within a partition based on a specified order. When multiple rows have the same value, they are given the same rank, and no ranks are skipped afterward.

How is dense_rank() different from rank() and row_number()?

dense_rank(): Assigns the same rank to duplicate values without skipping the next rank (e.g., 1, 2, 2, 3).
rank(): Assigns the same rank to duplicates but skips subsequent ranks (e.g., 1, 2, 2, 4).
row_number(): Assigns a unique sequential number to each row, even if the values are the same (e.g., 1, 2, 3, 4).

How can I use dense_rank() without partitionBy()?

When you use dense_rank() with only orderBy(), the entire DataFrame is treated as a single group, and rows are ranked globally based on the specified column.

What happens when multiple rows have the same value in the ordering column?

They receive the same rank. Unlike rank(), the next rank is not skipped. For example, two rows tied at rank 2 will be followed by rank 3.

What is the return type of dense_rank()?

The function returns a column of type IntegerType, where each row has a numeric rank based on the sort and partition criteria.

How can I use dense_rank() to find top-N rows in each group?

You can use dense_rank() with a Window.partitionBy() and orderBy(), and then filter rows where the rank is less than or equal to N. This approach is ideal for selecting top performers per group.

Conclusion

In this article, you have learned how to use the dense_rank() function in PySpark with or without partitions. You also saw how it differs from rank() and row_number() and when to use each. dense_rank() is ideal for scenarios that require gap-free ranking, such as leaderboards and top-N selections by group.

Happy Learning!!
