Pandas Series rank() Function

In pandas, the Series.rank() function computes the rank of each element in a Series based on its value. Because a Series is one-dimensional, ranking always happens along its single axis. When values are tied, the function can resolve the tie in different ways, for example by assigning the tied elements their average rank or their minimum rank. By default, ranks are assigned in ascending order, so the smallest value receives rank 1.

Series rank() Introduction

Following is the syntax of the pandas series rank() function.


# Syntax of Series rank() function (defaults as of pandas 2.x)
Series.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)

Parameters of the Series rank()

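Following are the parameters of the Series rank() function:

axis – The axis along which to rank. A Series has only one axis, so the only valid values are 0 or 'index' (the default is 0). The parameter exists for compatibility with DataFrame.rank(), where 1 or 'columns' ranks the values within each row.
method – Determines how tied values are ranked. Options are 'average' (default, the mean rank of the tied group), 'min' (the lowest rank in the group), 'max' (the highest rank in the group), 'first' (ranks assigned in the order the values appear), and 'dense' (like 'min', but ranks increase by exactly 1 between groups, with no gaps).
numeric_only – A boolean that matters mainly for DataFrames, where True restricts ranking to numeric columns. The default is False in pandas 2.0 and later (earlier versions used a different default). For a Series with non-numeric values, leave it at False; recent pandas versions raise a TypeError if it is set to True.
na_option – Determines how NaN (missing) values are ranked. Options are 'keep' (default, NaN values get a NaN rank), 'top' (NaN values get the lowest ranks), and 'bottom' (NaN values get the highest ranks).
ascending – A boolean parameter that specifies whether to rank in ascending (True) or descending (False) order. The default value is True.
pct – A boolean parameter that, if True, returns the ranks in percentile form, that is, as fractions of the number of ranked values instead of integer positions. The default value is False.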
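The five method options differ only in how tied values share ranks. The sketch below puts them side by side on the same sample values used in the examples that follow; the names comparison and methods are just illustrative.

```python
import pandas as pd

# Same sample values used in the examples below
series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# One column per tie-handling method, so the differences line up row by row
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: series.rank(method=m) for m in methods})

# Put the original values in the first column for reference
comparison.insert(0, "value", series)
print(comparison)
```

In the output, the two 10s get 1.5 with 'average', 1.0 with 'min', 2.0 with 'max', 1.0 and 2.0 with 'first' (order of appearance), and 1.0 with 'dense'. Only 'dense' gives the largest value, 50, rank 5.0 instead of 7.0, because it does not skip ranks after ties.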

Return Value

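It returns a new Series containing the rank of each element, with the same index as the original Series; the original Series is not modified. The ranks are floats (float64 for ordinary numeric data) because tied values can receive fractional ranks such as 1.5.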
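A quick check of these return-value properties, using a small hypothetical Series of exam scores indexed by name (the data and variable names are made up for illustration):

```python
import pandas as pd

# Hypothetical exam scores, indexed by name, to show that rank() keeps the index
scores = pd.Series([88, 95, 88, 70], index=["ann", "bob", "cid", "dee"])
ranks = scores.rank()

print(ranks.dtype)                       # float64, although scores holds integers
print(ranks.index.equals(scores.index))  # True: same labels in the same order
print(ranks.tolist())                    # [2.5, 4.0, 2.5, 1.0]
print(scores.tolist())                   # [88, 95, 88, 70]: the original is unchanged

# Non-numeric (object) data can be ranked too; strings compare alphabetically
fruits = pd.Series(["pear", "apple", "fig"])
print(fruits.rank().tolist())            # [3.0, 1.0, 2.0]
```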

Basic Ranking of a Series Using Series.rank()

To perform a basic ranking of a Series, you can use the rank() function in pandas. This function assigns ranks to the elements in the Series based on their values.

First, let’s create a Pandas Series from a list.


import pandas as pd

# Create a sample Series
series = pd.Series([20, 10, 20, 40, 30, 10, 50])
print("Original Series:\n",series)

# Output:
# Original Series:
# 0    20
# 1    10
# 2    20
# 3    40
# 4    30
# 5    10
# 6    50
# dtype: int64

To calculate the ranks of the elements in series, call the rank() function on it with its default settings.


# Get the ranks
ser2 = series.rank()
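print("Getting the value of the ranks:\n",ser2)

# Output:
# Getting the value of the ranks:
# 0    3.5
# 1    1.5
# 2    3.5
# 3    6.0
# 4    5.0
# 5    1.5
# 6    7.0
# dtype: float64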

In the resulting ranked Series, each element is replaced by its rank, and ties are handled by assigning the average rank to the tied values. For instance, the two elements with the value 10 occupy positions 1 and 2 in sorted order, so each receives (1 + 2) / 2 = 1.5; likewise, the two 20s share positions 3 and 4 and each receive 3.5.
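To make the averaging rule concrete, here is a small pure-Python sketch of it. The function average_ranks is a hypothetical helper written for illustration, not pandas' internal algorithm; it simply sorts the positions by value and gives each run of equal values the mean of the positions it occupies.

```python
import pandas as pd

def average_ranks(values):
    """Hypothetical helper: pure-Python version of the 'average' tie rule."""
    # Indices of the values in ascending order (sorted() is stable)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, so the ranks are start+1..end+1;
        # the mean of consecutive integers is the mean of the first and last
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

data = [20, 10, 20, 40, 30, 10, 50]
print(average_ranks(data))          # [3.5, 1.5, 3.5, 6.0, 5.0, 1.5, 7.0]
print(pd.Series(data).rank().tolist())  # same values from pandas
```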

Get the Ranks Using the Min Method

Alternatively, if you want to use the min method for ranking ties, you can pass the method parameter with the value min to the rank() function.


# Get the ranks using the min method
ser2 = series.rank(method='min')
print("\nRanking of the Series with 'min' method:\n", ser2)

# Output:
# Ranking of the Series with 'min' method:
# 0    3.0
# 1    1.0
# 2    3.0
# 3    6.0
# 4    5.0
# 5    1.0
# 6    7.0
# dtype: float64

In the above example, the min method gives every tied element the lowest rank in its group. Both 10s occupy positions 1 and 2 and receive 1.0, and both 20s occupy positions 3 and 4 and receive 3.0. Ranks 2 and 4 are skipped, so the next value, 30, still receives 5.0.
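Another way to read the 'min' rule: an element's rank is 1 plus the number of values strictly smaller than it. The sketch below checks that against pandas, and also shows the analogous reading of 'dense' (1 plus the number of distinct smaller values), which explains why dense ranks never skip numbers. The variables manual_min and manual_dense are illustrative names only.

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# method='min': 1 + the number of values strictly smaller than the element.
# Tied elements see the same count, so they share the same rank.
manual_min = series.apply(lambda v: float((series < v).sum() + 1))

# method='dense': 1 + the number of distinct values strictly smaller,
# so the ranks increase by exactly 1 from one group of equal values to the next
manual_dense = series.apply(lambda v: float(series[series < v].nunique() + 1))

print(manual_min.tolist() == series.rank(method="min").tolist())      # True
print(manual_dense.tolist() == series.rank(method="dense").tolist())  # True
```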

Ranking in Descending Order

Use the ascending parameter of the rank() method to rank the Series in descending order. By setting ascending=False, the ranks will be assigned in descending order.


# Get the ranks in descending order
ser2 = series.rank(ascending=False)
print("Ranking of the Series in descending order:\n", ser2)

# Output:
# Ranking of the Series in descending order:
# 0    4.5
# 1    6.5
# 2    4.5
# 3    2.0
# 4    3.0
# 5    6.5
# 6    1.0
# dtype: float64

In the above example, the ranks are assigned in descending order, so the highest value (50) receives rank 1.0. Ties are still handled by assigning the average rank; for instance, the two 10s occupy positions 6 and 7 and each receive 6.5.
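For numeric data, ranking in descending order gives the same result as ranking the negated values in ascending order. The sketch below checks that, and then uses the descending ranks for a typical task: selecting the elements ranked in the top 3 (desc and top3 are illustrative names).

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

desc = series.rank(ascending=False)

# Flipping the sign reverses the order, so an ascending rank of -series
# matches ascending=False (this trick only works for numeric data)
flipped = (-series).rank()
print(desc.tolist() == flipped.tolist())  # True

# Keep the elements whose descending rank is 3 or better (the three largest)
top3 = series[desc <= 3]
print(top3.tolist())  # [40, 30, 50], in original index order
```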

Computing Percentage Ranks

Similarly, to compute percentage ranks in pandas, pass pct=True to the rank() method (there is no separate pct_rank() method). Each rank is then divided by the number of non-NaN values, so the ranks fall between 0 and 1 and the largest value receives 1.0.


# Compute percentage ranks
ser2 = series.rank(pct=True)
print("Percentage ranks of the Series:\n",ser2)

# Output:
# Percentage ranks of the Series:
# 0    0.500000
# 1    0.214286
# 2    0.500000
# 3    0.857143
# 4    0.714286
# 5    0.214286
# 6    1.000000
# dtype: float64

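In the above example, each percentage rank is the default (average) rank divided by 7, the number of values in the Series. For instance, the two 20s have an average rank of 3.5, so their percentage rank is 3.5 / 7 = 0.5. Note that when there are ties, this is not the same as the share of values less than or equal to the element (4 / 7 ≈ 0.571 for 20); combine pct=True with method='max' if you need that interpretation.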
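The sketch below checks both readings of pct=True against pandas: with the default method it equals the ordinary rank divided by the count of non-NaN values, and with method='max' it equals the share of values less than or equal to each element. The names manual and share_le are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# Default method='average': pct=True is the ordinary rank divided by the
# number of non-NaN values (7 here)
pct = series.rank(pct=True)
manual = series.rank() / series.count()
print(np.allclose(pct, manual))  # True

# With method='max', the percentage rank equals the share of values that are
# less than or equal to each element
pct_max = series.rank(method="max", pct=True)
share_le = series.apply(lambda v: (series <= v).mean())
print(np.allclose(pct_max, share_le))  # True

# For the tied 20s the two readings differ: 3.5/7 versus 4/7
print(pct.iloc[0], share_le.iloc[0])  # 0.5 and about 0.571
```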

Handling NaN Values by Placing them at the Bottom

To place NaN values at the bottom when ranking a Series in pandas, use the na_option parameter of the rank() method. With na_option='bottom', NaN values are ranked after all other values, that is, they receive the highest ranks instead of staying NaN.


import pandas as pd
import numpy as np

# Create a sample Series with NaN values
series = pd.Series([20, 10, 20, 40, np.nan, 30, 10, 50, np.nan])

# Rank the Series with NaN values placed at the bottom
ser2 = series.rank(na_option='bottom')
print("Ranks with NaN values at the bottom:\n", ser2)

# Output:
# Ranks with NaN values at the bottom:
# 0    3.5
# 1    1.5
# 2    3.5
# 3    6.0
# 4    8.5
# 5    5.0
# 6    1.5
# 7    7.0
# 8    8.5
# dtype: float64

In the above example, the seven non-missing values are ranked exactly as in the first example, and the two NaN values, which tie with each other, share the two highest positions (8 and 9), so each receives the average rank 8.5. With the default na_option='keep', they would remain NaN instead.
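To compare all three na_option values at once, the sketch below ranks the same Series with 'keep', 'top', and 'bottom' and collects the results in one DataFrame (the name options is illustrative).

```python
import numpy as np
import pandas as pd

series = pd.Series([20, 10, 20, 40, np.nan, 30, 10, 50, np.nan])

# 'keep' leaves NaN unranked, 'top' gives NaN the smallest ranks,
# and 'bottom' gives NaN the largest ranks
options = pd.DataFrame(
    {opt: series.rank(na_option=opt) for opt in ["keep", "top", "bottom"]}
)

# Put the original values in the first column for reference
options.insert(0, "value", series)
print(options)
```

With 'top', the two NaN values share positions 1 and 2 (rank 1.5 each) and every other rank shifts up by 2; with 'bottom', they share positions 8 and 9 (rank 8.5 each).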

Frequently Asked Questions on Pandas Series rank() Function

What is the purpose of the rank() function in Pandas?

The purpose of the rank() function in Pandas is to assign ranks to the elements of a Series based on their values. This function allows you to analyze and compare the relative positions of values within a Series.

How does the rank() function handle ties?

By default, the rank() function handles ties by assigning the average rank to tied values. However, you can choose a different tie-handling rule with the method parameter: 'min', 'max', 'first', or 'dense'.

What does the pct=True parameter do?

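Setting pct=True returns percentage ranks instead of ordinary ranks. With the default settings, each rank is divided by the number of non-NaN values, so the results lie between 0 and 1 and the largest value receives 1.0. Only with method='max' does this equal the fraction of values less than or equal to each element.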

How does the na_option parameter work?

The na_option parameter determines how NaN (missing) values are ranked. The default, 'keep', leaves them as NaN in the result; 'top' gives them the lowest ranks; and 'bottom' gives them the highest ranks.

Conclusion

In this article, I have explained the pandas Series rank() function, including its syntax, parameters, and usage, with examples of tie-handling methods, descending order, percentage ranks, and NaN handling. The function returns a new Series in which each element is the rank of the corresponding element in the original Series.

Happy Learning!!

References

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html