In pandas, the Series.rank() function computes the numerical rank of each element in a Series based on its value. A Series has a single axis, so the ranking always runs along the index. If there are ties, it can handle them in different ways, such as assigning the average rank or the minimum rank to the tied values. By default, ranks are assigned in ascending order, so the smallest value receives rank 1.
In this article, I will explain the Series.rank() function, covering its syntax, parameters, and usage, and show how it returns a Series with ranks as its values. Ties can be resolved by assigning the average rank, the minimum rank, the maximum rank, or ranks based on the order of appearance in the data.
Key Points –
- The pandas.Series.rank() function assigns ranks to the elements of a Series based on their values, facilitating analysis of relative positions within the Series.
- Ranks are assigned in ascending order by default.
- It offers flexibility in handling tied values through the method parameter, with options like average, min, max, first, and dense.
- The na_option parameter specifies how NaN (missing) values are treated during ranking.
- The numeric_only parameter is documented for DataFrame objects, where setting it to True restricts ranking to numeric columns.
Series rank() Introduction
Following is the syntax of the pandas Series.rank() function.
# Syntax of Series rank() function
# (numeric_only defaulted to None before pandas 2.0)
Series.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)
Parameters of the Series rank()
Following are the parameters of the Series.rank() function.
- axis – Accepts 0 or index (and, for DataFrames, 1 or columns). The default value is 0. For a Series this parameter is unused, because a Series has only one axis.
- method – Determines how to handle ties. Options include average (default), min, max, first, and dense.
- numeric_only – A boolean parameter that, for DataFrame objects, restricts ranking to numeric columns when set to True. The default is False in pandas 2.0 and later (None in earlier versions).
- na_option – Determines how to handle NaN (missing) values. Options include keep (default), top, and bottom.
- ascending – A boolean parameter that specifies whether to rank in ascending (True) or descending (False) order. The default value is True.
- pct – A boolean parameter that, if True, returns the ranks in percentile form. The default value is False.
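To make the method options above concrete, here is a minimal sketch that ranks the same values with each tie-breaking method and places the results side by side. The variable names (methods, comparison) are illustrative and not part of pandas.

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# One column per tie-breaking method, so the differences line up row by row
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: series.rank(method=m) for m in methods})

# Put the original values in the first column for reference
comparison.insert(0, "value", series)
print(comparison)
```

The tied values (10 and 20) are the only rows where the columns differ. Note that dense never skips a rank after a tie, so its largest rank equals the number of distinct values.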
Return Value
It returns a new Series containing the ranks of the elements in the original Series.
Basic Ranking of a Series Using Series.rank()
To perform a basic ranking of a Series, call its rank() function with no arguments. It assigns ranks to the elements in the Series based on their values.
First, let’s create a Pandas Series from a list.
import pandas as pd
# Create a sample Series
series = pd.Series([20, 10, 20, 40, 30, 10, 50])
print("Original Series:\n",series)
This creates a Series of seven integers with the default index 0 through 6.
Next, calculate the ranks of the elements in this Series using the rank() function.
# Get the ranks
ser2 = series.rank()
print("Getting the value of the ranks:\n",ser2)
In the resulting ranked Series, each element is replaced by its rank. Ties are handled by assigning the average rank to the tied values. For instance, both elements with the value 10 are assigned a rank of 1.5, because they occupy positions 1 and 2 in sorted order and the average of those positions is 1.5.
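The averaging of tied positions can be reproduced by hand. The following sketch is only an illustration of the idea, not how pandas computes ranks internally: it numbers the sorted positions from 1 and gives each group of equal values the mean of its positions. The names ordered, positions, and manual are made up for this example.

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# Sort the values and number the sorted positions 1..n
ordered = series.sort_values()
positions = pd.Series(range(1, len(ordered) + 1), index=ordered.index)

# Tied values share the mean of the positions they occupy
manual = positions.groupby(ordered.to_numpy()).transform("mean").sort_index()

print("Manual average ranks:", manual.tolist())
print("series.rank():       ", series.rank().tolist())
```

Both lines print the same list, with 1.5 for each 10 and 3.5 for each 20.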
Get the Ranks Using the Min Method
Alternatively, if you want to use the min method for ranking ties, you can pass the method parameter with the value min to the rank() function.
# Get the ranks using the min method
ser2 = series.rank(method='min')
print("\nRanking of the Series with 'min' method:\n", ser2)
# Output:
# Ranking of the Series with 'min' method:
# 0    3.0
# 1    1.0
# 2    3.0
# 3    6.0
# 4    5.0
# 5    1.0
# 6    7.0
# dtype: float64
In the above example, the min method is used to rank the Series. Ties are handled by assigning the minimum rank among the tied values. For instance, both elements with the value 10 are assigned a rank of 1.0, because they are tied for the lowest rank and the min method gives every tied value the smallest rank in its group. The next distinct value, 20, then starts at rank 3.0.
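Because rank() returns float64 values even when every rank is a whole number, you may prefer integer ranks for display. A minimal sketch, assuming the Series contains no NaN values (a NaN rank cannot be cast to a plain integer):

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# method='min' always yields whole numbers, so the cast loses nothing
int_ranks = series.rank(method="min").astype(int)
print(int_ranks.tolist())
```

This works for min, max, first, and dense. The default average method can produce values such as 1.5, which the cast would truncate.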
Ranking in Descending Order
Use the ascending parameter of the rank() method to rank the Series in descending order. By setting ascending=False, the ranks will be assigned in descending order.
# Get the ranks in descending order
ser2 = series.rank(ascending=False)
print("Ranking of the Series in descending order:\n", ser2)
# Output:
# Ranking of the Series in descending order:
# 0    4.5
# 1    6.5
# 2    4.5
# 3    2.0
# 4    3.0
# 5    6.5
# 6    1.0
# dtype: float64
In the above example, the ranks are assigned in descending order: the highest value (50) receives rank 1.0, and ties are still handled by assigning the average rank, so the two 20s share rank 4.5.
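A common use of descending ranks is selecting the top N values. The sketch below is one possible approach, not the only one: it combines ascending=False with method='min' so that values tied at the cutoff would all be kept. The names desc_rank and top_three are illustrative.

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# Rank from largest to smallest; tied values share the best rank in their group
desc_rank = series.rank(ascending=False, method="min")

# Keep every element whose rank is within the top three
top_three = series[desc_rank <= 3]
print(top_three)
```

Here the result contains the values 40, 30, and 50 at their original index labels 3, 4, and 6.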
Computing Percentage Ranks
Similarly, to compute percentage ranks in Pandas, pass pct=True to the rank() method (pandas has no separate pct_rank() method). Each rank is divided by the number of non-missing values in the Series, so the results fall between 0 and 1 and the largest value gets 1.0.
# Compute percentage ranks
ser2 = series.rank(pct=True)
print("Percentage ranks of the Series:\n",ser2)
# Output:
# Percentage ranks of the Series:
# 0    0.500000
# 1    0.214286
# 2    0.500000
# 3    0.857143
# 4    0.714286
# 5    0.214286
# 6    1.000000
# dtype: float64
In the above example, the percentage ranks of the Series elements are computed and printed. Each value is the element's average rank divided by 7, the number of elements. For example, each 20 has rank 3.5, and 3.5 / 7 = 0.5.
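The relationship between pct=True and the plain ranks can be checked directly. This is a small sketch with illustrative variable names. It also shows that combining pct=True with method='max' gives the share of values less than or equal to each element, which the default average method does not give for tied values.

```python
import pandas as pd

series = pd.Series([20, 10, 20, 40, 30, 10, 50])

# pct=True is the ordinary rank divided by the count of non-missing values
pct_rank = series.rank(pct=True)
manual = series.rank() / series.count()

# With method='max', the percentage rank equals the share of values <= element
share_le = series.rank(method="max", pct=True)

print(pd.DataFrame({"value": series, "pct": pct_rank,
                    "manual": manual, "share_le": share_le}))
```

The pct and manual columns match. For the value 10, the pct column shows 1.5 / 7 while the share_le column shows 2 / 7.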
Handling NaN Values by Placing them at the Bottom
To handle NaN values by placing them at the bottom when ranking a Series in Pandas, you can use the na_option parameter of the rank() method. By setting na_option='bottom', the NaN values will be placed at the bottom of the ranking.
import pandas as pd
import numpy as np
# Create a sample Series with NaN values
series = pd.Series([20, 10, 20, 40, np.nan, 30, 10, 50, np.nan])
# Rank the Series with NaN values placed at the bottom
ser2 = series.rank(na_option='bottom')
print("Ranks with NaN values at the bottom:\n", ser2)
# Output:
# Ranks with NaN values at the bottom:
# 0    3.5
# 1    1.5
# 2    3.5
# 3    6.0
# 4    8.5
# 5    5.0
# 6    1.5
# 7    7.0
# 8    8.5
# dtype: float64
In the above example, the rank() function is used to assign ranks to the Series elements, with NaN values placed at the bottom. The NaN values are not left unranked: they take the last two positions (8 and 9) and, under the default average method, share the rank 8.5.
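For comparison, the three na_option settings can be shown side by side. This sketch uses a shorter Series with a single NaN so the effect is easy to follow. The column names are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series([20, 10, np.nan, 30])

# One column per na_option value
comparison = pd.DataFrame({
    "value": series,
    "keep": series.rank(na_option="keep"),      # NaN stays NaN
    "top": series.rank(na_option="top"),        # NaN gets the lowest rank
    "bottom": series.rank(na_option="bottom"),  # NaN gets the highest rank
})
print(comparison)
```

With keep, the missing value is left out of the ranking. With top it takes rank 1 and pushes the other ranks up by one. With bottom it takes rank 4.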
Frequently Asked Questions on Pandas Series rank() Function
The purpose of the rank() function in Pandas is to assign ranks to the elements of a Series based on their values. This function allows you to analyze and compare the relative positions of values within a Series.
By default, the rank() function handles ties by assigning the average rank to tied values. However, you can specify a different tie-breaking method using the method parameter, such as min, max, first, or dense.
Setting pct=True computes percentage ranks instead of actual ranks. Each rank is divided by the number of non-missing values in the Series, so the results fall between 0 and 1.
The na_option parameter determines how NaN (missing) values are handled. Setting na_option='bottom' places NaN values at the bottom of the ranking.
Conclusion
In this article, I have explained the Pandas Series.rank() function, covering its syntax, parameters, and usage, and shown how it returns a new Series containing the ranks of the elements in the original Series. Each element in the resulting Series is the rank of the corresponding element in the original Series.
Happy Learning!!
Related Articles
- Pandas series.str.get() Function
- Pandas Series iloc[] Function
- Pandas Series.clip() Function
- Pandas Series map() Function
- Pandas Series.min() Function
- Pandas Convert Series to Json
- Use pandas.to_numeric() Function
- Pandas Series Drop duplicates() Function
- What is a Pandas Series Explained With Examples