In Pandas, the rank() method is used to assign ranks to entries in a DataFrame based on their values. By default, ranking is done in ascending order, so the smallest value receives rank 1. The method is particularly useful for statistical analysis or when you need to compare data points in terms of their relative positions.
In this article, I will explain the Pandas DataFrame rank() method, covering its syntax, parameters, and usage. The method returns a DataFrame or Series of the same shape as the input, where each value is replaced by its rank. The ranks are assigned according to the specified ranking method and options.
Key Points –
- rank() supports various methods for ranking, including average, min, max, first, and dense.
- The method parameter specifies how to handle ties (duplicate values) in the ranking process.
- You can control the ranking order using the ascending parameter to rank in ascending or descending order.
- The na_option parameter determines how NA values are treated during ranking.
- The axis parameter controls whether values are ranked within each column or within each row.
- By default, rank() ranks in ascending order and uses the average method for ties.
Syntax of Pandas DataFrame rank() Method
Following is the syntax of the pandas DataFrame rank() method.
# Syntax of dataframe rank() method
DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)
Parameters of the DataFrame rank()
Following are the parameters of the DataFrame rank() method.
- axis – Determines the direction of ranking: axis=0 ranks the values within each column, and axis=1 ranks the values within each row. Default is 0.
- method – Specifies how to rank a group of tied (duplicate) values. Options include:
  - average (default) – Average rank of the group of duplicates.
  - min – Minimum rank of the group of duplicates.
  - max – Maximum rank of the group of duplicates.
  - first – Ranks assigned in the order the values appear in the data.
  - dense – Like min, but the rank always increases by 1 between groups, so there are no gaps.
- numeric_only – bool, default False. For DataFrame objects, rank only numeric columns if set to True.
- na_option – Specifies how NA values are ranked. Options are keep, top, and bottom; the default is keep.
  - keep – Leave NA values as NA in the result.
  - top – Assign the smallest rank to NA values.
  - bottom – Assign the largest rank to NA values.
- ascending – Boolean value that specifies whether to rank in ascending order (True) or descending order (False). Default is True.
- pct – Boolean value that determines whether to return the ranks in percentage form, as fractions of the total rather than positions. Default is False.
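The numeric_only parameter is not exercised elsewhere in this article, so here is a minimal sketch of what it does. The scores DataFrame and its column names are hypothetical, made up only for this illustration.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
scores = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "points": [30, 10, 20],
})

# numeric_only=True leaves the non-numeric 'name' column out of the result
ranked_numeric = scores.rank(numeric_only=True)
print(ranked_numeric)
```

Only the points column should remain in the result, holding the ranks 3.0, 1.0, and 2.0.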
Return Value
- It returns a Series or DataFrame with the ranks of the data as values.
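As a quick check of the return value described above, the sketch below ranks a small Series and a small DataFrame and inspects the results. The variable names and sample values are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# A Series in, a Series out: the index is preserved and ranks are floats
values = pd.Series([30, 10, 20], index=["x", "y", "z"])
series_ranks = values.rank()

# A DataFrame in, a DataFrame out, with the same shape as the input
frame = pd.DataFrame({"A": [5, 8, 13, 15], "B": [2, 9, 4, 6]})
frame_ranks = frame.rank()

print(type(series_ranks).__name__, type(frame_ranks).__name__)
print(frame_ranks.shape == frame.shape)
```

Note that the ranks come back as floating-point numbers even when the input holds integers, because tied values can produce fractional ranks such as 2.5.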
Usage of Pandas DataFrame rank() Method
The rank() method in Pandas is used to assign ranks to the values in a DataFrame or Series. The ranks start at 1 and can be calculated in different ways.
Now, let's create a Pandas DataFrame from a Python dictionary, where the columns are A, B, and C.
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})
print("Original DataFrame:\n", df)
Yields below output.
# Output:
# Original DataFrame:
#     A  B   C
# 0   5  2   8
# 1   8  9   6
# 2  13  4  12
# 3  15  6   3
To perform basic ranking on the DataFrame df, use the rank() method.
# Apply basic ranking
df2 = df.rank()
print("Ranked DataFrame:\n", df2)
In the above example, each value in the DataFrame is replaced by its rank within its column. The smallest value in each column receives rank 1, and ranks increase in ascending order of value.
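For reference, here is a self-contained version of the basic ranking call on the same sample data, with the result shown as a comment. The variable name ranked is chosen only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

# Rank each column independently (axis=0 is the default)
ranked = df.rank()
print(ranked)
# prints:
#      A    B    C
# 0  1.0  1.0  3.0
# 1  2.0  4.0  2.0
# 2  3.0  2.0  4.0
# 3  4.0  3.0  1.0
```

In column C, for example, the value 3 is the smallest and gets rank 1.0, while 12 is the largest and gets rank 4.0.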
Ranking along rows
Alternatively, to rank the values along the rows of the DataFrame, set the axis parameter to 1.
# Apply ranking along rows
df2 = df.rank(axis=1)
print("Ranked DataFrame (along rows):\n", df2)
# Output:
# Ranked DataFrame (along rows):
# A B C
# 0 2.0 1.0 3.0
# 1 2.0 3.0 1.0
# 2 3.0 1.0 2.0
# 3 3.0 2.0 1.0
In the above example, the ranks are computed within each row, with the smallest value in each row receiving rank 1.
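One way to build intuition for axis=1 is to compare it with ranking the transposed DataFrame column by column. The sketch below assumes the same sample data; the names by_row and via_transpose are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

# Rank within each row directly
by_row = df.rank(axis=1)

# Transpose, rank within each column, then transpose back
via_transpose = df.T.rank().T

# Both approaches should give the same ranks in every cell
print((by_row == via_transpose).all().all())
```

The direct axis=1 call is the more readable choice; the transposed form is shown only to make clear what "ranking along rows" means.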
Different Ranking Methods
To apply different ranking methods to the DataFrame, you can use the method parameter in the rank() method.
# Apply different ranking methods
df_ranked_min = df.rank(method='min')
print("Min method:\n", df_ranked_min)
df_ranked_max = df.rank(method='max')
print("Max method:\n", df_ranked_max)
df_ranked_average = df.rank(method='average')
print("Average method:\n", df_ranked_average)
df_ranked_first = df.rank(method='first')
print("First method:\n", df_ranked_first)
df_ranked_dense = df.rank(method='dense')
print("Dense method:\n", df_ranked_dense)
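Here, each ranking method handles ties differently. Note that the sample DataFrame has no duplicate values within any column, so all five calls above return the same ranks; the methods differ only when values are tied.
- min – Assigns the smallest rank to all tied values.
- max – Assigns the largest rank to all tied values.
- average – Assigns the average rank to tied values.
- first – Breaks ties by the order in which the values appear in the data.
- dense – Similar to min, but ranks always increase by 1 between groups, leaving no gaps.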
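The differences between the methods only show up when the data contains ties. The following sketch uses a small made-up Series with one duplicated value and places the result of each method side by side; the names tied and comparison are assumptions for this illustration.

```python
import pandas as pd

# Hypothetical data with a tie: the value 20 appears twice
tied = pd.Series([10, 20, 20, 30])

# One column per ranking method, for easy comparison
comparison = pd.DataFrame({
    "value": tied,
    "average": tied.rank(method="average"),
    "min": tied.rank(method="min"),
    "max": tied.rank(method="max"),
    "first": tied.rank(method="first"),
    "dense": tied.rank(method="dense"),
})
print(comparison)
```

The two tied values would occupy positions 2 and 3, so average gives both 2.5, min gives both 2, max gives both 3, and first gives 2 and 3 in order of appearance. With dense, both get 2 and the next value gets 3 instead of 4.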
Descending Order and Percentage Rank
Similarly, to apply descending order ranking and percentage ranking with the rank() method, you can use the ascending and pct parameters, respectively.
# Apply ranking in descending order
df_ranked_desc = df.rank(ascending=False)
print("Descending Order Ranking:\n", df_ranked_desc)
# Output:
# Descending Order Ranking:
# A B C
# 0 4.0 4.0 2.0
# 1 3.0 1.0 3.0
# 2 2.0 3.0 1.0
# 3 1.0 2.0 4.0
Descending Order Ranking (ascending=False): Ranks are assigned in descending order, where the highest value gets the rank of 1, and the lowest gets the highest rank.
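Descending ranking is often combined with a tie-handling method, for example to build a leaderboard where the top score is placed first. The sketch below is one possible combination, using hypothetical exam scores and names invented for this example.

```python
import pandas as pd

# Hypothetical exam scores with a tie for first place
exam = pd.Series([90, 85, 90, 70], index=["Ann", "Bob", "Cid", "Dee"])

# Highest score gets rank 1; tied scores share the best (lowest) rank
placement = exam.rank(ascending=False, method="min")
print(placement)
```

Ann and Cid share first place with rank 1.0, Bob follows at 3.0 because two people are ahead of him, and Dee is at 4.0.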
# Apply percentage ranking
df_ranked_pct = df.rank(pct=True)
print("Percentage Ranking:\n", df_ranked_pct)
# Output:
# Percentage Ranking:
# A B C
# 0 0.25 0.25 0.75
# 1 0.50 1.00 0.50
# 2 0.75 0.50 1.00
# 3 1.00 0.75 0.25
Percentage Ranking (pct=True): Ranks are converted into fractions of the total number of elements. The largest value is ranked as 1.0, and the smallest is ranked as 1/n for a column of n values (0.25 here, not 0.0), with intermediate values in between.
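For data without NA values and with the default average method, the percentage rank can be reproduced by dividing the ordinary rank by the number of rows. The sketch below checks this on the sample data; the names pct_ranks, manual, and max_diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3]
})

# Percentage ranks as computed by pandas
pct_ranks = df.rank(pct=True)

# Manual equivalent for this data: ordinary rank divided by the row count
manual = df.rank() / len(df)

# Largest absolute difference between the two results
max_diff = (pct_ranks - manual).abs().max().max()
print(max_diff < 1e-12)

# The smallest percentage rank is 1/4, not 0
print(pct_ranks.min().min())
```

This makes it clear why the smallest value in a four-row column maps to 0.25: it has rank 1 out of 4.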
Handling NA values
Finally, to handle NA values during ranking in a DataFrame, you can use the na_option parameter in the rank() method.
import pandas as pd
# Sample DataFrame with NA values
df = pd.DataFrame({
    'A': [5, 8, None, 15],
    'B': [2, None, 4, 6],
    'C': [8, 6, 12, None]
})
# Apply ranking with different NA options
df_ranked_keep = df.rank(na_option='keep')
print("Keep NA:\n", df_ranked_keep)
df_ranked_top = df.rank(na_option='top')
print("Top NA:\n", df_ranked_top)
df_ranked_bottom = df.rank(na_option='bottom')
print("Bottom NA:\n", df_ranked_bottom)
Here,
- na_option=keep: NA values remain NA in the output DataFrame and do not receive a rank.
- na_option=top: NA values are assigned the smallest rank (highest priority).
- na_option=bottom: NA values are assigned the largest rank (lowest priority).
# Output:
# Keep NA:
# A B C
# 0 1.0 1.0 2.0
# 1 2.0 NaN 1.0
# 2 NaN 2.0 3.0
# 3 3.0 3.0 NaN
# Top NA:
# A B C
# 0 2.0 2.0 3.0
# 1 3.0 1.0 2.0
# 2 1.0 3.0 4.0
# 3 4.0 4.0 1.0
# Bottom NA:
# A B C
# 0 1.0 1.0 2.0
# 1 2.0 4.0 1.0
# 2 4.0 2.0 3.0
# 3 3.0 3.0 4.0
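The three options can also be checked programmatically rather than by reading the printed tables. The following sketch rebuilds the same DataFrame under the assumed name df_na and verifies where NA values end up; all variable names are chosen for this illustration.

```python
import pandas as pd

# Same data as above; None is stored as NaN in these numeric columns
df_na = pd.DataFrame({
    'A': [5, 8, None, 15],
    'B': [2, None, 4, 6],
    'C': [8, 6, 12, None]
})

kept = df_na.rank(na_option='keep')
top = df_na.rank(na_option='top')
bottom = df_na.rank(na_option='bottom')

# With 'keep', NaN stays exactly where it was in the input
same_na_positions = kept.isna().equals(df_na.isna())

# With 'top' and 'bottom', every cell receives a rank
fully_ranked = not top.isna().any().any() and not bottom.isna().any().any()

print(same_na_positions, fully_ranked)
```

A check like this can be handy in a data pipeline when you want to be sure that ranking has not silently introduced or removed missing values.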
FAQs on Pandas DataFrame rank() Method
What does the rank() method do?
The rank() method assigns ranks to the values in a DataFrame or Series, with the rank indicating the position of each value relative to others. Ranks can be computed in various ways and can be sorted in ascending or descending order.
How do I rank values along rows instead of columns?
Use the axis parameter. Set axis=1 to rank the values within each row. By default, axis=0 ranks the values within each column.
How do I rank in descending order?
Set the ascending parameter to False. By default, ascending=True ranks in ascending order.
How do I get ranks as percentages?
Set the pct parameter to True. This returns each rank as a fraction of the total number of elements.
What is the default ranking method?
The default ranking method is average. This means that ties receive the average of the ranks that they would have otherwise occupied.
Conclusion
In conclusion, the Pandas DataFrame rank() method is a powerful tool for assigning ranks to values in a DataFrame or Series, with various options for handling ties, NA values, and ranking order. It provides flexibility with parameters like method, na_option, ascending, and pct to customize how ranks are calculated and presented. Understanding these options allows for effective data analysis and manipulation, enabling you to rank data in a way that best fits your analytical needs.
Happy Learning!!