Pandas DataFrame rank() Method

Post last modified: December 6, 2024

In Pandas, the rank() method is used to assign ranks to entries in a DataFrame based on their values. By default, the ranking is done in ascending order. The method can be particularly useful for statistical analysis or when you need to compare data points in terms of their relative positions.


In this article, I will explain the Pandas DataFrame rank() method, covering its syntax, parameters, and usage. The method returns a DataFrame or Series of the same shape as the input, where each value is replaced by its rank, assigned according to the specified ranking method and options.

Key Points –

  • rank() supports various methods for ranking, including average, min, max, first, and dense.
  • The method parameter specifies how to handle ties (duplicate values) in the ranking process.
  • You can control the ranking order using the ascending parameter to rank in ascending or descending order.
  • The na_option parameter determines how NA values are treated during ranking.
  • The axis parameter allows you to rank values within each column (axis=0) or within each row (axis=1).
  • By default, rank() ranks in ascending order and uses the average method for ties.

Syntax of Pandas DataFrame rank() Method

Following is the syntax of the pandas DataFrame rank() method.


# Syntax of dataframe rank() method
DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)

Parameters of the DataFrame rank()

Following are the parameters of the DataFrame rank() method.

  • axis – Determines the direction of ranking. With axis=0 (the default), values are ranked within each column; with axis=1, values are ranked within each row.
  • method – Specifies the method to use for ranking. Options include
    • average (default) – Average rank of the group of duplicates.
    • min – Minimum rank of the group of duplicates.
    • max – Maximum rank of the group of duplicates.
    • first – Tied values are ranked in the order they appear in the data.
    • dense – Ranks are assigned without gaps between ranks.
  • numeric_only – bool, default False. For DataFrame objects, rank only numeric columns if set to True.
  • na_option – Specifies how NA values are ranked. Options are keep, top, and bottom; the default is keep.
    • keep – Leave NA values as NA in the result.
    • top – Assign smallest rank to NA values.
    • bottom – Assign largest rank to NA values.
  • ascending – Boolean value that specifies whether to rank in ascending order (True) or descending order (False). Default is True.
  • pct – Boolean value that determines whether to return the ranks in percentile form, that is, as fractions of the number of ranked items. Default is False.
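The numeric_only parameter is the one option not demonstrated elsewhere in this article, so here is a minimal sketch of it. The DataFrame and its column names (name, score) are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
mixed = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "score": [70, 90, 80],
})

# numeric_only=True leaves the non-numeric "name" column out of the result
ranked_numeric = mixed.rank(numeric_only=True)
print(ranked_numeric)
```

Only the score column appears in the result, with ranks 1.0, 3.0, and 2.0.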

Return Value

  • It returns a Series or DataFrame with the ranks of the data as values.

Usage of Pandas DataFrame rank() Method

The rank() method in Pandas is used to assign ranks to the values in a DataFrame or Series. The ranks start at 1 and can be calculated in different ways.

Now, let’s create a Pandas DataFrame using data from a Python dictionary, where the columns are A, B, and C.


import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [5, 8, 13, 15],
    'B': [2, 9, 4, 6],
    'C': [8, 6, 12, 3] 
})

print("Original DataFrame:\n",df)

This yields the output below.

# Output:
# Original DataFrame:
#      A  B   C
# 0   5  2   8
# 1   8  9   6
# 2  13  4  12
# 3  15  6   3

To perform basic ranking on the DataFrame df, you use the rank() method.


# Apply basic ranking
df2 = df.rank()
print("Ranked DataFrame:\n", df2)

In the above example, each value in the DataFrame is replaced by its rank, with 1 being the smallest rank and ranks assigned in ascending order.
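To make the column-wise behavior concrete, the sketch below rebuilds the same sample DataFrame and checks one column by hand. The variable name ranked is chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 8, 13, 15],
    "B": [2, 9, 4, 6],
    "C": [8, 6, 12, 3],
})

ranked = df.rank()

# Column B holds 2, 9, 4, 6, so its ranks are 1, 4, 2, 3
print(ranked["B"].tolist())

# Ranks come back as floats even though the input columns are integers
print(ranked.dtypes)
```

The float result type is what allows fractional ranks (such as 2.5) when ties are averaged.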

Ranking along rows

Alternatively, to rank the values along the rows in the DataFrame, you set the axis parameter to 1.


# Apply ranking along rows
df2 = df.rank(axis=1)
print("Ranked DataFrame (along rows):\n", df2)

# Output:
# Ranked DataFrame (along rows):
#      A    B    C
# 0  2.0  1.0  3.0
# 1  2.0  3.0  1.0
# 2  3.0  1.0  2.0
# 3  3.0  2.0  1.0

In the above example, the ranks are computed within each row, with the smallest value in each row receiving rank 1.
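One way to think about axis=1 is as ranking the transposed DataFrame column by column and then transposing back. The following sketch, using the same sample data, compares the two approaches; it is meant as an illustration of the idea, not as a recommended way to rank rows.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 8, 13, 15],
    "B": [2, 9, 4, 6],
    "C": [8, 6, 12, 3],
})

# Rank within each row directly
row_ranks = df.rank(axis=1)

# Transpose, rank within each column, then transpose back
via_transpose = df.T.rank().T

# Compare the two results value by value
print(row_ranks.values.tolist() == via_transpose.values.tolist())
```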

Different Ranking Methods

To apply different ranking methods to the DataFrame, you can use the method parameter in the rank() method.


# Apply different ranking methods
df_ranked_min = df.rank(method='min')
print("Min method:\n", df_ranked_min)

df_ranked_max = df.rank(method='max')
print("Max method:\n", df_ranked_max)

df_ranked_average = df.rank(method='average')
print("Average method:\n", df_ranked_average)

df_ranked_first = df.rank(method='first')
print("First method:\n", df_ranked_first)

df_ranked_dense = df.rank(method='dense')
print("Dense method:\n", df_ranked_dense)

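Here, each ranking method handles ties differently. Because the sample DataFrame has no duplicate values within a column, all five calls above return the same ranks for this data; the methods differ only when ties are present.

  • Min – Assigns the smallest rank to all tied values.
  • Max – Assigns the largest rank to all tied values.
  • Average – Assigns the average rank to tied values.
  • First – Breaks ties by the order in which the values appear.
  • Dense – Similar to min, but ranks are always incremented by 1 between groups.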
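The difference between the methods only shows up when a column contains duplicates. The sketch below uses a small made-up column (score) with one tie and places the result of each method side by side.

```python
import pandas as pd

# Hypothetical data with a tie: the value 20 appears twice
scores = pd.DataFrame({"score": [10, 20, 20, 30]})

methods = ["average", "min", "max", "first", "dense"]

# One result column per ranking method
comparison = pd.DataFrame(
    {m: scores["score"].rank(method=m) for m in methods}
)
print(comparison)
```

For the two tied values, average gives 2.5 and 2.5, min gives 2 and 2, max gives 3 and 3, first gives 2 and 3, and dense gives 2 and 2. The value 30 is ranked 4 by every method except dense, which ranks it 3 because dense leaves no gaps.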

Descending Order and Percentage Rank

Similarly, to apply descending order ranking and percentage ranking with the rank() method, you can use the ascending and pct parameters, respectively.


# Apply ranking in descending order
df_ranked_desc = df.rank(ascending=False)
print("Descending Order Ranking:\n", df_ranked_desc)

# Output:
# Descending Order Ranking:
#      A    B    C
# 0  4.0  4.0  2.0
# 1  3.0  1.0  3.0
# 2  2.0  3.0  1.0
# 3  1.0  2.0  4.0

Descending Order Ranking (ascending=False): Ranks are assigned in descending order, where the highest value gets rank 1 and the lowest value gets the largest rank number.
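Descending ranks are a natural fit for leaderboards. As a rough sketch, the example below combines ascending=False with method='min' so that tied scores share the better place; the board DataFrame and its columns (player, points, place) are invented for this illustration.

```python
import pandas as pd

# Hypothetical leaderboard with two players tied on 90 points
board = pd.DataFrame({
    "player": ["p1", "p2", "p3", "p4"],
    "points": [90, 85, 90, 70],
})

# Highest score gets place 1; tied scores share the smallest rank
board["place"] = (
    board["points"].rank(ascending=False, method="min").astype(int)
)
print(board.sort_values("place"))
```

Both players with 90 points are placed 1st, the player with 85 points is 3rd, and the player with 70 points is 4th.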


# Apply percentage ranking
df_ranked_pct = df.rank(pct=True)
print("Percentage Ranking:\n", df_ranked_pct)

# Output:
# Percentage Ranking:
#       A     B     C
# 0  0.25  0.25  0.75
# 1  0.50  1.00  0.50
# 2  0.75  0.50  1.00
# 3  1.00  0.75  0.25

Percentage Ranking (pct=True): Each rank is divided by the number of ranked values, so the results are fractions greater than 0 and at most 1. With four values per column, the smallest value gets 0.25 (1/4) and the largest gets 1.0.
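For data without NA values, the percentage rank can be reproduced by dividing the ordinary rank by the number of rows. The sketch below checks this on the sample DataFrame; the variable names pct_ranks and manual are used only here.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 8, 13, 15],
    "B": [2, 9, 4, 6],
    "C": [8, 6, 12, 3],
})

pct_ranks = df.rank(pct=True)

# Ordinary rank divided by the number of rows (no NA values in this data)
manual = df.rank() / len(df)

print(pct_ranks["A"].tolist())
# Largest absolute difference between the two results
print((pct_ranks - manual).abs().max().max())
```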

Handling NA values

Finally, to handle NA values during ranking in a DataFrame, you can use the na_option parameter in the rank() method.


import pandas as pd

# Sample DataFrame with NA values
df = pd.DataFrame({
    'A': [5, 8, None, 15],
    'B': [2, None, 4, 6],
    'C': [8, 6, 12, None] 
})

# Apply ranking with different NA options
df_ranked_keep = df.rank(na_option='keep')
print("Keep NA:\n", df_ranked_keep)

df_ranked_top = df.rank(na_option='top')
print("Top NA:\n", df_ranked_top)

df_ranked_bottom = df.rank(na_option='bottom')
print("Bottom NA:\n", df_ranked_bottom)

Here,

  • na_option=keep: NA values remain NA in the output DataFrame and do not receive a rank.
  • na_option=top: NA values are assigned the smallest rank (highest priority).
  • na_option=bottom: NA values are assigned the largest rank (lowest priority).

# Output:
# Keep NA:
#       A    B    C
# 0  1.0  1.0  2.0
# 1  2.0  NaN  1.0
# 2  NaN  2.0  3.0
# 3  3.0  3.0  NaN
# Top NA:
#       A    B    C
# 0  2.0  2.0  3.0
# 1  3.0  1.0  2.0
# 2  1.0  3.0  4.0
# 3  4.0  4.0  1.0
# Bottom NA:
#       A    B    C
# 0  1.0  1.0  2.0
# 1  2.0  4.0  1.0
# 2  4.0  2.0  3.0
# 3  3.0  3.0  4.0
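A quick way to confirm what na_option='keep' does is to compare the NA positions before and after ranking. The sketch below rebuilds the same NA sample under the assumed name df_na and also checks that, with no ties, the largest rank in each column equals its count of non-NA values.

```python
import pandas as pd

df_na = pd.DataFrame({
    "A": [5, 8, None, 15],
    "B": [2, None, 4, 6],
    "C": [8, 6, 12, None],
})

kept = df_na.rank(na_option="keep")

# NA positions are unchanged by the ranking
print(kept.isna().equals(df_na.isna()))

# Largest rank per column next to the number of non-NA values per column
print(kept.max().tolist(), df_na.count().tolist())
```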

FAQs on Pandas DataFrame rank() Method

What does the rank() method do?

The rank() method assigns ranks to the values in a DataFrame or Series, with the rank indicating the position of each value relative to others. Ranks can be computed in various ways and can be sorted in ascending or descending order.

How do I rank values along rows instead of columns?

Use the axis parameter. Set axis=1 to rank the values within each row. By default, axis=0 ranks the values within each column.

How do I rank values in descending order?

Set the ascending parameter to False. By default, ascending=True ranks in ascending order.

Can I get percentage ranks instead of absolute ranks?

Yes. Set the pct parameter to True. Each rank is then divided by the number of ranked values, so the results are fractions greater than 0 and at most 1.

What is the default ranking method if none is specified?

The default ranking method is average. This means that ties receive the average of the ranks that they would have otherwise occupied.

Conclusion

In conclusion, the Pandas DataFrame rank() method is a powerful tool for assigning ranks to values in a DataFrame or Series, with various options for handling ties, NA values, and ranking order. It provides flexibility with parameters like method, na_option, ascending, and pct to customize how ranks are calculated and presented. Understanding these options allows for effective data analysis and manipulation, enabling you to rank data in a way that best fits your analytical needs.

Happy Learning!!
