  • Post last modified:December 6, 2024
Pandas DataFrame mode() Method

In pandas, the mode() method is used to find the mode(s) of each column or row in a DataFrame. The mode is the value that appears most frequently in a set of data. This method returns a DataFrame that contains the mode(s) for each column or row.


In this article, I will explain the Pandas DataFrame mode() method by using its syntax, parameters, and usage, and how to return one or more modes if there are multiple values that are equally frequent.

Key Points –

  • The mode() method finds the mode(s), or most frequently occurring value(s), in each column or row of a DataFrame.
  • The axis parameter controls the direction of the calculation: axis=0 (the default) returns the mode of each column, and axis=1 returns the mode of each row.
  • The numeric_only parameter, when set to True, includes only numeric data (int, float, and boolean) in the computation.
  • The dropna parameter, when set to True, ignores NaN values when calculating the mode.
  • If there are multiple modes, the mode() method returns all of them, sorted in ascending order. When columns (or rows) have different numbers of modes, the shorter ones are padded with NaN in the result.
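
To illustrate the last point, here is a minimal sketch using a small made-up DataFrame (the names df_tie, X, and Y are hypothetical and not part of the examples later in this article). Column X has two equally frequent values, while column Y has a single mode, so Y is padded with NaN.

```python
import pandas as pd

# Hypothetical data: X has a tie between 1 and 2, Y has a single mode (3)
df_tie = pd.DataFrame({
    'X': [1, 1, 2, 2],
    'Y': [3, 3, 3, 4]
})

modes = df_tie.mode()
print("Modes with a tie in column X:\n", modes)

# X lists both of its modes; Y has only one, so its second row is NaN
print("Second row of Y is NaN:", pd.isna(modes['Y'].iloc[1]))
```

The result has as many rows as the column with the most modes.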

Syntax of Pandas DataFrame mode() Method

Following is the syntax of the pandas DataFrame.mode() method.


# Syntax of DataFrame.mode() method
DataFrame.mode(axis=0, numeric_only=False, dropna=True)

Parameters of the mode() Method

Following are the parameters of the mode() method.

  • axis – {0 or index, 1 or columns}, default 0
    • 0 or index: Get the mode of each column.
    • 1 or columns: Get the mode of each row.
  • numeric_only – bool, default False
    • If True, it includes only float, int, and boolean data. If False, it includes all data types.
  • dropna – bool, default True
    • If True, it does not consider NaN values when finding the mode. If False, NaN values are considered.
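
As a quick check of the parameter list above, the sketch below confirms that the integer and string forms of axis give the same result, and that calling mode() with no arguments is the same as spelling out the defaults. The variable names are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'A': [2, 5, 3, 5, 6],
    'B': [4, 7, 4, 9, 2],
    'C': [12, 8, 8, 6, 8]
})

# Integer and string forms of axis are interchangeable
col_modes_int = df_demo.mode(axis=0)
col_modes_str = df_demo.mode(axis='index')
row_modes_int = df_demo.mode(axis=1)
row_modes_str = df_demo.mode(axis='columns')

# Calling mode() without arguments uses the documented defaults
default_modes = df_demo.mode()
explicit_modes = df_demo.mode(axis=0, numeric_only=False, dropna=True)

print(col_modes_int.equals(col_modes_str))
print(row_modes_int.equals(row_modes_str))
print(default_modes.equals(explicit_modes))
```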

Return Value

It returns a DataFrame with the mode(s) for each column or row.
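
The short sketch below checks the return type. It assumes a one-column DataFrame named df_demo (a hypothetical name) and compares DataFrame.mode() with Series.mode(), which returns a Series instead.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [2, 5, 3, 5, 6]})

# DataFrame.mode() returns a DataFrame, even when there is a single mode
frame_result = df_demo.mode()

# Series.mode() on one column returns a Series
series_result = df_demo['A'].mode()

print(type(frame_result).__name__)
print(type(series_result).__name__)

# The result is indexed from 0, one row per mode
print(list(frame_result.index))
```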

Usage of Pandas DataFrame mode() Method

The mode() method in pandas is used to identify the most frequently occurring value(s) in each column or row of a DataFrame.

Now, let's create a Pandas DataFrame from a Python dictionary, with columns A, B, and C.


import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [2, 5, 3, 5, 6],
    'B': [4, 7, 4, 9, 2],
    'C': [12, 8, 8, 6, 8] 
})
print("Original DataFrame:\n",df)

Yields below output.


# Original DataFrame:
   A  B   C
0  2  4  12
1  5  7   8
2  3  4   8
3  5  9   6
4  6  2   8

Finding Mode of Each Column

To find the mode of each column in a Pandas DataFrame, you can use the mode() function. This function returns the mode(s) of each column in the DataFrame. The mode is the value that appears most frequently in the column. If there are multiple values with the same highest frequency, mode() will return all of them.


# Find the mode of each column
df2 = df.mode()
print("Mode of each column:\n", df2)

Here,

  • Column A: The mode is 5 because it appears most frequently (twice).
  • Column B: The mode is 4 because it appears most frequently (twice).
  • Column C: The mode is 8 because it appears most frequently (three times).

Yields below output.


# Mode of each column:
   A  B  C
0  5  4  8
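
One way to cross-check these counts is value_counts(), which lists how often each value occurs. The sketch below is an optional verification step and not something mode() requires. It rebuilds the sample data under the hypothetical name df_demo.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'A': [2, 5, 3, 5, 6],
    'B': [4, 7, 4, 9, 2],
    'C': [12, 8, 8, 6, 8]
})

# Frequency of each value in column C
counts = df_demo['C'].value_counts()
print("Value counts for column C:\n", counts)

# Values whose count equals the highest count are the modes
most_frequent = sorted(counts[counts == counts.max()].index)
print("Most frequent value(s):", most_frequent)
print("mode() result:", df_demo['C'].mode().tolist())
```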

Finding Mode of Each Row

Alternatively, to find the mode of each row in a Pandas DataFrame, you can use the mode() function with the axis=1 argument. This tells Pandas to operate along the rows (rather than columns). Each row’s mode will be calculated, and if there are multiple modes, all will be returned.


# Find the mode of each row
df2 = df.mode(axis=1)
print("Mode of each row:\n", df2)

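Yields below output. Because every row in this DataFrame holds three distinct values, each value counts as a mode, so the result has three columns, with each row's values sorted in ascending order.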


# Mode of each row:
   0  1   2
0  2  4  12
1  5  7   8
2  3  4   8
3  5  6   9
4  2  6   8
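
Row-wise modes are easier to see when some rows contain repeated values. The following sketch uses a made-up DataFrame (df_rows is a hypothetical name) in which the first two rows each have one repeated value and the third row has none.

```python
import pandas as pd

# Hypothetical data: rows are [1, 1, 4], [2, 5, 5], and [3, 6, 9]
df_rows = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1, 5, 6],
    'C': [4, 5, 9]
})

row_modes = df_rows.mode(axis=1)
print("Mode of each row:\n", row_modes)

# Rows 0 and 1 have a single mode, so their remaining columns are NaN.
# Row 2 has three distinct values, so all three are returned as modes.
```

The number of columns in the result equals the largest number of modes found in any row.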

Mode Including NaN values

To handle NaN values while finding the mode in a DataFrame, you can use the dropna=False parameter with the mode() method. This parameter tells pandas to include NaN values in the mode calculation.


import pandas as pd

# Sample DataFrame with NaN values
df = pd.DataFrame({
    'A': [2, 5, 3, 5, None],
    'B': [4, 7, 4, None, 2],
    'C': [12, 8, 15, 6, 8]
})
print("Original DataFrame with NaN values:\n", df)

# Finding the mode including NaN values
df2 = df.mode(dropna=False)
print("Mode including NaN values:\n", df2)

By default (dropna=True), NaN values are ignored. With dropna=False, NaN is counted like any other value and is returned as the mode if it is the most frequent entry. In this example, NaN appears only once in columns A and B, so the result is the same as with the default setting. This example yields the below output.


Original DataFrame with NaN values:
     A    B   C
0  2.0  4.0  12
1  5.0  7.0   8
2  3.0  4.0  15
3  5.0  NaN   6
4  NaN  2.0   8
Mode including NaN values:
      A    B  C
0  5.0  4.0  8
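
The effect of dropna is only visible when NaN is at least as frequent as every other value. Here is a minimal sketch with a hypothetical one-column DataFrame (df_nan) where NaN appears twice and every other value appears once.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN occurs twice, 1 and 2 occur once each
df_nan = pd.DataFrame({'A': [1, np.nan, np.nan, 2]})

# Default: NaN is ignored, so 1.0 and 2.0 tie as modes
default_modes = df_nan.mode()
print("dropna=True (default):\n", default_modes)

# dropna=False: NaN is counted and becomes the single mode
with_nan_modes = df_nan.mode(dropna=False)
print("dropna=False:\n", with_nan_modes)
```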

Mode with only Numeric Data (Excluding Non-Numeric Types)

Similarly, when working with a DataFrame that contains a mix of numeric and non-numeric data, you may want to compute the mode while excluding non-numeric types. The numeric_only=True parameter in the mode() method allows you to include only numeric data in the computation.


import pandas as pd

# Sample DataFrame with numeric and non-numeric data
df_mixed = pd.DataFrame({
    'A': [2, 5, 3, 5, 'Pandas'],
    'B': [4, 7, 4, 9, 'Spark'],
    'C': [12, 8, 15, 6, 8]
})
print("Original DataFrame with mixed data types:\n", df_mixed)

# Finding the mode including only numeric data
df2 = df_mixed.mode(numeric_only=True)
print("Mode with only numeric data:\n", df2)

Note that numeric_only=True selects columns by data type rather than filtering individual values. Columns A and B each contain a string, so pandas stores them with the object dtype and excludes them entirely, including their numeric entries. Only column C remains. This example yields the below output.


Original DataFrame with mixed data types:
         A      B   C
0       2      4  12
1       5      7   8
2       3      4  15
3       5      9   6
4  Pandas  Spark   8
Mode with only numeric data:
    C
0  8
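
To see why only column C survives, you can inspect the column dtypes. The sketch below also shows one possible cleanup, which is an assumption on my part and not part of mode() itself: coercing non-numeric entries to NaN with pd.to_numeric(errors='coerce') so that the numeric values in A and B are counted as well.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'A': [2, 5, 3, 5, 'Pandas'],
    'B': [4, 7, 4, 9, 'Spark'],
    'C': [12, 8, 15, 6, 8]
})

# A and B are object columns because each holds a string
print(df_mixed.dtypes)
numeric_cols = df_mixed.select_dtypes(include='number').columns.tolist()
print("Numeric columns:", numeric_cols)

# Optional cleanup: turn non-numeric entries into NaN, column by column
df_clean = df_mixed.apply(pd.to_numeric, errors='coerce')

# All three columns are now numeric, and NaN is ignored by default
clean_modes = df_clean.mode(numeric_only=True)
print("Mode after coercing to numeric:\n", clean_modes)
```

Whether coercing is appropriate depends on your data, since it silently discards the non-numeric entries.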

FAQ on Pandas DataFrame mode() Method

What does the mode() method do in pandas?

The mode() method in pandas finds the mode(s) of each column or row in a DataFrame. The mode is the value that appears most frequently.

How do you find the mode for each row?

To find the mode for each row in a pandas DataFrame, you can use the mode() method with the axis=1 parameter. This will calculate the mode along the rows instead of the default behavior, which calculates the mode along the columns.

Can the mode() method handle DataFrames with NaN values?

The mode() method can handle NaN values. By default, NaN values are ignored, but you can include them in the mode calculation by setting the dropna parameter to False, for example df.mode(dropna=False).

How do you find the mode for only numeric data, excluding non-numeric types?

To find the mode for only numeric data, excluding non-numeric types, you can use the mode() method with the numeric_only=True parameter. This ensures that the mode calculation is performed only on numeric columns, ignoring any non-numeric data.

What happens if there are multiple modes in a column or row?

If there are multiple modes, the mode() method returns all of them. With the default axis=0, the result has one row per mode; with axis=1, it has one column per mode. Columns or rows with fewer modes are padded with NaN.
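
If you need a single representative value per column even when there are ties, one common convention is to take the first row of the result, which holds the smallest mode of each column. This is a sketch of that convention on hypothetical data (df_ties), not a rule imposed by pandas.

```python
import pandas as pd

# Hypothetical data: both columns have a two-way tie
df_ties = pd.DataFrame({
    'city': ['Paris', 'Rome', 'Rome', 'Paris', 'Oslo'],
    'score': [1, 2, 2, 3, 3]
})

# All modes: one row per tied value, sorted within each column
all_modes = df_ties.mode()
print("All modes:\n", all_modes)

# First row only: one value per column
first_modes = all_modes.iloc[0]
print("First mode of each column:\n", first_modes)
```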

Conclusion

In conclusion, the pandas mode() method is a versatile tool for identifying the most frequently occurring values in a DataFrame. It provides flexibility to handle various scenarios, including working with rows, columns, NaN values, and mixed data types. By utilizing parameters like axis, numeric_only, and dropna, you can customize the mode calculation to suit your specific needs.

Happy Learning!!
