In pandas, the mode() method finds the mode(s) of each column or row in a DataFrame. The mode is the value that appears most frequently in a set of data. The method returns a DataFrame containing the mode(s) for each column or row.
In this article, I will explain the Pandas DataFrame mode() method, covering its syntax, parameters, and usage, and how it returns more than one mode when several values are equally frequent.
Key Points –
- The mode() method finds the mode(s), or most frequently occurring value(s), in each column or row of a DataFrame.
- The axis parameter controls the direction: axis=0 (the default) returns the mode of each column, and axis=1 returns the mode of each row.
- The numeric_only parameter, when set to True, restricts the computation to numeric columns (int, float, and boolean).
- The dropna parameter, when set to True (the default), ignores NaN values when calculating the mode.
- If there are multiple modes, the mode() method returns all of them. Columns or rows that have fewer modes than the others are padded with NaN in the result.
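The last point is easier to see with a small example. The following is a minimal sketch using a hypothetical DataFrame (df_ties is not part of the original article) in which column X has two equally frequent values and column Y has only one.

```python
import pandas as pd

# Hypothetical data: 'X' has a tie between 1 and 2, 'Y' has a single mode (7)
df_ties = pd.DataFrame({
    'X': [1, 1, 2, 2, 3],
    'Y': [7, 7, 7, 8, 9]
})

modes = df_ties.mode()

# 'X' contributes two rows (1 and 2); 'Y' has only one mode,
# so its second row is padded with NaN
print(modes)
```

Because of the NaN padding, column Y is returned as a float column even though the input values were integers.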
Syntax of Pandas DataFrame mode() Method
Following is the syntax of the pandas DataFrame.mode() method.
# Syntax of DataFrame.mode() method
DataFrame.mode(axis=0, numeric_only=False, dropna=True)
Parameters of the mode()
Following are the parameters of the mode() method.
axis – {0 or 'index', 1 or 'columns'}, default 0
- 0 or 'index': Get the mode of each column.
- 1 or 'columns': Get the mode of each row.
numeric_only – bool, default False
- If True, only numeric columns (float, int, and boolean) are included. If False, all data types are included.
dropna – bool, default True
- If True, NaN values are not considered when finding the mode. If False, NaN values are counted like any other value.
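As a quick check of the parameter list above, the sketch below compares the numeric and string forms of axis and confirms that calling mode() with no arguments matches passing the defaults explicitly. The DataFrame name df_axis is only illustrative.

```python
import pandas as pd

df_axis = pd.DataFrame({
    'A': [2, 5, 3, 5, 6],
    'B': [4, 7, 4, 9, 2],
    'C': [12, 8, 8, 6, 8]
})

# axis=0 and axis='index' both return the mode of each column
by_number = df_axis.mode(axis=0)
by_label = df_axis.mode(axis='index')
print(by_number.equals(by_label))

# Calling mode() with no arguments uses the documented defaults
explicit = df_axis.mode(axis=0, numeric_only=False, dropna=True)
print(df_axis.mode().equals(explicit))

# axis=1 and axis='columns' both return the mode of each row
rows_by_number = df_axis.mode(axis=1)
rows_by_label = df_axis.mode(axis='columns')
print(rows_by_number.equals(rows_by_label))
```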
Return Value
It returns a DataFrame with the mode(s) for each column or row.
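Since the return value is always a DataFrame, even when every column has exactly one mode, you may want to pull out a single row. A minimal sketch, assuming you only need the first mode of each column (df_demo and first_modes are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'A': [2, 5, 3, 5, 6],
    'B': [4, 7, 4, 9, 2]
})

result = df_demo.mode()
print(type(result))  # a DataFrame, not a Series or scalar

# Take the first row to get one mode per column as a Series
first_modes = result.iloc[0]
print(first_modes)
```

Note that iloc[0] keeps only the first (smallest) mode of each column, so any ties are discarded.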
Usage of Pandas DataFrame mode() Method
The mode() method in pandas is used to identify the most frequently occurring value(s) in each column or row of a DataFrame.
Now, let's create a Pandas DataFrame using data from a Python dictionary, where the columns are A, B, and C.
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'A': [2, 5, 3, 5, 6],
'B': [4, 7, 4, 9, 2],
'C': [12, 8, 8, 6, 8]
})
print("Original DataFrame:\n",df)
Yields below output.
# Original DataFrame:
A B C
0 2 4 12
1 5 7 8
2 3 4 8
3 5 9 6
4 6 2 8
Finding Mode of Each Column
To find the mode of each column in a Pandas DataFrame, call the mode() method with no arguments. It returns the value that appears most frequently in each column. If several values in a column share the highest frequency, mode() returns all of them.
# Find the mode of each column
df2 = df.mode()
print("Mode of each column:\n", df2)
Here,
- Column A: The mode is 5 because it appears most frequently (twice).
- Column B: The mode is 4 because it appears most frequently (twice).
- Column C: The mode is 8 because it appears most frequently (three times).
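If you want to confirm these counts yourself, one option is to compare against value_counts(). This is only a cross-check sketch, not how mode() is implemented; df_check is an illustrative copy of the sample data.

```python
import pandas as pd

df_check = pd.DataFrame({
    'A': [2, 5, 3, 5, 6],
    'B': [4, 7, 4, 9, 2],
    'C': [12, 8, 8, 6, 8]
})

# For each column, find the most frequent value and how often it occurs
top_values = {}
for col in df_check.columns:
    counts = df_check[col].value_counts()
    top_values[col] = (counts.idxmax(), counts.max())
    print(f"{col}: value={counts.idxmax()}, count={counts.max()}")
```

Unlike mode(), idxmax() returns only one value, so this check does not reveal ties.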
Finding Mode of Each Row
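Alternatively, to find the mode of each row in a Pandas DataFrame, you can use the mode() method with the axis=1 argument. This tells Pandas to compute the mode across the values in each row rather than in each column. If a row has multiple modes, all of them are returned in sorted order, one per result column. In this DataFrame every row holds three distinct values, so each value counts as a mode and the result lists each row's values in ascending order.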
# Find the mode of each row
df2 = df.mode(axis=1)
print("Mode of each row:\n", df2)
Yields below output.
# Mode of each row:
0 1 2
0 2 4 12
1 5 7 8
2 3 4 8
3 5 6 9
4 2 6 8
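The row-wise result is more informative when rows contain repeated values. The sketch below uses a hypothetical DataFrame (df_rows) in which every row has exactly one most frequent value, so the result has a single column.

```python
import pandas as pd

# Hypothetical data: each row has one value that repeats
df_rows = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1, 5, 3],
    'C': [4, 5, 3]
})

row_modes = df_rows.mode(axis=1)

# One result column (labelled 0) holding the mode of each row
print(row_modes)
```

Here the row modes are 1, 5, and 3 respectively. A row with a tie would add further columns, with NaN filling the rows that have fewer modes.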
Mode Including NaN values
To handle NaN values while finding the mode in a DataFrame, you can use the dropna=False parameter with the mode() method. This parameter tells pandas to include NaN values in the mode calculation.
import pandas as pd
# Sample DataFrame with NaN values
df = pd.DataFrame({
'A': [2, 5, 3, 5, None],
'B': [4, 7, 4, None, 2],
'C': [12, 8, 15, 6, 8]
})
print("Original DataFrame with NaN values:\n", df)
# Finding the mode including NaN values
df2 = df.mode(dropna=False)
print("Mode including NaN values:\n", df2)
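The dropna=False option is useful when you want NaN to be counted as a candidate value; by default (dropna=True), NaN values are ignored. In this example NaN appears only once in columns A and B, so it is never the most frequent value and the result is the same as with the default setting. This example yields the below output.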
Original DataFrame with NaN values:
A B C
0 2.0 4.0 12
1 5.0 7.0 8
2 3.0 4.0 15
3 5.0 NaN 6
4 NaN 2.0 8
Mode including NaN values:
A B C
0 5.0 4.0 8
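To see dropna actually change the result, NaN has to be the most frequent entry. A minimal sketch with a hypothetical column (df_nan) where NaN occurs three times and 1.0 occurs twice:

```python
import pandas as pd

# Hypothetical data: NaN is the most frequent entry in column 'A'
df_nan = pd.DataFrame({'A': [1.0, 1.0, None, None, None]})

default_mode = df_nan.mode()             # NaN ignored, so the mode is 1.0
with_nan = df_nan.mode(dropna=False)     # NaN counted, so the mode is NaN

print("dropna=True:\n", default_mode)
print("dropna=False:\n", with_nan)
```

In a multi-column result, a NaN that is a genuine mode looks the same as NaN padding, so it can help to inspect such columns individually.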
Mode with only Numeric Data (Excluding Non-Numeric Types)
Similarly, when working with a DataFrame that contains a mix of numeric and non-numeric data, you may want to compute the mode while excluding non-numeric types. The numeric_only=True parameter in the mode() method allows you to include only numeric data in the computation.
import pandas as pd
# Sample DataFrame with numeric and non-numeric data
df_mixed = pd.DataFrame({
'A': [2, 5, 3, 5, 'Pandas'],
'B': [4, 7, 4, 9, 'Spark'],
'C': [12, 8, 15, 6, 8]
})
print("Original DataFrame with mixed data types:\n", df_mixed)
# Finding the mode including only numeric data
df2 = df_mixed.mode(numeric_only=True)
print("Mode with only numeric data:\n", df2)
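By using numeric_only=True, you ensure that only numeric columns are considered when computing the mode, which is useful when your DataFrame contains mixed data types. Here columns A and B each contain a string, so pandas stores them with the object dtype and excludes them entirely; only column C remains. This example yields the below output.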
Original DataFrame with mixed data types:
A B C
0 2 4 12
1 5 7 8
2 3 4 15
3 5 9 6
4 Pandas Spark 8
Mode with only numeric data:
C
0 8
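To understand why only column C survives, it can help to look at the column dtypes. The sketch below also compares numeric_only=True with selecting numeric columns first via select_dtypes(); for this data the two give the same result, though that equivalence is an assumption for illustration rather than a documented guarantee.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'A': [2, 5, 3, 5, 'Pandas'],
    'B': [4, 7, 4, 9, 'Spark'],
    'C': [12, 8, 15, 6, 8]
})

# A and B hold a string, so they are stored as object; C is an integer column
print(df_mixed.dtypes)

numeric_modes = df_mixed.mode(numeric_only=True)

# Alternative: keep only numeric columns, then compute the mode
selected_modes = df_mixed.select_dtypes(include='number').mode()

print(numeric_modes.equals(selected_modes))
```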
FAQ on Pandas DataFrame mode() Method
The mode() method in pandas finds the mode(s) of each column or row in a DataFrame. The mode is the value that appears most frequently.
To find the mode for each row in a pandas DataFrame, you can use the mode() method with the axis=1 parameter. This calculates the mode of each row instead of the default behavior, which calculates the mode of each column.
The mode() method can handle NaN values. By default, NaN values are ignored, but you can include them in the mode calculation by setting the dropna parameter to False.
To find the mode for only numeric data, excluding non-numeric types, you can use the mode() method with the numeric_only=True parameter. This ensures that the mode calculation is performed only on numeric columns, ignoring any non-numeric columns.
If there are multiple modes, the mode() method returns all of them. With the default axis=0, the result has one row per mode; with axis=1, it has one column per mode. Positions with no further mode are filled with NaN.
Conclusion
In conclusion, the pandas mode() method is a versatile tool for identifying the most frequently occurring values in a DataFrame. It provides flexibility to handle various scenarios, including working with rows, columns, NaN values, and mixed data types. By utilizing parameters like axis, numeric_only, and dropna, you can customize the mode calculation to suit your specific needs.
Happy Learning!!