In pandas, the nunique() method counts the number of unique values along a specified axis of a DataFrame. It is useful for data exploration and for understanding how many distinct values each column or row holds.
In this article, I will explain the Pandas DataFrame nunique() method, covering its syntax, parameters, and usage, and how it returns a Series with the count of unique values for each column or row.
Key Points –
- Counts the number of unique values in each column or row of a DataFrame.
- Accepts axis to specify whether to count per column (0 or 'index') or per row (1 or 'columns'), and dropna to include or exclude NA/null values.
- Commonly used for data exploration to understand the uniqueness and distribution of data.
- Can handle missing values by either including or excluding them in the count, depending on the dropna parameter.
Pandas DataFrame nunique() Introduction
Following is the syntax of the nunique() method.
# Syntax of Pandas DataFrame nunique()
DataFrame.nunique(axis=0, dropna=True)
Parameters of the DataFrame nunique()
Following are the parameters of the DataFrame nunique() function.
axis – {0 or 'index', 1 or 'columns'}, default 0
- 0 or 'index': Count unique values for each column.
- 1 or 'columns': Count unique values for each row.
dropna – bool, default True
- If True, NaN values are excluded from the count.
- If False, NaN values are included in the count.
Return Value
It returns a Series with the count of unique values for each column or row, depending on the axis specified.
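As a small illustration of the parameters and return value described above, the sketch below passes the string aliases for axis and inspects what comes back. The sample DataFrame and the variable names per_column and per_row are made up for this example.

```python
import pandas as pd

# Sample data, invented for illustration
df = pd.DataFrame({'A': [5, 10, 5, 15],
                   'B': [4, 6, 4, 2],
                   'C': [3, 7, 5, 9]})

# 'index' is an alias for axis=0: one count per column
per_column = df.nunique(axis='index')

# 'columns' is an alias for axis=1: one count per row
per_row = df.nunique(axis='columns')

# The result is a Series in both cases; only its index differs
print(type(per_column).__name__)    # Series
print(per_column.index.tolist())    # ['A', 'B', 'C']
print(per_row.index.tolist())       # [0, 1, 2, 3]
```

The index of the returned Series tells you which axis was counted: column labels for axis=0 and row labels for axis=1.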
Usage of Pandas DataFrame nunique() Method
The nunique() method in pandas counts the number of unique values along an axis of a DataFrame, or within a Series.
To run some examples of the Pandas DataFrame nunique() method, let’s create a Pandas DataFrame using data from a Python dictionary.
# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [5, 10, 5, 15],
                   'B': [4, 6, 4, 2],
                   'C': [3, 7, 5, 9]})
print("Original DataFrame:\n", df)
This prints a DataFrame with four rows and three columns, A, B, and C.
To count the unique values for each column in the DataFrame, you can use the nunique() method.
# Counting unique values for each column
df2 = df.nunique()
print("Unique values for each column:\n", df2)
# Equivalent call with axis=0 given explicitly
df2 = df.nunique(axis=0)
print("Unique values for each column:\n", df2)
Both calls return the same Series: columns A and B each have 3 unique values, and column C has 4.
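The same method is also available on a single column. As a minimal sketch, assuming the same sample values as above, Series.nunique() returns a plain integer rather than a Series. The variable name unique_in_a is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 10, 5, 15],
                   'B': [4, 6, 4, 2],
                   'C': [3, 7, 5, 9]})

# Selecting one column gives a Series, whose nunique() returns a single number
unique_in_a = df['A'].nunique()
print("Unique values in column A:", unique_in_a)

# The distinct values themselves can be listed with unique()
print(df['A'].unique())
```

Use nunique() when you only need the count, and unique() when you need to see the values.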
Counting Unique Values for Each Row
Alternatively, to count the unique values for each row in the DataFrame, you can use the nunique() method with axis=1.
# Counting unique values for each row
df2 = df.nunique(axis=1)
print("Unique values for each row:\n", df2)
# Output:
# Unique values for each row:
# 0 3
# 1 3
# 2 2
# 3 3
# dtype: int64
Including NaN Values in the Count
To include NaN values in the count of unique values for each column or row in a DataFrame, you can set the dropna parameter of the nunique() method to False.
Counting Unique Values for Each Column Including NaN
To count the unique values for each column in a DataFrame, including NaN values, you can use the nunique() method with dropna=False.
import pandas as pd
import numpy as np
# Creating the DataFrame with NaN values
df = pd.DataFrame({
    'A': [5, 10, 5, np.nan],
    'B': [4, 6, 4, 2],
    'C': [3, 7, np.nan, 9]
})
# Counting unique values for each column, including NaN values
df2 = df.nunique(axis=0, dropna=False)
print("Unique values for each column (including NaN):\n", df2)
# Output:
# Unique values for each column (including NaN):
# A 3
# B 3
# C 4
# dtype: int64
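To make the effect of dropna easier to see, the sketch below puts both settings side by side for the same DataFrame with NaN values. The comparison DataFrame and its column labels are made up for this illustration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [5, 10, 5, np.nan],
    'B': [4, 6, 4, 2],
    'C': [3, 7, np.nan, 9]
})

# Collect both results in one table, one row per original column
comparison = pd.DataFrame({
    'dropna=True': df.nunique(),               # NaN ignored (default)
    'dropna=False': df.nunique(dropna=False)   # NaN counted as one extra value
})
print(comparison)
```

Columns A and C each gain one in the dropna=False column because they contain NaN, while column B is unchanged.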
Counting Unique Values for Each Row Including NaN
To count the unique values for each row in a DataFrame, including NaN values, you can use the nunique() method with axis=1 and dropna=False.
# Counting unique values for each row, including NaN values
df2 = df.nunique(axis=1, dropna=False)
print("Unique values for each row (including NaN):\n", df2)
# Output:
# Unique values for each row (including NaN):
# 0 3
# 1 3
# 2 3
# 3 3
# dtype: int64
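For contrast, here is a rough sketch of the same row-wise count with the default dropna=True, together with the difference between the two results. The variable names with_nan, without_nan, and extra are only illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [5, 10, 5, np.nan],
    'B': [4, 6, 4, 2],
    'C': [3, 7, np.nan, 9]
})

with_nan = df.nunique(axis=1, dropna=False)   # NaN counted as a value
without_nan = df.nunique(axis=1)              # NaN ignored (default)

# NaN adds at most one unique value per row, so the difference
# is 1 for rows that contain NaN and 0 for rows that do not
extra = with_nan - without_nan

print(without_nan.tolist())   # [3, 3, 2, 2]
print(extra.tolist())         # [0, 0, 1, 1]
```

Rows 2 and 3 each contain one NaN, so their counts drop from 3 to 2 when NaN values are excluded.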
Frequently Asked Questions on Pandas DataFrame nunique() Method
The nunique() method in Pandas counts the number of unique values along a specified axis (rows or columns) of a DataFrame.
You can use df.nunique(axis=0) to count unique values in each column. Since axis=0 is the default, df.nunique() returns the same result.
By using df.nunique(axis=1), you can count unique values in each row of the DataFrame.
By default, the nunique() method excludes NaN values (dropna=True). You can include NaN values in the count by setting dropna=False.
Use nunique() when you need to quickly understand the diversity and distribution of unique values within your dataset, which is useful for data exploration and initial data analysis tasks.
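One possible way to use nunique() during data exploration is to flag columns that are constant or that look like identifiers. The sketch below runs on hypothetical data; the column names and the variables counts, constant_cols, and id_like_cols are invented for this example.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US'],
    'city': ['NYC', 'LA', 'NYC', 'SF'],
    'order_id': [101, 102, 103, 104]
})

counts = df.nunique()

# Columns with a single distinct value do not vary across rows
constant_cols = counts[counts == 1].index.tolist()

# Columns where every row is distinct often act as identifiers
id_like_cols = counts[counts == len(df)].index.tolist()

print(constant_cols)   # ['country']
print(id_like_cols)    # ['order_id']
```

Whether such columns should be dropped or kept depends on the analysis; the counts only point out where to look.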
Conclusion
In this article, I have explained the Pandas DataFrame nunique() method, covering its syntax, parameters, and usage, and how it returns a Pandas Series with the count of unique values for each column or row, depending on the specified axis.
Happy Learning!!