• Post author:
• Post category:Pandas

How to count the unique values of a column in Pandas DataFrame? – When working on machine learning or data analysis with Pandas we are often required to get the count of unique or distinct values from a single column or multiple columns.

You can get the number of unique values in the column of pandas DataFrame using several ways like using functions `Series.unique.size`, `Series.nunique()`, and `Series.drop_duplicates().size()`. Since the DataFrame column is internally represented as a Series, you can use these functions to perform the operation.

## 1. Quick Examples of Count Unique Values in Column

Following are quick examples of how to count unique values in the column.

``````
# Below are the quick examples

# Example 1: Get Unique Count using Series.unique()
count = df.Courses.unique().size

# Example 2: Using Series.nunique()
count = df.Courses.nunique()

# Example 3: Get frequency of each value
frequence = df.Courses.value_counts()

# Example 4: By using drop_duplicates()
count = df.Courses.drop_duplicates().size

# Example 5: Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print(count)

# Example 6: Count unique on multiple columns
count = df[['Courses','Fee']].nunique()

# Example 7: Count unique values in each row
df.nunique(axis=1)
``````

Let’s create a DataFrame.

``````
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
'Fee' :[20000,25000,22000,30000,25000,20000,30000],
'Duration':['30days','40days','35days','50days','40days','30days','50days'],
'Discount':[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
``````

Yields below output.

## 2. Pandas Count Unique Values

To count unique values in the Pandas DataFrame column use the `Series.unique()` function along with the size attribute. The series.unique() function returns all unique values from a column by removing duplicate values and the size attribute returns a count of unique values in a column of DataFrame.

``````
Syntax of unique()
Series.unique()
``````

Example:

``````
# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Get the count of unique values in a columncoun:\n", "+ str(count))
``````

Yields below output.

## 3. Series.nunique()

Alternatively, you can also try using `Series.nunique()`, this returns the number of unique elements in the object excluding `NaN` values. If you want to include `NaN` values use `dropna` param to `False`.

``````
Syntax of nunique()
Series.nunique(dropna=True)
``````

Example:

``````
# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Output:
# Unique values count : 4
``````

## 4. Using Series.drop_duplicates()

User Series drop_duplicates() to remove the duplicate values from the column and use the `size` attribute to get the count.

``````
# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Output:
# Unique values count : 4
``````

## 5. Pandas Count Unique Values in Multiple Columns

In order to get the count of unique values on multiple columns use pandas `DataFrame.drop_duplicates()` which drop duplicate rows from pandas DataFrame. This eliminates duplicates and returns DataFrame with unique rows.

As a result use `shape` property to get the shape of the DataFrame which ideally returns a tuple with rows and columns, use shape[0] to get the row count.

``````
# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Outputs:
# Unique multiple columns : 5
``````

## 6. Get How May Times Value Occur

In case you want to get the frequency of a column use `Series.value_counts()`. This function returns a Series with the counts of unique values in the specified column. The index of the Series contains unique values, and the corresponding values represent the counts of each unique value in the column.

``````
# Get count of each column
print(df.Courses.value_counts())

# Outputs:
# Spark      2
# Python     2
# Pandas     2
# PySpark    1
# Name: Courses, dtype: int64
``````

## 7. Count Unique Values in Row

You can use `df.nunique(axis=1)` to get the count of unique values in Row. For example,

``````
# Count unique values in each row
df.nunique(axis=1)
``````

## 8. Complete Example

``````
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
'Fee' :[20000,25000,22000,30000,25000,20000,30000],
'Duration':['30days','40days','35days','50days','40days','30days','50days'],
'Discount':[1000,2300,1200,2000,2300,1000,2000]
}
df = pd.DataFrame(technologies)
print(df)

# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))
# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Get frequency of each value
frequency = df.Courses.value_counts()
print(frequency)

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Count unique on multiple columns
count = df[['Courses','Fee']].nunique()
print(count)

# Count unique values in each row
#df.nunique(axis=1)
``````

## Conclusion

In this article, you have learned how to get the count of unique values of a pandas DataFrame column using Series.unique(), Series.nunique(), and Series.drop_duplicates(). Also learned how to get the count of distinct values from multiple columns.