Pandas Count Unique Values in Column

How to count the unique values of a column in Pandas DataFrame? – When working on machine learning or data analysis with Pandas we are often required to get the count of unique or distinct values from a single column or multiple columns.

1. Quick Examples of Count Unique Values in Column

Following are quick examples of how to count unique values in the column.


# Below are the quick examples

# Example 1: Get Unique Count using Series.unique()
count = df.Courses.unique().size

# Example 2: Using Series.nunique()
count = df.Courses.nunique()

# Example 3: Get frequency of each value
frequence = df.Courses.value_counts()

# Example 4: By using drop_duplicates()
count = df.Courses.drop_duplicates().size

# Example 5: Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print(count)

# Example 6: Count unique on multiple columns
count = df[['Courses','Fee']].nunique()

# Example 7: Count unique values in each row
df.nunique(axis=1)

Let’s create a DataFrame.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

2. Pandas Count Unique Values

To count unique values in the Pandas DataFrame column use the Series.unique() function along with the size attribute. The series.unique() function returns all unique values from a column by removing duplicate values and the size attribute returns a count of unique values in a column of DataFrame.


Syntax of unique() 
Series.unique()

Example:


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Get the count of unique values in a columncoun:\n", "+ str(count))

Yields below output.

3. Series.nunique()

Alternatively, you can also try using Series.nunique(), this returns the number of unique elements in the object excluding NaN values. If you want to include NaN values use dropna param to False.


Syntax of nunique() 
Series.nunique(dropna=True)

Example:


# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Output:
# Unique values count : 4

4. Using Series.drop_duplicates()

User Series drop_duplicates() to remove the duplicate values from the column and use the size attribute to get the count.


# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Output:
# Unique values count : 4

5. Pandas Count Unique Values in Multiple Columns

In order to get the count of unique values on multiple columns use pandas DataFrame.drop_duplicates() which drop duplicate rows from pandas DataFrame. This eliminates duplicates and returns DataFrame with unique rows.

As a result use shape property to get the shape of the DataFrame which ideally returns a tuple with rows and columns, use shape[0] to get the row count.


# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Outputs:
# Unique multiple columns : 5

6. Get How May Times Value Occur

In case you want to get the frequency of a column use Series.value_counts(). This function returns a Series with the counts of unique values in the specified column. The index of the Series contains unique values, and the corresponding values represent the counts of each unique value in the column.


# Get count of each column
print(df.Courses.value_counts())

# Outputs:
# Spark      2
# Python     2
# Pandas     2
# PySpark    1
# Name: Courses, dtype: int64

7. Count Unique Values in Row

You can use df.nunique(axis=1) to get the count of unique values in Row. For example,


# Count unique values in each row
df.nunique(axis=1)

8. Complete Example


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))
# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Get frequency of each value
frequency = df.Courses.value_counts()
print(frequency)

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Count unique on multiple columns
count = df[['Courses','Fee']].nunique()
print(count)

# Count unique values in each row
#df.nunique(axis=1)

Conclusion

In this article, you have learned how to get the count of unique values of a pandas DataFrame column using Series.unique(), Series.nunique(), and Series.drop_duplicates(). Also learned how to get the count of distinct values from multiple columns.

References

https://pandas.pydata.org/docs/reference/api/pandas.Series.nunique.html