Pandas Count Unique Values in Column

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:January 17, 2023
Spread the love

How to count unique values of a column in pandas DataFrame? – When working on machine learning or data analysis with pandas we are often required to get the count of unique or distinct values from a single column or multiple columns.

You can get the number of unique values in the column of pandas DataFrame using several ways like using functions Series.unique.size, Series.nunique(), Series.drop_duplicates().size(). Since the DataFrame column is internally represented as a Series, you can use these functions to perform the operation.

1. Quick Examples of Count Unique Values in Column

Following are quick examples of how to count unique values in column.


#Below are quick examples

# Get Unique Count using Series.unique()
count = df.Courses.unique().size

# Using Series.nunique()
count = df.Courses.nunique()

# Get frequency of each value
frequence = df.Courses.value_counts()

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size

#Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print(count)

#Count unique on multiple columns
count = df[['Courses','Fee']].nunique()

#count unique values in each row
#df.nunique(axis=1)

Let’s create a DataFrame.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   Pandas  30000   50days      2000
4   Python  25000   40days      2300
5    Spark  20000   30days      1000
6   Pandas  30000   50days      2000

2. Pandas Count Unique Values

To count unique values in the pandas dataframe column use Series.unique() function and then call the size to get the count. Series.unique() function get all unique values from a column by removing duplicate values and this function returns a ndarray with unique value in the order of appearance and the results are not sorted.


Syntax: Series.unique()

Example:


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))

# Output
# Unique values count : 4

3. Series.nunique()

Alternatively, you can also try using Series.nunique(), this returns the number of unique elements in the object excluding NaN values. If you wanted to include NaN values use dropna param to False.


Syntax: Series.nunique(dropna=True)

Example:


# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Outputs
# Unique values count : 4

4. Using Series.drop_duplicates()

User Series drop_duplicates() to remove the distinct values from coumn and use the size to get the count.


# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Outputs
# Unique values count : 4

4. Pandas Count Unique Values in Multiple Columns

In order to get the count of unique values on multiple columns use pandas DataFrame.drop_duplicates() which drop duplicate rows from pandas DataFrame. This eliminates duplicates and return DataFrame with unique rows.

On the result use shape property to get the shape of the DataFrame which ideally returns a tuple with rows and columns, use shape[0] to get the row count.


# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Outputs
# Unique multiple columns : 5

4. Get How May Times Value Occur

In case you want to get the frequency of a column use Series.value_counts(), This returns the Count of Frequency of a Value in Column


print(df.Courses.value_counts())

# Outputs
Spark      2
Python     2
Pandas     2
PySpark    1
Name: Courses, dtype: int64

5. Count Unique Values in Row

Use axis=1 to get the count of unique values in Row.


# Count unique values in each row
df.nunique(axis=1)

7. Complete Example


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))
# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Get frequency of each value
frequency = df.Courses.value_counts()
print(frequency)

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

#Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

#Count unique on multiple columns
count = df[['Courses','Fee']].nunique()
print(count)

#count unique values in each row
#df.nunique(axis=1)

Conclusion

In this article, you have learned how to get the count of unique values of a pandas DataFrame column using Series.unique(), Series.nunique(), Series.drop_duplicates() and also learned how to get the distinct values from multiple columns.

Related Articles

References

Leave a Reply

You are currently viewing Pandas Count Unique Values in Column