Pandas Count Unique Values in Column

How do you count unique values of a column in a pandas DataFrame? When working on machine learning or data analysis with pandas, we are often required to get the count of unique or distinct values from a single column or from multiple columns.

You can get the number of unique values in a column of a pandas DataFrame in several ways, for example using Series.unique().size, Series.nunique(), or Series.drop_duplicates().size. Since a DataFrame column is internally represented as a Series, you can use these Series functions to perform the operation.

1. Quick Examples of Count Unique Values in Column

Following are quick examples of how to count unique values in a column.


# Below are quick examples

# Get unique count using Series.unique()
count = df.Courses.unique().size

# Using Series.nunique()
count = df.Courses.nunique()

# Get frequency of each value
frequency = df.Courses.value_counts()

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size

# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print(count)

# Count unique on multiple columns
count = df[['Courses','Fee']].nunique()

# Count unique values in each row
# df.nunique(axis=1)

Let’s create a DataFrame.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   Pandas  30000   50days      2000
4   Python  25000   40days      2300
5    Spark  20000   30days      1000
6   Pandas  30000   50days      2000

2. Use Series.unique() – Count Unique Values

To get a count of unique values in a column with pandas, first use the Series.unique() function to get the unique values from the column by removing duplicate values, then call size on the result to get the count. The unique() function returns an ndarray with the unique values in order of appearance; the result is not sorted.


Syntax: Series.unique()

Example:


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))

# Output
# Unique values count : 4
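Since unique() returns the distinct values themselves, you can also inspect them before taking the size. A minimal sketch using the same Courses column:

```python
import pandas as pd

technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
}
df = pd.DataFrame(technologies)

# unique() returns an ndarray of distinct values in order of first appearance
values = df.Courses.unique()
print(values)       # ['Spark' 'PySpark' 'Python' 'Pandas']
print(values.size)  # 4
```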

3. Series.nunique()

Alternatively, you can also use Series.nunique(), which returns the number of unique elements in the object excluding NaN values. If you want to include NaN values, set the dropna param to False.


Syntax: Series.nunique(dropna=True)

Example:


# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Outputs
# Unique values count : 4
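As a sketch of the dropna behavior, consider a hypothetical Series that contains a NaN value:

```python
import pandas as pd
import numpy as np

# Hypothetical Series containing a NaN value
ser = pd.Series(["Spark", "PySpark", np.nan, "Spark"])

print(ser.nunique())              # 2 -> NaN is excluded by default
print(ser.nunique(dropna=False))  # 3 -> NaN counted as a distinct value
```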

4. Using Series.drop_duplicates()

Use Series.drop_duplicates() to remove the duplicate values from the column and then use size to get the count.


# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Outputs
# Unique values count : 4

5. Count Unique Values in Multiple Columns

In order to get the count of unique values on multiple columns, use pandas DataFrame.drop_duplicates(), which drops duplicate rows from the DataFrame. This eliminates duplicates and returns a DataFrame with unique rows.

On the result, use the shape property, which returns a tuple with the row and column counts, and take shape[0] to get the row count.


# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Outputs
# Unique multiple columns : 5
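Note that calling nunique() on the same column selection behaves differently: it returns the unique count per column as a Series, not the number of unique row combinations. A short sketch:

```python
import pandas as pd

technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
}
df = pd.DataFrame(technologies)

# Unique count per column, not per row combination
print(df[['Courses','Fee']].nunique())
# Courses    4
# Fee        4
# dtype: int64
```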

6. Get How Many Times a Value Occurs

In case you want to get the frequency of values in a column, use Series.value_counts(). This returns the count of each distinct value in the column, sorted in descending order of frequency.


print(df.Courses.value_counts())

# Outputs
Spark      2
Python     2
Pandas     2
PySpark    1
Name: Courses, dtype: int64
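If you need relative frequencies instead of raw counts, value_counts() also takes a normalize parameter. A sketch on the same column:

```python
import pandas as pd

technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
}
df = pd.DataFrame(technologies)

# normalize=True returns fractions that sum to 1 instead of raw counts
freq = df.Courses.value_counts(normalize=True)
print(freq)
```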

7. Count Unique Values in Each Row

Use DataFrame.nunique() with axis=1 to get the count of unique values in each row.


# Count unique values in each row
df.nunique(axis=1)
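A small hypothetical frame makes the row-wise counting easier to see:

```python
import pandas as pd

# Hypothetical frame with repeated values inside each row
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 5, 3], 'C': [4, 5, 6]})

# axis=1 counts distinct values across each row
print(df2.nunique(axis=1))
# 0    2   (row 0 holds 1, 1, 4)
# 1    2   (row 1 holds 2, 5, 5)
# 2    2   (row 2 holds 3, 3, 6)
```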

8. Complete Example


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Python","Pandas","Python","Spark","Pandas"],
    'Fee' :[20000,25000,22000,30000,25000,20000,30000],
    'Duration':['30days','40days','35days','50days','40days','30days','50days'],
    'Discount':[1000,2300,1200,2000,2300,1000,2000]
              }
df = pd.DataFrame(technologies)
print(df)


# Get Unique Count using Series.unique()
count = df.Courses.unique().size
print("Unique values count : "+ str(count))
# Using Series.nunique()
count = df.Courses.nunique()
print("Unique values count : "+ str(count))

# Get frequency of each value
frequency = df.Courses.value_counts()
print(frequency)

# By using drop_duplicates()
count = df.Courses.drop_duplicates().size
print("Unique values count : "+ str(count))

# Count unique on multiple columns
count = df[['Courses','Fee']].drop_duplicates().shape[0]
print("Unique multiple columns : "+ str(count))

# Count unique on multiple columns
count = df[['Courses','Fee']].nunique()
print(count)

# Count unique values in each row
# df.nunique(axis=1)

Conclusion

In this article, you have learned how to get the number of unique values in a column using Series.unique(), Series.nunique(), and Series.drop_duplicates(), and also how to get the count of unique values from multiple columns.

NNK

SparkByExamples.com is a Big Data and Spark examples community page; all examples are simple, easy to understand, and well tested in our development environment.
