Pandas Count the Frequency of a Value in a Column

In Pandas, you can count the frequency of unique values in a DataFrame column by using the Series.value_counts() method. Alternatively, if you have a SQL background, you can get the same result with groupby() and count(). Both approaches count how many times each value occurs in a column and return the result grouped by the requested column.

In this article, I will explain how to count the frequency of unique values in a pandas DataFrame on single columns, multiple columns, by the index column, etc.

1. Quick Examples of Counting the Frequency of a Value in a Column

Below are some quick examples of how to count the frequency that a value occurs in a DataFrame column.


# Quick examples of count the frequency of a value in column

# Example 1: Using pandas way, Series.value_counts()
df1 = df['Courses'].value_counts()
df1 = df.Courses.value_counts()

# Example 2: Using groupby() & count()
df1 = df.groupby('Courses').count()

# Example 3: Add frequency count as new column to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')

# Example 4: Getting value counts of multiple columns
df1 = df.groupby(['Courses', 'Fee']).count()

# Example 5: Get occurrence of value by index (row label)
df1 = df.index.value_counts()

# Example 6: Include NaN, None, Null values in the count
df1 = df['Courses'].value_counts(dropna=False)

Now let’s create a DataFrame, run these examples, and explore the output. Our DataFrame contains just two columns, Courses and Fee, with custom index labels.


# Create DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
    'Fee' :[24000, 25000, 25000, 24000, 24000, 25000, 25000, 24000]
}
index_labels = ['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies, index=index_labels)
print("Create DataFrame:\n", df)

Yields below output.

# Output:
# Create DataFrame:
    Courses    Fee
r1    Spark  24000
r2  PySpark  25000
r3   Hadoop  25000
r4   Python  24000
r5   pandas  24000
r2  PySpark  25000
r4   Python  25000
r5   pandas  24000

2. Series.value_counts() to Count Frequency of Value in a Column

The Series.value_counts() method counts the frequency of unique values that occur in a column of a pandas DataFrame. To use it, first get the Series object from the DataFrame; df['column_name'] returns a Series object.

For example, df['Courses'].values returns an array of all values, including duplicates: ['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'PySpark' 'Python' 'pandas'].

Now use df['Courses'].value_counts() to get the frequency count of unique values in the Courses column.


# Using series value_counts()
df1 = df['Courses'].value_counts()
print("Count the frequency of unique values:\n", df1)

Yields below output. If the DataFrame contains any NULL/None/np.NaN values, the value_counts() function by default excludes them from the count.

# Output:
# Count the frequency of unique values:
PySpark    2
pandas     2
Python     2
Hadoop     1
Spark      1
Name: Courses, dtype: int64

If the column name doesn’t contain spaces, you can also get the same result using df.Courses.value_counts(). To get the frequency count of multiple columns, pass the column names as a list.


# Get Frequency of multiple columns
df1 = df[['Courses','Fee']].value_counts()
print("Count the frequency of unique values:\n", df1)

Yields below output.


# Output:
# Count the frequency of unique values:
Courses  Fee  
PySpark  25000    2
pandas   24000    2
Hadoop   25000    1
Python   24000    1
         25000    1
Spark    24000    1
dtype: int64

3. Count Frequency Value Using GroupBy.count()

If you are coming from a SQL background, you will be familiar with GROUP BY and COUNT to get the number of times a value is present in a column (the frequency of column values); you can use a similar approach in pandas. The DataFrame.groupby() method groups the rows that share the same value in the specified column, and calling count() on the result gives the number of times each value is repeated.


# Using GroupBy.count() frequency value
df1 = df.groupby('Courses').count()
print("Count the frequency of unique values:\n", df1)

Yields below output.


# Output:
# Count the frequency of unique values:
         Fee
Courses     
Hadoop     1
PySpark    2
Python     2
Spark      1
pandas     2

If you want to add the frequency count back to the DataFrame as a new column, use groupby() with transform('count').


# Add frequency count to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print("Count the frequency of unique values:\n", df)

Yields below output.


# Output:
# Count the frequency of unique values:
    Courses    Fee  freq_count
r1    Spark  24000           1
r2  PySpark  25000           2
r3   Hadoop  25000           1
r4   Python  24000           2
r5   pandas  24000           2
r2  PySpark  25000           2
r4   Python  25000           2
r5   pandas  24000           2

4. Count Frequency Value Using GroupBy.size()

Use the DataFrame.groupby() method followed by size() to get the frequency count of single or multiple columns; when grouping on multiple columns, size() is the method to use. Apply size() on the resulting GroupBy object to get a frequency count for each group. Note that count() excludes NaN values column by column, while size() simply counts the rows in each group, including NaN.


# Count frequency value using GroupBy.size()
df1 = df.groupby(['Courses', 'Fee']).size()
print("Count the frequency of unique values:\n", df1)

Yields below output.


# Output:
# Count the frequency of unique values:
Courses  Fee  
Hadoop   25000    1
PySpark  25000    2
Python   24000    1
         25000    1
Spark    24000    1
pandas   24000    2
dtype: int64
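
If you prefer the result as a regular DataFrame rather than a Series with a MultiIndex, a minimal follow-up is to reset the index and give the counts a column name (the column name count here is arbitrary):


# Convert the grouped size result to a flat DataFrame with a count column
df1 = df.groupby(['Courses', 'Fee']).size().reset_index(name='count')
print(df1)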

5. Get Occurrence of a Value from an Index

If your DataFrame uses labeled index values, you can also get how many times each index value occurs using DataFrame.index.value_counts(), since DataFrame.index returns an Index object that supports value_counts().


# Get number of occurrences by index
df1 = df.index.value_counts()
print("Count the frequency of unique values:\n", df1)

Yields below output.


# Output:
# Count the frequency of unique values:
r4    2
r2    2
r5    2
r1    1
r3    1
dtype: int64

6. Get the Frequency of a Column with NaN, None in DataFrame

As I already explained above, the value_counts() method by default excludes NaN, None, and Null values from the count. Use Series.value_counts(dropna=False) to include them when counting the frequency of values in a DataFrame column.


# Include NaN, None & Null in frequency count
df1 = df['Courses'].value_counts(dropna=False)
print("Count the frequency of unique values:\n", df1)

7. Get the Frequency of Values as a Percentage

All examples explained above return the count of how often each value occurs in the DataFrame, but sometimes you may need the occurrence as a proportion or percentage instead.


# Get percentage instead of count
df1 = df['Courses'].value_counts(normalize=True)
print("Count the frequency of unique values as percentage:\n", df1)

Yields below output.


# Output:
# Count the frequency of unique values as percentage:
PySpark    0.250
pandas     0.250
Python     0.250
Hadoop     0.125
Spark      0.125
Name: Courses, dtype: float64
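
Note that normalize=True returns proportions that sum to 1, not percentages. If you want values on a 0-100 scale, multiplying the result by 100 is a simple follow-up:


# Convert proportions to percentages
df1 = df['Courses'].value_counts(normalize=True) * 100
print(df1)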

8. Using df.apply(pd.value_counts).fillna(0) to Count Frequency Value

You can use df.apply(pd.value_counts) to count the frequency of values in all columns at once. If you pass axis=1, you get the frequency counts for every row instead. value_counts returns an object containing the counts of unique values, and fillna(0) replaces the NaN entries (values that never occur in a given column) with zero. See how to replace NaN with zero in pandas.


# Using df.apply method to count frequency value
df1 = df.apply(pd.value_counts).fillna(0)
print("Count the frequency of unique values:\n", df1)

Yields below output.


# Output:
# Count the frequency of unique values:
         Courses  Fee
24000        0.0  4.0
25000        0.0  4.0
Hadoop       1.0  0.0
PySpark      2.0  0.0
Python       2.0  0.0
Spark        1.0  0.0
pandas       2.0  0.0
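
Note that in recent pandas releases the top-level pd.value_counts() function is deprecated. A sketch of the same idea that relies only on the Series method:


# Same result using the Series method instead of the top-level function
df1 = df.apply(lambda col: col.value_counts()).fillna(0)
print(df1)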

9. Complete Examples


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
    'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000]
               }
index_labels=['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

df1 = df['Courses'].value_counts()
print(df1)

df1 = df.Courses.value_counts()
print(df1)

df1 = df[['Courses','Fee']].value_counts()
print(df1)

df1 = df['Courses'].value_counts(normalize=True)
print(df1)

df1 = df.index.value_counts()
print(df1)

df1 = df.groupby('Courses').count()
print(df1)

df1 = df.groupby(['Courses', 'Fee']).size()
print(df1)

df1 = df.groupby(['Courses', 'Fee']).count()
print(df1)

df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)

df1 = df.apply(pd.value_counts).fillna(0)
print(df1)

Frequently Asked Questions

How can I count the frequency of each unique value in a specific column?

To count the frequency of each unique value in a specific column using Pandas, you can use the value_counts() method.

How do I count the frequency of a specific value in a column?

To count the frequency of a specific value in a column, you can use the value_counts() method and access the count for the desired value.
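
For example, with the Courses column used throughout this article, either of these returns the count for a single value such as PySpark:


# Look up one value in the value_counts() result
count = df['Courses'].value_counts()['PySpark']

# Or count matching rows directly with a boolean mask
count = (df['Courses'] == 'PySpark').sum()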

Can I include missing values in the frequency count?

By default, the value_counts() method in Pandas excludes missing values (NaN). If you want to include missing values in the frequency count, you can set the dropna parameter to False.

How can I normalize the frequency counts to get proportions or percentages?

You can normalize the frequency counts to obtain proportions or percentages using the normalize parameter in the value_counts() method. Setting normalize=True will return the relative frequencies.

Is there a way to sort the frequency counts in descending order?

The value_counts() method already returns frequency counts sorted in descending order by default. If you need the opposite order, pass ascending=True to value_counts(), or call sort_values() on the result. Either way, the resulting Series has the unique values as its index and their frequency counts as its values.
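
A minimal sketch of both orderings:


# value_counts() sorts by frequency in descending order by default
df1 = df['Courses'].value_counts()

# Pass ascending=True to reverse the order
df1 = df['Courses'].value_counts(ascending=True)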

Conclusion

In this article, you have learned how to count the frequency of values that occur in pandas DataFrame columns using the Series.value_counts(), GroupBy.count(), and GroupBy.size() methods. You have also learned how to include nulls in the count and how to get the frequency of values across multiple selected columns.

Happy Learning !!
