Pandas Count The Frequency of a Value in Column

In pandas, you can get the count of the frequency of a value that occurs in a DataFrame column by using the Series.value_counts() method. Alternatively, if you have a SQL background, you can get the same result using groupby() and count(). Both approaches count the occurrences of each value by grouping on the requested column.

In this article, I will explain how to count the frequency of a value in a column of a pandas DataFrame on single and multiple columns, by the index column, etc.

1. Quick Examples of Counting the Frequency of a Value in a Column

Below are some quick examples of how to count the frequency of a value in a DataFrame column.


# Below are quick examples
 
# Using pandas way, Series.value_counts()
df1 = df['Courses'].value_counts()
df1 = df.Courses.value_counts()

# Using groupby() & count()
df1 = df.groupby('Courses').count()

# Add frequency count as new column to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)


# Getting value counts of multiple columns
df1 = df.groupby(['Courses', 'Fee']).count()


# Get occurrence of value by index (row label)
df1 = df.index.value_counts()

# Include NaN, None, Null values in the count.
df1 = df['Courses'].value_counts(dropna=False)

Now let’s create a DataFrame with custom row labels, run these examples, and explore the output. Our DataFrame contains just two columns, Courses and Fee.


# Create DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
    'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000]
}
index_labels = ['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies, index=index_labels)
print(df)

Yields below output.


# Output:
    Courses    Fee
r1    Spark  24000
r2  PySpark  25000
r3   Hadoop  25000
r4   Python  24000
r5   pandas  24000
r2  PySpark  25000
r4   Python  25000
r5   pandas  24000

2. Series.value_counts() to Count Frequency of Value in a Column

The Series.value_counts() method gets you the count of the frequency of a value that occurs in a column of a pandas DataFrame. In order to use it, first you need to get the Series object from the DataFrame; df['column_name'] returns a Series object.

For example, df['Courses'].values returns an array of all values, including duplicates: ['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'PySpark' 'Python' 'pandas'].

Now use df['Courses'].value_counts() to get the frequency counts of values in the Courses column.


# Using series value_counts()
df1 = df['Courses'].value_counts()
print(df1)

Yields below output. In case you have any NULL/None/np.NaN values, value_counts() ignores them in the frequency count by default.


# Output:
PySpark    2
pandas     2
Python     2
Spark      1
Hadoop     1
Name: Courses, dtype: int64

If you don’t have spaces in your column names, you can also get the same result using attribute access, df.Courses.value_counts(), as in the quick sketch below.
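
# Attribute access, equivalent to df['Courses'].value_counts()
print(df.Courses.value_counts())

To get the frequency count of multiple columns, pass the columns to df[] as a list.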


# Get Frequency of multiple columns
print( df[['Courses','Fee']].value_counts())

Yields below output.


# Output:
Courses  Fee  
PySpark  25000    2
pandas   24000    2
Hadoop   25000    1
Python   24000    1
         25000    1
Spark    24000    1
dtype: int64

3. Count Frequency Value Using GroupBy.count()

If you are coming from a SQL background, you would be familiar with GROUP BY and COUNT to get the number of times a value is present in a column (the frequency of column values); you can use a similar approach with pandas. The DataFrame.groupby() method groups the data on a specified column by collecting all similar values together, and calling count() on the result gives the number of times each value is repeated.

Note that pandas.DataFrame.groupby() returns a GroupBy object, and count() is a method on GroupBy.


# Using GroupBy.count() to get frequency of values
df1 = df.groupby('Courses').count()
print(df1)

Yields below output.


# Output:
         Fee
Courses     
Hadoop     1
PySpark    2
Python     2
Spark      1
pandas     2
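
The result above keeps the grouped values in the index. If you would rather get a flat DataFrame with the count in a regular column, you can chain reset_index(); a minimal sketch (the column name count is just an illustrative choice):


# Flatten the grouped counts into a regular DataFrame column
df1 = df.groupby('Courses')['Fee'].count().reset_index(name='count')
print(df1)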

If you want to add the frequency count back to the DataFrame as a new column, use groupby() with transform('count').


# Add frequency count to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)

Yields below output.


# Output:
    Courses    Fee  freq_count
r1    Spark  24000           1
r2  PySpark  25000           2
r3   Hadoop  25000           1
r4   Python  24000           2
r5   pandas  24000           2
r2  PySpark  25000           2
r4   Python  25000           2
r5   pandas  24000           2

4. Count Frequency Value Using GroupBy.size()

Use the df.groupby().size() function to get the frequency count of single or multiple columns; when you are grouping on multiple columns, size() is often the more convenient method. We can apply the size() function on the resulting GroupBy object to get a frequency count.


# Count frequency value using GroupBy.size()
df1 = df.groupby(['Courses', 'Fee']).size()
print(df1)

Yields below output.


# Output:
Courses  Fee  
Hadoop   25000    1
PySpark  25000    2
Python   24000    1
         25000    1
Spark    24000    1
pandas   24000    2
dtype: int64
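
Note that size() and count() behave differently when missing values are present: size() counts all rows in each group, while count() excludes NaN values. A minimal sketch on a small throwaway DataFrame (df2 here is hypothetical, not the example data above):


# size() counts every row per group; count() skips NaN values
import numpy as np
df2 = pd.DataFrame({'Courses':["Spark","Spark"], 'Fee':[24000, np.nan]})
print(df2.groupby('Courses').size())   # Spark: 2
print(df2.groupby('Courses').count())  # Fee shows 1 for Spark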

5. Get Occurrence of a Value from Index

If you have labels on the index, you can also get how many times each index value occurs in a pandas DataFrame using DataFrame.index.value_counts(), since the Index object returned by DataFrame.index also supports value_counts().


# Get number of occurrences by index (row label)
df1 = df.index.value_counts()
print(df1)

Yields below output.


# Output:
r4    2
r2    2
r5    2
r1    1
r3    1
dtype: int64
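
If you only need the count for one specific label, a boolean comparison on the index works too; a quick sketch using the r4 label from the example above:


# Count how many times a single index label occurs
print((df.index == 'r4').sum())  # 2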

6. Get Frequency of a Column with NaN, None in DataFrame

As explained above, the value_counts() method by default ignores NaN/None/Null values in the count. Use Series.value_counts(dropna=False) to include None, NaN & Null values in the frequency count of a DataFrame column.


# Include NaN, None & Null in frequency count
df1 = df['Courses'].value_counts(dropna=False)
print(df1)
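
Since the example DataFrame has no missing values, here is a minimal sketch on a small hypothetical DataFrame containing a None, to show the difference dropna=False makes:


# With dropna=False, the missing entry gets its own row in the counts
df2 = pd.DataFrame({'Courses':["Spark","PySpark",None,"Spark"]})
print(df2['Courses'].value_counts(dropna=False))
# Spark      2
# PySpark    1
# NaN        1   (the missing value now appears in the counts)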

7. Get Frequency of Values as a Percentage

All the examples explained above return the count of the frequency of a value in a DataFrame, but sometimes you may need the occurrence expressed as a percentage. Use value_counts() with normalize=True to get relative frequencies instead of counts.


# Get percentage instead of count
df1 = df['Courses'].value_counts(normalize=True)
print(df1)

Yields below output.


# Output:
PySpark    0.250
pandas     0.250
Python     0.250
Hadoop     0.125
Spark      0.125
Name: Courses, dtype: float64
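
Note that normalize=True returns fractions that sum to 1; if you want actual percentages, multiply by 100, as in this quick sketch:


# Convert relative frequencies to percentages
print(df['Courses'].value_counts(normalize=True) * 100)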

8. Using df.apply(pd.value_counts).fillna(0) to Count Frequency Value

You can use df.apply(pd.value_counts) to count the frequency of all columns at once; if you set axis=1, you get the frequency of the values in every row instead. value_counts returns an object containing counts of unique values, and fillna(0) fills zero for NaN/None values. See how to replace NaN with zero in pandas. Note that the top-level pd.value_counts function is deprecated in recent pandas versions in favor of Series.value_counts().


# Using df.apply method to count frequency value
df1 = df.apply(pd.value_counts).fillna(0)
print(df1)

Yields below output.


# Output:
         Courses  Fee
24000        0.0  4.0
25000        0.0  4.0
Hadoop       1.0  0.0
PySpark      2.0  0.0
Python       2.0  0.0
Spark        1.0  0.0
pandas       2.0  0.0
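
As mentioned above, passing axis=1 applies value_counts() across each row instead of each column; a minimal sketch (each distinct value found in the rows becomes a column of the result):


# Row-wise frequency: one result column per distinct value in the rows
df1 = df.apply(pd.value_counts, axis=1).fillna(0)
print(df1)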

9. Complete Examples


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
    'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000]
               }
index_labels=['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Frequency of values in a single column
df1 = df['Courses'].value_counts()
print(df1)

# Same, using attribute access
df1 = df.Courses.value_counts()
print(df1)

# Frequency of value combinations across multiple columns
df1 = df[['Courses','Fee']].value_counts()
print(df1)

# Relative frequency instead of count
df1 = df['Courses'].value_counts(normalize=True)
print(df1)

# Frequency of index labels
df1 = df.index.value_counts()
print(df1)

# Frequency using groupby() and count()
df1 = df.groupby('Courses').count()
print(df1)

# Frequency using groupby() and size()
df1 = df.groupby(['Courses', 'Fee']).size()
print(df1)

df1 = df.groupby(['Courses', 'Fee']).count()
print(df1)

# Add frequency count as a new column
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)

# Frequency of all columns at once
df1 = df.apply(pd.value_counts).fillna(0)
print(df1)

Conclusion

In this article, you have learned how to count the frequency of a value that occurs in pandas DataFrame columns using the Series.value_counts(), GroupBy.count(), and GroupBy.size() methods. You have also learned how to include nulls in the count and how to get the frequency of values across all selected columns.

Happy Learning !!
