In Pandas, you can count the frequency of unique values that occur in a DataFrame column by using the Series.value_counts()
method. Alternatively, if you have a SQL background, you can get the same result using groupby()
and count()
. Both of these approaches count the occurrences of each value by grouping on the requested column.
In this article, I will explain how to count the frequency of unique values in a column of a pandas DataFrame on single and multiple columns, by index column, etc.
1. Quick Examples of Counting the Frequency of a Value in a Column
Below are quick examples of how to count the frequency of a value in a DataFrame column.
# Quick examples of count the frequency of a value in column
# Example 1: Using pandas way, Series.value_counts()
df1 = df['Courses'].value_counts()
df1 = df.Courses.value_counts()
# Example 2: Using groupby() & count()
df1 = df.groupby('Courses').count()
# Example 3: Add frequency count as new column to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
# Example 4: Getting value counts of multiple columns
df1 = df.groupby(['Courses', 'Fee']).count()
# Example 5: Get occurrence of value by index (row label)
df1 = df.index.value_counts()
# Example 6: Include NaN, None, Null values in the count.
df1 = df['Courses'].value_counts(dropna=False)
Now let’s create a DataFrame, run these examples, and explore the output. Our DataFrame contains just two columns, Courses and Fee.
# Create DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000, 25000, 25000, 24000, 24000, 25000, 25000, 24000]
}
index_labels = ['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies, index=index_labels)
print("Create DataFrame:\n", df)
Yields below output.
2. Series.value_counts() to Count Frequency of Value in a Column
The Series.value_counts()
method counts the frequency of unique values that occur in a column of a pandas DataFrame. To use it, first get the Series object from the DataFrame; df['column_name']
returns a Series object.
For example, df['Courses'].values
returns an array of all values including duplicates: ['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'PySpark' 'Python' 'pandas']
.
Now use df['Courses'].value_counts()
to get the frequency count of unique values in the Courses
column.
# Using series value_counts()
df1 = df['Courses'].value_counts()
print("Count the frequency of unique values:\n", df1)
Yields below output. If the DataFrame contains any NULL/None/np.NaN
values, the value_counts()
function ignores them in the count by default.
If the column name doesn’t contain spaces, you can also get the same result using df.Courses.value_counts()
. To get the frequency count of multiple columns, select the columns as a DataFrame and call value_counts() on the result.
# Get Frequency of multiple columns
df1 = df[['Courses','Fee']].value_counts()
print("Count the frequency of unique values:\n", df1)
Yields below output.
# Output:
# Count the frequency of unique values:
Courses Fee
PySpark 25000 2
pandas 24000 2
Hadoop 25000 1
Python 24000 1
25000 1
Spark 24000 1
dtype: int64
3. Count Frequency Value Using GroupBy.count()
If you are coming from a SQL background, you would be familiar with GROUP BY and COUNT to get the number of times a value is present in a column (the frequency of column values); you can use a similar approach in pandas. The DataFrame.groupby() method groups rows that share the same value in the specified column, and count() on the result gives the number of times each value is repeated.
# Using GroupBy.count() frequency value
df1 = df.groupby('Courses').count()
print("Count the frequency of unique values:\n", df1)
Yields below output.
# Output:
# Count the frequency of unique values:
Fee
Courses
Hadoop 1
PySpark 2
Python 2
Spark 1
pandas 2
If you want to add the frequency count back to the DataFrame as a new column, use groupby() with transform('count').
# Add frequency count to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print("Count the frequency of unique values:\n", df)
Yields below output.
# Output:
# Count the frequency of unique values:
Courses Fee freq_count
r1 Spark 24000 1
r2 PySpark 25000 2
r3 Hadoop 25000 1
r4 Python 24000 2
r5 pandas 24000 2
r2 PySpark 25000 2
r4 Python 25000 2
r5 pandas 24000 2
4. Count Frequency Value Using GroupBy.size()
Use the df.groupby().size()
function to get the frequency count of single or multiple columns; when grouping on multiple columns, size()
is the method to use. Applying size()
on the resulting GroupBy
object returns a frequency count for each group. Unlike count(), size() returns a single count per group (including rows with NaN values) rather than one count per remaining column.
# Count frequency value using GroupBy.size()
df1 = df.groupby(['Courses', 'Fee']).size()
print("Count the frequency of unique values:\n", df1)
Yields below output.
# Output:
# Count the frequency of unique values:
Courses Fee
Hadoop 25000 1
PySpark 25000 2
Python 24000 1
25000 1
Spark 24000 1
pandas 24000 2
dtype: int64
5. Get Occurrence of a Value from an Index
If your DataFrame has labeled indexes (row labels), you can also get how many times each index value occurs using DataFrame.index.value_counts()
, since DataFrame.index returns an Index object that supports value_counts().
# Get number of occurrences by index
df1 = df.index.value_counts()
print("Count the frequency of unique values:\n", df1)
Yields below output.
# Output:
# Count the frequency of unique values:
r4 2
r2 2
r5 2
r1 1
r3 1
dtype: int64
6. Get the Frequency of a Column with NaN, None in DataFrame
As explained above, the value_counts()
method by default excludes NaN, None, and Null values from the count. Use Series.value_counts(dropna=False)
to include None, NaN & Null values in the frequency count of a DataFrame column.
# Include NaN, None & Null in frequency count
df1 = df['Courses'].value_counts(dropna=False)
print("Count the frequency of unique values:\n", df1)
7. Get the Frequency of Values as a Percentage
All the examples above return a count of how often each value occurs in the DataFrame, but sometimes you may need the frequency as a proportion or percentage instead. Use value_counts(normalize=True) to get relative frequencies.
# Get percentage instead of count
df1 = df['Courses'].value_counts(normalize=True)
print("Count the frequency of unique values as percentage:\n", df1)
Yields below output.
# Output:
# Count the frequency of unique values as percentage:
PySpark 0.250
pandas 0.250
Python 0.250
Hadoop 0.125
Spark 0.125
Name: Courses, dtype: float64
8. Using df.apply(pd.value_counts).fillna(0) to Count Frequency Value
You can use df.apply(pd.value_counts)
to count the frequency of values in all columns at once. If you set axis=1
, you get the frequency counts for every row instead. value_counts() returns an object containing counts of unique values, and fillna(0)
fills zero for the NaN entries that appear where a value does not occur in a column. See how to replace NaN with zero in pandas. Note that in recent pandas versions the top-level pd.value_counts function is deprecated; df.apply(lambda s: s.value_counts()) is the equivalent.
# Using df.apply method to count frequency value
df1 = df.apply(pd.value_counts).fillna(0)
print("Count the frequency of unique values:\n", df1)
Yields below output.
# Output:
# Count the frequency of unique values:
         Courses  Fee
24000        0.0  4.0
25000        0.0  4.0
Hadoop       1.0  0.0
PySpark      2.0  0.0
Python       2.0  0.0
Spark        1.0  0.0
pandas       2.0  0.0
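If you instead want the frequency of values within each row, pass axis=1 to apply(). A minimal sketch (with this example data, each row holds one course name and one fee, so every count is 1):
# Count frequency of values in each row (sketch)
df1 = df.apply(lambda row: row.value_counts(), axis=1).fillna(0)
print(df1)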
9. Complete Examples
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000]
}
index_labels=['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
df1 = df['Courses'].value_counts()
print(df1)
df1 = df.Courses.value_counts()
print(df1)
df1 = df[['Courses','Fee']].value_counts()
print(df1)
df1 = df['Courses'].value_counts(normalize=True)
print(df1)
df1 = df.index.value_counts()
print(df1)
df1 = df.groupby('Courses').count()
print(df1)
df1 = df.groupby(['Courses', 'Fee']).size()
print(df1)
df1 = df.groupby(['Courses', 'Fee']).count()
print(df1)
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)
df1 = df.apply(pd.value_counts).fillna(0)
print(df1)
Frequently Asked Questions
How do I count the frequency of each unique value in a column?
To count the frequency of each unique value in a specific column, use the value_counts()
method.
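For example, on the DataFrame above:
# Frequency of each unique value in the Courses column
df1 = df['Courses'].value_counts()
print(df1)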
How do I count the frequency of a specific value in a column?
To count the frequency of a specific value, call value_counts()
and access the count for the desired value.
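For example, to get the count for the value PySpark in the example DataFrame above:
# Frequency of a single value
pyspark_count = df['Courses'].value_counts()['PySpark']
print(pyspark_count) # 2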
Does value_counts() include missing values?
By default, the value_counts()
method in Pandas excludes missing values (NaN
). If you want to include missing values in the frequency count, set the dropna
parameter to False
.
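For example (as in section 6 above):
# Include missing values in the count
df1 = df['Courses'].value_counts(dropna=False)
print(df1)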
How do I get the frequency counts as proportions or percentages?
You can normalize the frequency counts to obtain proportions or percentages using the normalize
parameter of the value_counts()
method. Setting normalize=True
returns the relative frequencies.
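For example:
# Relative frequencies instead of counts
df1 = df['Courses'].value_counts(normalize=True)
print(df1)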
How do I sort the frequency counts?
You can sort the frequency counts in descending order using the sort_values()
method in Pandas. value_counts()
provides the frequency counts, and sort_values(ascending=False)
sorts them in descending order. The resulting sorted_counts
is a Pandas Series where the index represents unique values and the values represent the frequency counts in descending order.
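A minimal sketch of the sorted_counts variable described above:
# Sort frequency counts in descending order
sorted_counts = df['Courses'].value_counts().sort_values(ascending=False)
print(sorted_counts)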
Conclusion
In this article, you have learned how to count the frequency of values in Pandas DataFrame columns using the Series.value_counts()
, GroupBy.count()
, and GroupBy.size()
methods. You have also learned how to include nulls in the count and how to count the frequency of values across all selected columns.
Happy Learning !!
Related Articles
- Pandas Count Unique Values in Column
- How to Get Pandas Columns Count
- Count NaN Values in Pandas DataFrame
- How to Create Pandas Pivot Table Count
- Pandas Count Distinct Values DataFrame
- How to Count Duplicates in Pandas DataFrame
- How to add/insert row to Pandas DataFrame?
- Pandas Count Rows with Condition
- How to Add New Column to Existing Pandas DataFrame
- How to Get Count of Each Row of Pandas DataFrame
- Pandas Rename Column | Multiple Columns
- Different Ways to Iterate Over Rows in Pandas DataFrame
- Remap Values in Column with a Dictionary (Dict) in Panda