In pandas, you can count how often a value occurs in a DataFrame column by using the Series.value_counts()
method. Alternatively, if you have a SQL background, you can get the same result using groupby()
and count()
. Both approaches count the occurrences of each value by grouping on the requested column and returning the count per group.
In this article, I will explain how to count the frequency of a value in a pandas DataFrame column, on single and multiple columns, by index column, etc.
1. Quick Examples of Counting the Frequency of a Value in a Column
Below are some quick examples of how to count the frequency of a value in a DataFrame column.
# Below are quick examples
# Using pandas way, Series.value_counts()
df1 = df['Courses'].value_counts()
df1 = df.Courses.value_counts()
# Using groupby() & count()
df1 = df.groupby('Courses').count()
# Add frequency count as new column to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)
# Getting value counts of multiple columns
df1 = df.groupby(['Courses', 'Fee']).count()
# Get occurrence of value by index (row label)
df1 = df.index.value_counts()
# Include NaN, None, Null values in the count.
df1 = df['Courses'].value_counts(dropna=False)
Now let’s create a DataFrame, run these examples, and explore the output. Our DataFrame contains two columns, Courses
and Fee, with custom row labels on the index (used later in the index example).
# Create DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee':[24000,25000,25000,24000,24000,25000,25000,24000]
}
index_labels = ['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies, index=index_labels)
print(df)
Yields below output.
# Output:
    Courses    Fee
r1    Spark  24000
r2  PySpark  25000
r3   Hadoop  25000
r4   Python  24000
r5   pandas  24000
r2  PySpark  25000
r4   Python  25000
r5   pandas  24000
2. Series.value_counts() to Count Frequency of Value in a Column
The Series.value_counts()
method gets you the count of how often each value occurs in a column of a pandas DataFrame. To use it, first get the Series object from the DataFrame; df['column_name']
returns a Series object.
For example, df['Courses'].values
returns a NumPy array of all values including duplicates: ['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'PySpark' 'Python' 'pandas']
.
Now use df['Courses'].value_counts()
to get the frequency counts of values in the Courses
column.
# Using series value_counts()
df1 = df['Courses'].value_counts()
print(df1)
Yields below output. Note that if the column contains any NaN/None values, the value_counts()
function ignores them by default when counting frequencies.
# Output:
PySpark 2
pandas 2
Python 2
Spark 1
Hadoop 1
Name: Courses, dtype: int64
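By default, value_counts() sorts the result by count in descending order. If you want the least frequent values first, you can pass ascending=True; a minimal sketch:
# Sort the frequency counts in ascending order
df1 = df['Courses'].value_counts(ascending=True)
print(df1)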
If the column name doesn’t contain spaces, you can also get the same result using df.Courses.value_counts()
. To get the frequency count of multiple columns, select them as a DataFrame and call value_counts() on the result.
# Get Frequency of multiple columns
print( df[['Courses','Fee']].value_counts())
Yields below output.
# Output:
Courses Fee
PySpark 25000 2
pandas 24000 2
Hadoop 25000 1
Python 24000 1
25000 1
Spark 24000 1
dtype: int64
3. Count Frequency Value Using GroupBy.count()
If you are coming from a SQL background, you will be familiar with GROUP BY
and COUNT
to get the number of times a value is present in a column (the frequency of column values); you can use a similar approach in pandas. The DataFrame.groupby()
method groups data on a specified column by collecting all similar values together, and count()
on top of that gives the number of times each value is repeated.
Note that pandas.DataFrame.groupby() returns a GroupBy object, and count() is a method on GroupBy.
# Using GroupBy.count() frequency value
df1 = df.groupby('Courses').count()
print(df1)
Yields below output.
# Output:
Fee
Courses
Hadoop 1
PySpark 2
Python 2
Spark 1
pandas 2
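Note that count() returns a count for every remaining column in the DataFrame. If you only need the result as a Series, select a single column from the GroupBy object first; a minimal sketch:
# Select one column to get the counts as a Series
df1 = df.groupby('Courses')['Fee'].count()
print(df1)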
If you want to add the frequency count back to the DataFrame as a new column, use transform('count') on the grouped column.
# Add frequency count to DataFrame
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)
Yields below output.
# Output:
Courses Fee freq_count
r1 Spark 24000 1
r2 PySpark 25000 2
r3 Hadoop 25000 1
r4 Python 24000 2
r5 pandas 24000 2
r2 PySpark 25000 2
r4 Python 25000 2
r5 pandas 24000 2
4. Count Frequency Value Using GroupBy.size()
You can also use the df.groupby().size()
function to get the frequency count of single or multiple columns; when grouping on multiple columns, size() is the natural choice. Apply the size()
function on the resulting GroupBy
object to get a frequency count for each group.
# Count frequency value using GroupBy.size()
df1 = df.groupby(['Courses', 'Fee']).size()
print(df1)
Yields below output.
# Output:
Courses Fee
Hadoop 25000 1
PySpark 25000 2
Python 24000 1
25000 1
Spark 24000 1
pandas 24000 2
dtype: int64
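If you prefer the result as a regular DataFrame with a named count column, you can call reset_index() on the size() result; a minimal sketch:
# Convert the size() result into a DataFrame with a 'count' column
df1 = df.groupby(['Courses', 'Fee']).size().reset_index(name='count')
print(df1)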
5. Get Occurrence of a Value from the Index
If you have a labeled index, you can also get how many times each index value occurs in a pandas DataFrame using DataFrame.index.value_counts()
, as DataFrame.index returns an Index object that also supports value_counts().
# Get number of occurrences by index
df1 = df.index.value_counts()
print(df1)
Yields below output.
# Output:
r4 2
r2 2
r5 2
r1 1
r3 1
dtype: int64
6. Get Frequency of a Column with NaN, None in DataFrame
As explained above, the value_counts()
method by default ignores NaN/None values when counting. Use Series.value_counts(dropna=False)
to include None/NaN values in the frequency count of a DataFrame column.
# Include NaN, None & Null in frequency count
df1 = df['Courses'].value_counts(dropna=False)
print(df1)
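Since our example DataFrame contains no missing values, the result here is the same as before. To see the difference, below is a minimal sketch on a small made-up Series that does contain a None value; with dropna=False the missing value gets its own row in the result:
# Demonstrate dropna=False on a Series with a missing value (illustrative data)
ser = pd.Series(["Spark", "PySpark", None, "Spark"])
print(ser.value_counts(dropna=False))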
7. Get Frequency of Values as a Percentage
All the examples explained above return the count of how often a value occurs in the DataFrame, but sometimes you may need the occurrence as a percentage. Pass normalize=True to value_counts() to get the relative frequencies instead.
# Get percentage instead of count
df1 = df['Courses'].value_counts(normalize=True)
print(df1)
Yields below output.
# Output:
PySpark 0.250
pandas 0.250
Python 0.250
Hadoop 0.125
Spark 0.125
Name: Courses, dtype: float64
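If you need the values formatted as percentages rather than fractions, you can multiply the normalized result by 100; a minimal sketch:
# Convert normalized fractions to percentages
df1 = df['Courses'].value_counts(normalize=True).mul(100)
print(df1)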
8. Using df.apply(pd.value_counts).fillna(0) to Count Frequency Value
You can use df.apply(pd.value_counts)
to count the frequency of values in all columns at once. If you set axis=1, you get the frequency for every row instead. value_counts returns an object containing counts of unique values, and fillna(0)
fills zero for the NaN entries that appear where a value does not occur in a column. See how to replace NaN with zero in pandas.
# Using df.apply method to count frequency value
df1 = df.apply(pd.value_counts).fillna(0)
print(df1)
Yields below output.
# Output:
         Courses  Fee
24000        0.0  4.0
25000        0.0  4.0
Hadoop       1.0  0.0
PySpark      2.0  0.0
Python       2.0  0.0
Spark        1.0  0.0
pandas       2.0  0.0
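Note that in recent pandas releases the top-level pd.value_counts() function has been deprecated; applying Series.value_counts() through a lambda gives the same result without the deprecation warning. A sketch, assuming a recent pandas version:
# Equivalent without the deprecated top-level pd.value_counts()
df1 = df.apply(lambda s: s.value_counts()).fillna(0)
print(df1)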
9. Complete Examples
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas","PySpark","Python","pandas"],
'Fee' :[24000,25000,25000,24000,24000,25000,25000,24000]
}
index_labels=['r1','r2','r3','r4','r5','r2','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
df1 = df['Courses'].value_counts()
print(df1)
df1 = df.Courses.value_counts()
print(df1)
df1 = df[['Courses','Fee']].value_counts()
print(df1)
df1 = df['Courses'].value_counts(normalize=True)
print(df1)
df1 = df.index.value_counts()
print(df1)
df1 = df.groupby('Courses').count()
print(df1)
df1 = df.groupby(['Courses', 'Fee']).size()
print(df1)
df1 = df.groupby(['Courses', 'Fee']).count()
print(df1)
df['freq_count'] = df.groupby('Courses')['Courses'].transform('count')
print(df)
df1 = df.apply(pd.value_counts).fillna(0)
print(df1)
Conclusion
In this article, you have learned how to count the frequency of values in pandas DataFrame columns using the Series.value_counts()
, GroupBy.count()
, and GroupBy.size()
methods. You have also learned how to include nulls in the count and how to get the frequency of values across multiple selected columns.
Happy Learning !!
Related Articles
- How to Add New Column to Existing Pandas DataFrame
- How to Get Count of Each Row of Pandas DataFrame
- Pandas Rename Column | Multiple Columns
- Different Ways to Iterate Over Rows in Pandas DataFrame
- Remap Values in Column with a Dictionary (Dict) in Pandas
- Pandas Count Distinct Values DataFrame
- Pandas Count Unique Values in Column
- How to Count Duplicates in Pandas DataFrame
- Pandas DataFrame count() Function