In Pandas, the Series.value_counts() function counts the occurrences of each unique value in a Series. It returns a new Series whose index labels are the unique values from the original Series and whose values are their counts.
In this article, I will explain the Series.value_counts() function, covering its syntax, parameters, and usage, with examples that show how it helps you explore the distribution of values in a dataset.
Key Points –
- Series.value_counts() counts the occurrences of each unique value in a Series, which makes it useful for exploring the frequency distribution of the data.
- It returns a new Series where the unique values in the original Series are the index labels, and the corresponding values are their respective counts.
- The dropna parameter allows you to include or exclude missing values (NaN) from the counts. Setting dropna=False includes NaN values in the count.
- Additional parameters like normalize can be used to return relative frequencies instead of raw counts.
Syntax of Pandas Series.value_counts() Function
Here is the syntax of the Series.value_counts() function.
# Syntax of Series.value_counts() function
Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Parameters of the Series.value_counts()
Following are the parameters of the Series.value_counts() function.
normalize – Optional. If True, returns the relative frequencies of unique values instead of counts. Default is False.
sort – Optional. If True, sorts the result by count; if False, the result is not sorted by count. Default is True.
ascending – Optional. If True, sorts the counts in ascending order; otherwise they are sorted in descending order. Applies only when sort is True. Default is False.
bins – Optional. Groups the values into bins and counts the values in each bin instead of counting each unique value. Accepts an integer number of equal-width bins or a list of bin edges. Only applicable to numeric data. Default is None.
dropna – Optional. If False, includes NaN values in the counts. Default is True.
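As a minimal sketch of how these parameters fit together, the snippet below spells out every default explicitly and then turns off sorting. The variable names are illustrative only and are not part of the Pandas API.

```python
import pandas as pd

# Illustrative data; the names below exist only for this sketch
series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])

# Passing every default explicitly is equivalent to calling value_counts() with no arguments
explicit_defaults = series.value_counts(
    normalize=False, sort=True, ascending=False, bins=None, dropna=True
)
default_call = series.value_counts()
print(explicit_defaults.equals(default_call))

# sort=False skips the sort by count; the counts themselves are unchanged
unsorted_counts = series.value_counts(sort=False)
print(unsorted_counts)
```

With sort=False the row order should not be relied on to reflect frequency, so look counts up by label instead.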
Return Value
The Series.value_counts() function returns a Pandas Series containing the counts of unique values in the original Series. The index of the returned Series holds the unique values from the original Series, and the corresponding values are their respective counts.
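Because the result is an ordinary Series, the usual Series operations apply to it. The following is a small sketch with made-up sample data that looks up a count by label and finds the most frequent value.

```python
import pandas as pd

# Sample data used only for illustration
series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])

counts = series.value_counts()

# The result is itself a Series
print(type(counts))

# Unique values are the index labels, so a count can be looked up by label
print(counts['Spark'])

# idxmax() returns the label with the highest count
print(counts.idxmax())
```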
Create Pandas Series
A Pandas Series can be created in several ways, for example from Python lists or dictionaries. The example below creates a Series from a list. To use Pandas, first import it with import pandas as pd.
import pandas as pd
# Create Pandas Series
Courses = ['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark']
series = pd.Series(Courses)
print("Original Series:\n",series)
This prints the seven course names with the default integer index, 0 through 6.
Counting Unique Values in a Series
To count unique values in a Pandas Series, you can use the value_counts() method. For example,
# Counting unique values in the Series
unique_value_counts = series.value_counts()
print("Counting unique values:\n",unique_value_counts)
Here,
- The value_counts() method is applied to the Pandas Series series.
- It returns a new Series object where the index contains the unique values present in the original Series, and the values represent their respective counts.
For this Series, Spark is counted 3 times, Python 2 times, and PySpark and Pandas once each.
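A common follow-up is to keep only the most frequent values or to turn the counts into a plain dictionary. The sketch below assumes the same course data and uses only standard Series methods (head() and to_dict()); the variable names are illustrative.

```python
import pandas as pd

series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])
counts = series.value_counts()

# Counts are sorted in descending order, so head(2) keeps the two most frequent values
top_two = counts.head(2)
print(top_two)

# to_dict() converts the result into a regular Python dictionary of value -> count
counts_dict = counts.to_dict()
print(counts_dict)
```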
Normalization using Series.value_counts()
To return relative frequencies instead of raw counts, set the normalize parameter of the value_counts() method to True.
# Counting unique values in the Series and normalizing
result = series.value_counts(normalize=True)
print("Counting unique values and normalizing:\n",result)
# Output:
# Counting unique values and normalizing:
# Spark 0.428571
# Python 0.285714
# PySpark 0.142857
# Pandas 0.142857
# dtype: float64
Here,
- By setting normalize=True, the value_counts() method returns the relative frequencies of unique values instead of counts.
- The returned Series now contains the proportion of each unique value relative to the total number of values in the original Series.
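Relative frequencies are often easier to read as percentages. This is a minimal sketch, with illustrative variable names, that scales the normalized result and checks that the proportions add up to 1.

```python
import pandas as pd

series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])

# Proportion of each unique value relative to the total number of values
proportions = series.value_counts(normalize=True)

# The proportions of all unique values together sum to 1 (up to floating-point rounding)
print(proportions.sum())

# Scale to percentages and round to one decimal place for display
percentages = (proportions * 100).round(1)
print(percentages)
```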
Sorting Values using Series.value_counts()
You can also re-sort the Series returned by the value_counts() method by using the sort_values() method.
# Counting unique values in the Series and sorting in ascending order
result = series.value_counts().sort_values()
print("Counting unique values & sorting in ascending order:\n",result)
# Output:
# Counting unique values & sorting in ascending order:
# Pandas 1
# PySpark 1
# Python 2
# Spark 3
# dtype: int64
Here,
- After applying value_counts(), the resulting Series is sorted in descending order by default.
- We then apply sort_values() to sort the counts in ascending order.
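There are two other ways to control the order, sketched here with illustrative variable names: the ascending parameter of value_counts() sorts by count without a second method call, and sort_index() orders the result by the unique values themselves.

```python
import pandas as pd

series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])

# ascending=True sorts by count from least to most frequent
by_count = series.value_counts(ascending=True)
print(by_count)

# sort_index() orders the result by label instead of by count
by_label = series.value_counts().sort_index()
print(by_label)
```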
Including NaN Values
Similarly, to include NaN (missing) values in the counts, set the dropna parameter of the value_counts() method to False.
import pandas as pd
import numpy as np
# Creating a Pandas Series with NaN values
Courses = ['PySpark', 'Spark', np.nan, 'Spark', 'Pandas', 'Python', np.nan,'Spark']
series = pd.Series(Courses)
# Counting occurrences of each unique value, including NaN
result = series.value_counts(dropna=False)
print("Counting occurrences of each unique value:\n",result)
# Output:
# Counting occurrences of each unique value:
# Spark 3
# NaN 2
# Python 1
# Pandas 1
# PySpark 1
# dtype: int64
Here,
- By setting dropna=False, the value_counts() method includes NaN values in the count.
- The returned Series now contains the counts of unique values including NaN values.
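To see the effect of dropna side by side, the sketch below compares the totals of both calls on the same data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series(['PySpark', 'Spark', np.nan, 'Spark', 'Pandas', 'Python', np.nan, 'Spark'])

# Default behavior (dropna=True): NaN values are left out of the result
without_nan = series.value_counts()

# dropna=False: NaN gets its own row in the result
with_nan = series.value_counts(dropna=False)

# The totals differ by the number of missing values
print(without_nan.sum())
print(with_nan.sum())
print(series.isna().sum())
```

With dropna=False the counts add up to the length of the Series, while the default total excludes the missing entries.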
Bin Counting in a Numeric Series
To count values in bins rather than individually in a numeric Series, use the bins parameter of the value_counts() method. Passing a list specifies the bin edges.
import pandas as pd
# Creating a Pandas Series of numeric data
data = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
# Counting occurrences in bins
bin_counts = data.value_counts(bins=[1, 2, 3, 4, 5])
print("Counting occurrences in bins:\n",bin_counts)
# Output:
# Counting occurrences in bins:
# (3.0, 4.0] 4
# (2.0, 3.0] 3
# (0.999, 2.0] 3
# (4.0, 5.0] 0
# dtype: int64
Here,
- The bins parameter is specified as [1, 2, 3, 4, 5], which defines the bin edges.
- The value_counts() method counts the occurrences of values falling within each bin. The lowest edge is included in the first bin, which is why that interval is displayed as (0.999, 2.0] rather than (1.0, 2.0].
- The result is a Series where the index represents the bin intervals, and the values represent the counts of values falling within each bin.
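Binned results are often easier to read in interval order than in count order. The sketch below reuses the same numeric data, combines bins with sort=False, and also passes an integer to request equal-width bins. The variable names are illustrative.

```python
import pandas as pd

data = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Same bin edges as in the example above, but sort=False keeps the bins in interval order
in_bin_order = data.value_counts(bins=[1, 2, 3, 4, 5], sort=False)
print(in_bin_order)

# An integer asks for that many equal-width bins spanning the data's range
two_bins = data.value_counts(bins=2, sort=False)
print(two_bins)
```

With two equal-width bins over the range 1 to 4, the split falls at 2.5, so the values 1, 2, 2 land in the first bin and the remaining seven values in the second.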
Frequently Asked Questions on Pandas Series.value_counts()
In Pandas, the value_counts() method is used to count the occurrences of unique values in a Series. It returns a new Series containing the counts of unique values, sorted in descending order by default.
To sort the result of the value_counts() function in Pandas, you can use the sort_values() method. By default, value_counts() returns the counts in descending order, but you can re-sort them in ascending or descending order as needed.
To normalize the counts returned by the value_counts() function in Pandas, you can set the normalize parameter to True. This will return the relative frequencies of each unique value instead of raw counts.
value_counts() in Pandas can handle missing values (NaN). By default, missing values are excluded from the count. However, you can include them by setting the dropna parameter to False.
The value_counts() method in Pandas lists each distinct value only once in the result, no matter how many times it is repeated in the Series. The count next to it is the number of times that value occurs.
Conclusion
In this article, you have learned that the value_counts() method in Pandas is a powerful tool for analyzing the distribution of values within a Series. It counts the occurrences of unique values, works with both numeric and categorical data, and provides options for sorting, normalization, binning, and handling missing values.
Happy Learning!!