  • Post category:Pandas
  • Post last modified:June 15, 2024

In Pandas, the Series.value_counts() function is used to count the occurrences of unique values in a Pandas Series. It returns a new Series object where the unique values from the original Series are the index labels, and the corresponding values are their respective counts.

In this article, I will explain the Series.value_counts() function, covering its syntax, parameters, and usage, with examples of how it returns a Series containing the counts of unique values in a given Series object. It is particularly useful for exploring the distribution of values within a dataset.

Key Points –

  • Pandas Series.value_counts() is particularly useful for exploring the distribution of values within a Series object, providing insight into the data’s frequency distribution.
  • value_counts() is a convenient method in Pandas for counting the occurrences of unique values in a Series object.
  • It returns a new Series object where the unique values in the original Series are the index labels, and the corresponding values are their respective counts.
  • The dropna parameter allows you to include or exclude missing values (NaN) from the counts. Setting dropna=False includes NaN values in the count.
  • Additional parameters like normalize can be used to return the relative frequencies instead of raw counts.

Syntax of Pandas Series.value_counts() Function

Let’s look at the syntax of the Series.value_counts() function.


# Syntax of Series.value_counts() function
Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Parameters of the Series.value_counts()

Following are the parameters of the Series.value_counts() function.

  • normalize – Optional. If True, returns the relative frequencies of unique values instead of counts. Default is False.
  • sort – Optional. If True, sorts the result by count (descending unless ascending=True). If False, the result is not sorted by count. Default is True.
  • ascending – Optional. If True and sort is True, sorts the counts in ascending order. Default is False.
  • bins – Optional. Groups the values into half-open bins instead of counting each distinct value. Accepts an integer number of bins or a list of bin edges. Only applicable for numeric data. Default is None.
  • dropna – Optional. If False, includes NaN values in the counts. Default is True.

Return Value

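The Series.value_counts() function returns a Pandas Series object containing the counts of unique values in the original Series. The index of the returned Series contains the unique values from the original Series, and the corresponding values are their respective counts. In pandas 2.0 and later, the returned Series is also named count (or proportion when normalize=True), so the printed output ends with a line such as Name: count, dtype: int64 rather than the bare dtype: int64 shown in the examples below.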
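As a minimal sketch of the return value described above, the snippet below uses a small illustrative Series (the letters data is hypothetical, not from this article) and inspects the index and values of the result separately.

```python
import pandas as pd

# Small illustrative Series (hypothetical data, not from the article)
letters = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

counts = letters.value_counts()

# The unique values become the index labels of the result
print(counts.index.tolist())   # ['a', 'b', 'c']

# The counts are the values, in the same order
print(counts.tolist())         # [3, 2, 1]

# Because the result is a Series, a single count can be looked up by label
print(counts['b'])             # 2
```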

Create Pandas Series

A Pandas Series can be created in several ways, for example from a Python list or dictionary. The example below creates a Series from a list. To use Pandas, first import it with import pandas as pd.


import pandas as pd

# Create Pandas Series
Courses = ['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark']
series = pd.Series(Courses)
print("Original Series:\n",series)

Running this prints the seven course names against the default integer index, 0 through 6.

Counting Unique Values in a Series

To count unique values in a Pandas Series, you can use the value_counts() method. For example,


# Counting unique values in the Series
unique_value_counts = series.value_counts()
print("Counting unique values:\n",unique_value_counts)

Here,

  • The value_counts() method is applied to the Pandas Series series.
  • It returns a new Series object where the index contains unique values present in the original Series, and the values represent their respective counts. For this Series, the counts are Spark 3, Python 2, PySpark 1, and Pandas 1.
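Since the result is an ordinary Series, the usual Series methods apply to it. The following sketch, which rebuilds the same course data so it runs on its own, shows one possible way to pull out the most frequent value and the top two entries.

```python
import pandas as pd

courses = ['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark']
series = pd.Series(courses)
counts = series.value_counts()

# Most frequent value: the label with the highest count
most_common = counts.idxmax()
print(most_common)             # Spark

# Top two values; the default descending order makes head() enough
top_two = counts.head(2)
print(top_two.to_dict())       # {'Spark': 3, 'Python': 2}
```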

Normalization using Series.value_counts()

To get relative frequencies instead of raw counts from the value_counts() method in Pandas, set the normalize parameter to True.


# Counting unique values in the Series and normalizing
result = series.value_counts(normalize=True)
print("Counting unique values and normalizing:\n",result)

# Output:
# Counting unique values and normalizing:
#  Spark      0.428571
# Python     0.285714
# PySpark    0.142857
# Pandas     0.142857
# dtype: float64

Here,

  • By setting normalize=True, the value_counts() method returns the relative frequencies of unique values instead of counts.
  • The returned Series now contains the proportion of each unique value relative to the total number of values in the original Series.
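If percentages are easier to read than fractions, the normalized result can be scaled and rounded. This is only a sketch of one way to do it; the variable names are chosen for illustration.

```python
import pandas as pd

series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])

# Relative frequencies as fractions of the total
proportions = series.value_counts(normalize=True)

# Scale to percentages and round for display
percentages = (proportions * 100).round(1)
print(percentages['Spark'])    # 42.9
print(percentages['Python'])   # 28.6
```

The fractions always sum to 1, so the percentages sum to 100 up to rounding.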

Sorting Values using Series.value_counts()

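By default, value_counts() returns the counts in descending order. To get them in ascending order, you can chain the sort_values() method onto the result; passing ascending=True to value_counts() itself gives the same ordering by count.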


# Counting unique values in the Series and sorting in ascending order
result = series.value_counts().sort_values()
print("Counting unique values & sorting in ascending order:\n",result)

# Output:
# Counting unique values & sorting in ascending order:
#  Pandas     1
# PySpark    1
# Python     2
# Spark      3
# dtype: int64

Here,

  • After applying value_counts(), the resulting Series is sorted in descending order by default.
  • We then apply sort_values() to sort the counts in ascending order.
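As an alternative to chaining sort_values(), the ascending parameter from the syntax section handles the ordering in one call, and sort_index() orders by the unique values instead of by their counts. A short sketch using the same course data:

```python
import pandas as pd

series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])

# ascending=True sorts by count from smallest to largest in a single call
ascending_counts = series.value_counts(ascending=True)
print(ascending_counts.tolist())       # [1, 1, 2, 3]

# sort_index() orders by the unique values (alphabetically here) instead of by count
by_label = series.value_counts().sort_index()
print(by_label.index.tolist())         # ['Pandas', 'PySpark', 'Python', 'Spark']
```

The relative order of values with equal counts may differ between pandas versions, so the sketch only prints the parts that do not depend on it.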

Including NaN Values

To include NaN (missing) values in the counts when using the value_counts() method in Pandas, set the dropna parameter to False.


import pandas as pd
import numpy as np

# Creating a Pandas Series with NaN values
Courses = ['PySpark', 'Spark', np.nan, 'Spark', 'Pandas', 'Python', np.nan,'Spark']
series = pd.Series(Courses)

# Counting occurrences of each unique value, including NaN
result = series.value_counts(dropna=False)
print("Counting occurrences of each unique value:\n",result)

# Output:
# Counting occurrences of each unique value:
#  Spark      3
# NaN        2
# Python     1
# Pandas     1
# PySpark    1
# dtype: int64

Here,

  • By setting dropna=False, the value_counts() method includes NaN values in the count.
  • The returned Series now contains the counts of unique values including NaN values.
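One way to check the effect of dropna is to compare the totals of both variants with the number of missing entries. The sketch below assumes the same data as the example above.

```python
import numpy as np
import pandas as pd

series = pd.Series(['PySpark', 'Spark', np.nan, 'Spark', 'Pandas', 'Python', np.nan, 'Spark'])

without_nan = series.value_counts()            # dropna=True is the default
with_nan = series.value_counts(dropna=False)

# Excluding NaN: counts add up to the number of non-missing entries
print(without_nan.sum())       # 6

# Including NaN: counts add up to the full length of the Series
print(with_nan.sum())          # 8

# The difference equals the number of missing entries
print(series.isna().sum())     # 2
```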

Bin Counting in a Numeric Series

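To perform bin counting in a numeric Series using the value_counts() method in Pandas, you can use the bins parameter. Instead of counting each distinct value, it groups the values into intervals; you can pass either a list of bin edges, as in the example below, or an integer number of equal-width bins.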


import pandas as pd

# Creating a Pandas Series of numeric data
data = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Counting occurrences in bins
bin_counts = data.value_counts(bins=[1, 2, 3, 4, 5])
print("Counting occurrences in bins:\n",bin_counts)

# Output:
# Counting occurrences in bins:
#  (3.0, 4.0]      4
# (2.0, 3.0]      3
# (0.999, 2.0]    3
# (4.0, 5.0]      0
# dtype: int64

Here,

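  • The bins parameter is specified as [1, 2, 3, 4, 5], which defines the bin edges. Each bin is open on the left and closed on the right, and the first bin is shown as (0.999, 2.0] because its lower edge is nudged just below 1 so that the value 1 is included.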
  • The value_counts() method counts the occurrences of values falling within each bin.
  • The result is a Series where the index represents the bin intervals, and the values represent the counts of values falling within each bin.
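The bins parameter is a convenience around pd.cut. The sketch below, using the same numeric data, shows an integer bin count and a roughly equivalent manual grouping with pd.cut; the variable names are illustrative.

```python
import pandas as pd

data = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# An integer asks for that many equal-width bins spanning the data range
two_bins = data.value_counts(bins=2)
print(two_bins.tolist())       # [7, 3]

# Roughly the same grouping done by hand with pd.cut, then counted
edges = [1, 2, 3, 4, 5]
manual = pd.cut(data, bins=edges, include_lowest=True).value_counts()
from_bins = data.value_counts(bins=edges)
print(sorted(manual.tolist()) == sorted(from_bins.tolist()))   # True
```

With two bins the split falls at 2.5, so the values 1, 2, 2 land in the lower bin and the remaining seven values in the upper one.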

Frequently Asked Questions on Pandas Series.value_counts()

What does value_counts() do in Pandas?

In Pandas, the value_counts() method is used to count the occurrences of unique values in a Series. It returns a new Series containing the counts of unique values, sorted in descending order by default.

How to sort the result of value_counts()?

To sort the result of the value_counts() function in Pandas, you can use the sort_values() method or pass ascending=True to value_counts() itself. By default, value_counts() returns the counts in descending order. To order by the unique values rather than by their counts, use sort_index().

How to normalize the counts returned by value_counts()?

To normalize the counts returned by the value_counts() function in Pandas, you can set the normalize parameter to True. This will return the relative frequencies of each unique value instead of raw counts.

Can value_counts() handle missing values (NaN)?

value_counts() in Pandas can handle missing values (NaN). By default, missing values are excluded from the count. However, you can include them by setting the dropna parameter to False.

How does value_counts() handle duplicates in a Series?

Duplicates are exactly what value_counts() counts. Each distinct value appears once in the index of the result, and its count is the number of times that value occurs in the Series. A value with a count greater than 1 therefore has duplicates.
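As a small illustration of this, the sketch below compares the length of the result with nunique() and lists the values that occur more than once, again using the course data from this article.

```python
import pandas as pd

series = pd.Series(['PySpark', 'Spark', 'Python', 'Spark', 'Pandas', 'Python', 'Spark'])
counts = series.value_counts()

# One row in the result per distinct value
print(len(counts) == series.nunique())     # True

# Values with a count above 1 are the ones that have duplicates
print(sorted(counts[counts > 1].index))    # ['Python', 'Spark']
```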

Conclusion

In this article, you have learned that the value_counts() method in Pandas is a powerful tool for analyzing the distribution of values within a Series. Using examples, you have seen how it counts the occurrences of unique values, handles numeric and categorical data, and provides options for sorting, normalization, binning, and handling missing values.

Happy Learning!!
