  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Series sum() Function

The sum() function in Pandas returns the sum of all elements in a Series, giving you a quick way to obtain the total of its values. In this article, I will explain the sum() function, covering its syntax, parameters, and usage, and show how to return the sum of the values in a Series.


Key Points –

  • The sum() function calculates the sum of all elements in a Pandas Series, providing a convenient way to obtain the total of numeric values in the Series.
  • The sum() function includes a skipna parameter, which allows users to control whether NaN (Not a Number) values in the Series should be included or excluded during the summation. This parameter helps manage missing or undefined values in the data.
  • Through boolean indexing, users can perform conditional summation using the sum() function. This enables the calculation of the sum for specific elements in the Series that satisfy certain conditions, providing flexibility in data analysis.
  • For a Series with a MultiIndex (hierarchical index), a sum for each label of a specific level is obtained with groupby(level=...).sum(). Older pandas versions accepted a level parameter on sum() for this; it was deprecated in pandas 1.3 and removed in pandas 2.0.
  • The sum() function is designed for numeric data. Boolean values are summed as 1 and 0, and on an object-dtype Series the values are combined with the + operator, so the result depends on the data type of the Series.

Syntax of Pandas Series sum() Function

Following is the syntax of the Pandas series sum() function.


# Syntax of Series sum() function (pandas 2.x)
Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)

Parameters of the Series sum()

Following are the parameters of the series sum() function.

  • axis ({0 or 'index'}, optional) – Axis for the sum operation. A Series has only one axis, so the sum is always calculated over all elements in the Series; the parameter exists mainly for compatibility with DataFrame.
  • skipna (bool, default True) – If True, exclude NA/null values during the sum. If False, any NA/null value in the Series makes the result NaN.
  • numeric_only (bool, default False) – Include only float, int, and boolean data. The pandas documentation notes that this is not implemented for Series.
  • min_count (int, default 0) – The required number of valid values to perform the sum. If fewer than min_count non-NA values are present, the result will be NA.
  • **kwargs – Additional keyword arguments to be passed to the function.
  • level (int or level name) – Only in pandas versions before 2.0. If the index is a MultiIndex (hierarchical), sum along a particular level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).sum() instead.
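The effect of min_count is easiest to see on a Series with few or no valid values. The following is a minimal sketch; the variable names are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# An empty Series sums to 0 with the default min_count=0
empty = pd.Series([], dtype="float64")
empty_default = empty.sum()

# Requiring at least one valid value turns the result into NaN
empty_strict = empty.sum(min_count=1)

# A Series with one valid value and one NaN
partial = pd.Series([1.0, np.nan])
partial_ok = partial.sum(min_count=1)    # one valid value is enough
partial_nan = partial.sum(min_count=2)   # two valid values are required

print(empty_default, empty_strict, partial_ok, partial_nan)
```

Setting min_count=1 is a common way to make the sum of an all-NaN or empty Series come out as NaN instead of 0.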

Return Value

The sum() function in Pandas returns a scalar (single value) representing the sum of all elements in the Series. If the Series contains only numeric values, the result is a numeric value. If the Series holds another data type, the type of the result depends on how addition is defined for that data.
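As a rough illustration of how the result depends on the data, the sketch below sums a boolean Series and a Series of strings. The strings are stored with dtype=object explicitly, which is an assumption made here so that addition falls back to Python's + operator; other string dtypes may behave differently.

```python
import pandas as pd

# Boolean values are summed as 1 (True) and 0 (False),
# so the sum is the number of True values
flags = pd.Series([True, False, True])
true_count = flags.sum()

# On an object-dtype Series of strings, + means concatenation
words = pd.Series(["a", "b", "c"], dtype=object)
joined = words.sum()

print(true_count)
print(joined)
```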

Let’s create a Pandas Series that contains the values [10, 20, 30, 40, 50].


import pandas as pd

# Create a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Pandas Series:")
print(series)

# Output:
# Pandas Series:
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64

Get the Sum of Pandas Series

To calculate the sum of all elements in a Pandas Series, call the sum() function on the Series object. It returns the total of the elements as a single value.


import pandas as pd
# Calculate the sum of all elements in the Series
result = series.sum()
print("Sum of all elements in the Series:",result)

This example reuses the Series created above with the values [10, 20, 30, 40, 50], applies the sum() function to it, and prints the result.

# Output:
# Sum of all elements in the Series: 150

Handling NaN Values during Summation

To control how NaN (Not a Number) values are handled during summation in a Pandas Series, use the skipna parameter of the sum() function. By default, skipna is set to True, which means NaN values are excluded from the summation.


import pandas as pd

# Create a Series with NaN values
data = [2, 4, 7, None, 9]
series = pd.Series(data)

# Calculate the sum, excluding NaN values
result = series.sum(skipna=True)
print("Calculate the sum:",result)

# Output:
# Calculate the sum: 22.0

In the above example, None is stored as NaN, so the Series has the float64 dtype, and result is the sum of the non-NaN elements [2, 4, 7, 9]. The output is 22.0. If you set skipna=False, the NaN value propagates through the summation, and the result is nan.
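To see the skipna=False case next to two common alternatives, here is a small sketch on the same data. Replacing NaN with 0 via fillna() is just one possible policy for missing values, shown here as an assumption rather than a recommendation.

```python
import math
import pandas as pd

series = pd.Series([2, 4, 7, None, 9])

# With skipna=False, the NaN value propagates into the result
with_nan = series.sum(skipna=False)

# Count the missing values before deciding how to treat them
missing = series.isna().sum()

# Replace NaN with 0 explicitly, then sum
filled = series.fillna(0).sum()

print(math.isnan(with_nan), missing, filled)
```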

Get the Sum of Selected Elements Based on the Condition

To calculate the sum of selected elements in a Pandas Series based on a condition, you can use boolean indexing. The condition produces a boolean mask, and only the elements where the mask is True are passed to sum().


import pandas as pd

# Create a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)

# Calculate the sum of elements greater than 20
result = series[series > 20].sum()
print("Sum of elements greater than 20:",result)

# Output:
# Sum of elements greater than 20: 120

In the above example, result will be the sum of the elements of the Series [10, 20, 30, 40, 50] that are greater than 20, that is, 30 + 40 + 50. The output will be 120. You can adjust the condition inside the square brackets based on your specific criteria.
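Conditions can also be combined. The sketch below joins two conditions with & (each wrapped in parentheses) and, separately, sums the boolean mask itself to count the matching elements.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Sum of the elements strictly between 10 and 50: 20 + 30 + 40
between_sum = series[(series > 10) & (series < 50)].sum()

# Summing the mask counts how many elements are greater than 20
count_over_20 = (series > 20).sum()

print(between_sum, count_over_20)
```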

Get the Cumulative Sum of a Pandas Series

To calculate the cumulative sum of a Pandas Series, you can use the cumsum() function. The cumulative sum is the running total of the elements in the Series. For instance, cumulative_sum will be a new Series containing the cumulative sum of the original Series.


import pandas as pd

# Create a Series
data = [5, 10, 15, 20, 25]
series = pd.Series(data)

# Calculate the cumulative sum
cumulative_sum = series.cumsum()
print("Calculate the cumulative sum:\n",cumulative_sum)

# Output:
# Calculate the cumulative sum:
#  0     5
# 1    15
# 2    30
# 3    50
# 4    75
# dtype: int64

Each element in the cumulative_sum Series is the sum of all the preceding elements in the original Series, including itself.
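One way to check the relationship between the two functions is to compare the last element of the cumulative sum with the result of sum(), as in this short sketch:

```python
import pandas as pd

series = pd.Series([5, 10, 15, 20, 25])

cumulative_sum = series.cumsum()

# The final running total covers every element, so it equals the plain sum
last_running_total = cumulative_sum.iloc[-1]
total = series.sum()

print(last_running_total, total)
```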

Get the Sum of a Series with a DateTime Index

When working with a Pandas Series that has a DateTime index, you can use the sum() function to calculate the sum of the values. The index itself does not take part in the calculation; only the values are summed.


import pandas as pd

# Create a Series with a DateTime index
date_index = pd.date_range(start='2022-01-01', end='2022-01-05', freq='D')
data = [2, 4, 6, 8, 10]
series = pd.Series(data, index=date_index)

# Calculate the sum of values in the Series
total_sum = series.sum()
print("Sum of values in the Series:",total_sum)

# Output:
# Sum of values in the Series: 30

In this example, total_sum will be the sum of all values in the Series with a DateTime index. The output will be the sum of [2, 4, 6, 8, 10], which is 30. The DateTime index can provide additional context when working with time series data.
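The DateTime index becomes useful when you want the sum for part of the time range. The following sketch, using the same data, sums a date slice selected with .loc and then sums consecutive two-day bins with resample(); the bin size of two days is an arbitrary choice for illustration.

```python
import pandas as pd

date_index = pd.date_range(start="2022-01-01", end="2022-01-05", freq="D")
series = pd.Series([2, 4, 6, 8, 10], index=date_index)

# Sum only the values from January 2 through January 4 (both ends included)
window_sum = series.loc["2022-01-02":"2022-01-04"].sum()

# Sum the values within consecutive two-day bins
two_day_sums = series.resample("2D").sum()

print(window_sum)
print(two_day_sums)
```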

Get the Sum of the MultiIndex Series

When working with a Pandas Series that has a MultiIndex, you can calculate the sum of values for each label of a specific level by grouping on that level and then calling sum(). Older pandas versions allowed series.sum(level='letters') for this, but the level parameter was deprecated in pandas 1.3 and removed in pandas 2.0. For instance, result will be a new Series containing the sum of values for each label of the ‘letters’ level of the MultiIndex.


import pandas as pd

# Create a Series with a MultiIndex
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names = ('letters', 'numbers'))
data = [5, 10, 20, 30]
series = pd.Series(data, index=multi_index)

# Calculate the sum along the 'letters' level
result = series.groupby(level='letters').sum()
print(result)

# Output:
# letters
# A    15
# B    50
# dtype: int64
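In pandas 2.0 and later, where sum() has no level argument, per-level totals come from groupby(level=...). As a further sketch on the same data, the code below groups on the other level, ‘numbers’, and also computes the grand total with a plain sum().

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('letters', 'numbers'))
series = pd.Series([5, 10, 20, 30], index=multi_index)

# Sum for each label of the 'numbers' level: 1 -> 5 + 20, 2 -> 10 + 30
by_numbers = series.groupby(level='numbers').sum()

# A plain sum() ignores the index structure and adds every value
total = series.sum()

print(by_numbers)
print(total)
```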

Frequently Asked Questions on Pandas Series sum() Function

What is the purpose of the sum() function in Pandas Series?

The sum() function in Pandas Series is used to calculate the sum of all elements in the Series. It is primarily intended for numeric data; for other data types, the result depends on how addition is defined for the values.

How does the sum() function handle missing values (NaN) in a Series?

By default, the sum() function excludes missing values (NaN) during summation. This behavior can be modified using the skipna parameter, allowing the inclusion of NaN values if needed.

What is the difference between sum() and cumsum() in Pandas?

While sum() calculates the sum of all elements in a Series, cumsum() computes the cumulative sum, generating a new Series where each element represents the running total up to that point.

Is it possible to calculate the sum along specific levels in a MultiIndex Series?

It is possible to calculate the sum along specific levels in a MultiIndex Series in pandas. Group the Series on the level with groupby(level=...) and then call sum(). In pandas versions before 2.0, the same result was available through the level parameter of sum(), which has since been removed.

What is the purpose of the skipna parameter, and how can I use it?

The skipna parameter in the Pandas sum() function controls whether to exclude or include missing values (NaN) during summation. By default, skipna is set to True, meaning that NaN values are excluded from the sum. If skipna is set to False, NaN values are considered, and the sum result becomes NaN if there are any NaN values in the Series.

Can the sum() function handle DateTime indexes in a Series?

The sum() function can handle DateTime indexes in a Pandas Series. When you apply the sum() function to a Series with a DateTime index, it adds up all the values in the Series; the index does not affect the result.

Conclusion

In this article, I have explained the Pandas Series sum() function, including its syntax, parameters, and usage, and shown with examples how it computes the sum of the elements in a Series.

Happy Learning!!
