
The `sum()` function in Pandas calculates the sum of all elements in a Series, which makes it a convenient way to get a total quickly. In this article, I will explain the `sum()` function, covering its syntax, parameters, and usage, and show how to return the sum of the values in a Series.

Key Points –

• The `sum()` function calculates the sum of all elements in a Pandas Series, providing a convenient way to obtain the total of numeric values in the Series.
• The `sum()` function includes a `skipna` parameter, which allows users to control whether NaN (Not a Number) values in the Series should be included or excluded during the summation. This parameter helps manage missing or undefined values in the data.
• Through boolean indexing, users can perform conditional summation using the `sum()` function. This enables the calculation of the sum for specific elements in the Series that satisfy certain conditions, providing flexibility in data analysis.
• For a Series with a MultiIndex (hierarchical index), sums can be computed per level with `groupby(level=...).sum()`, which returns one sum for each label in that level. The older `sum(level=...)` form was deprecated in pandas 1.3 and removed in pandas 2.0.
• The `sum` operation is not limited to numbers. A numeric Series returns a numeric total, while an object-dtype Series of strings is concatenated, so the type of the result depends on the data in the Series.

## Syntax of Pandas Series sum() Function

Following is the syntax of the Pandas series sum() function.

``````
# Syntax of Series sum() function (pandas 2.0 and later)
# pandas 1.x also accepted level=None (deprecated in 1.3, removed in 2.0)
Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
``````

### Parameters of the Series sum()

Following are the parameters of the series sum() function.

• `axis (int, optional)` – Axis for the sum operation. A Series has only one axis (the index, `0`), so the sum always runs over all elements in the Series; the parameter exists for compatibility with `DataFrame.sum()`.
• `skipna (bool, optional)` – If True, exclude NA/null values during the sum. If False, include NA/null values. The default is True.
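• `level (int or level name, optional)` – If the index is a MultiIndex (hierarchical), sum along a particular level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use `groupby(level=...).sum()` instead.
• `numeric_only (bool, optional)` – If True, include only float, int, and boolean data. This option matters mainly for `DataFrame.sum()` and is not implemented for Series. The default is `None` in pandas 1.x and `False` in pandas 2.0 and later.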
• `min_count (int, optional)` – The required number of valid values to perform the sum. If fewer than min_count non-NA values are present, the result will be NA.
• `**kwargs` – Additional keyword arguments are accepted and passed to the `numpy.sum()` function.
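The effect of `min_count` described in the list above is easiest to see on a Series with few valid values. The following is a minimal sketch; the variable names and sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one valid value and one missing value
partial = pd.Series([1.0, None])

# Default min_count=0: the NaN is skipped and the single valid value is summed
default_sum = partial.sum()

# min_count=2 requires at least two non-NA values, so the result is NaN
strict_sum = partial.sum(min_count=2)

# An empty Series sums to 0.0 by default, and to NaN with min_count=1
empty = pd.Series([], dtype="float64")
empty_default = empty.sum()
empty_strict = empty.sum(min_count=1)

print(default_sum, strict_sum, empty_default, empty_strict)

# Output:
# 1.0 nan 0.0 nan
```

Setting `min_count=1` can be useful when an all-missing or empty Series should produce NaN rather than a total of zero.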

### Return Value

The `sum()` function in Pandas returns the sum of the values in a Series. The return value is a scalar (single value) representing the sum of all elements in the Series. If the Series contains only numeric values, the result is a numeric value. If the Series contains non-numeric values, the result may be of a different data type, depending on the nature of the data.
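As a rough illustration of how the return value follows the data, the sketch below sums an integer Series, a float Series, and an object-dtype Series of strings. The sample values are hypothetical, and the string Series is created with `dtype=object` explicitly so that the elements are combined with Python's `+` operator.

```python
import pandas as pd

# Integer data returns an integer scalar
int_total = pd.Series([10, 20, 30]).sum()

# Float data returns a float scalar
float_total = pd.Series([1.5, 2.5]).sum()

# Object-dtype strings are concatenated in order
text_total = pd.Series(["a", "b", "c"], dtype=object).sum()

print(int_total, float_total, text_total)

# Output:
# 60 4.0 abc
```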

Let’s create a Pandas Series that contains the values `[10, 20, 30, 40, 50]`.

``````
import pandas as pd

# Create a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Create Pandas Series:\n", series)
``````

This prints the five values alongside their default integer index (0 to 4), followed by `dtype: int64`.

## Get the Sum of Pandas Series

To calculate the sum of all elements in a Pandas Series, you can use the `sum()` function. For instance, the `sum()` function is applied to the given `series` object, which will return the sum of the elements of the given Series.

``````
import pandas as pd
# Calculate the sum of all elements in the Series
result = series.sum()
print("Sum of all elements in the Series:",result)
``````

This program applies `sum()` to the Series `[10, 20, 30, 40, 50]` created earlier and prints the result: `Sum of all elements in the Series: 150`.

## Handling NaN Values during Summation

To control how NaN (Not a Number) values are treated during summation in a Pandas Series, use the `skipna` parameter of the `sum()` function. By default, `skipna` is set to `True`, which means NaN values are excluded from the summation.

``````
import pandas as pd

# Create a Series with NaN values
data = [2, 4, 7, None, 9]
series = pd.Series(data)

# Calculate the sum, excluding NaN values
result = series.sum(skipna=True)
print("Calculate the sum:",result)

# Output:
# Calculate the sum: 22.0
``````

In the above example, `result` is the sum of the non-NaN elements in the Series, `[2, 4, 7, 9]`. The output is `22.0`, a float, because the missing value forces the Series to the `float64` dtype. If you set `skipna=False`, the NaN value takes part in the summation, and the result is `nan`.
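For comparison, here is a small sketch of the `skipna=False` case using the same data, together with one possible workaround: filling the missing values with zero before summing. The variable names are chosen for illustration only.

```python
import pandas as pd

series_with_nan = pd.Series([2, 4, 7, None, 9])

# Default behaviour: NaN is skipped
skipped = series_with_nan.sum()

# skipna=False: a single NaN makes the whole sum NaN
propagated = series_with_nan.sum(skipna=False)

# Filling NaN with 0 first gives a defined result even with skipna=False
filled = series_with_nan.fillna(0).sum(skipna=False)

print(skipped, propagated, filled)

# Output:
# 22.0 nan 22.0
```

Using `skipna=False` can serve as a quick check that a Series has no missing values, since any NaN shows up in the result.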

## Get the Sum of Selected Elements Based on the Condition

Similarly, to calculate the sum of selected elements in a Pandas Series based on a condition, you can use boolean indexing.

``````
import pandas as pd

# Create a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)

# Calculate the sum of elements greater than 20
result = series[series > 20].sum()
print("Sum of elements greater than 20:",result)

# Output:
# Sum of elements greater than 20: 120
``````

In the above example, `result` is the sum of the elements in the Series `[10, 20, 30, 40, 50]` that are greater than 20, that is, `30 + 40 + 50`. The output is `120`. You can adjust the condition inside the square brackets to match your own criteria.
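Conditions can also be combined. The sketch below, which reuses the same sample values, builds a boolean mask from two comparisons joined with `&` and sums only the matching elements. The names `in_range`, `range_sum`, and `match_count` are illustrative.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Each comparison needs its own parentheses when combined with & or |
in_range = (series > 10) & (series < 50)

# Sum of the elements strictly between 10 and 50: 20 + 30 + 40
range_sum = series[in_range].sum()

# Summing the boolean mask itself counts how many elements matched
match_count = in_range.sum()

print(range_sum, match_count)

# Output:
# 90 3
```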

## Pandas Series sum() Function using Cumulative Sum

To calculate the cumulative sum of a Pandas Series, use the `cumsum()` function. The cumulative sum is the running total of the elements in the Series. For instance, `cumulative_sum` will be a new Series containing the cumulative sum of the original Series.

``````
import pandas as pd

# Create a Series
data = [5, 10, 15, 20, 25]
series = pd.Series(data)

# Calculate the cumulative sum
cumulative_sum = series.cumsum()
print("Calculate the cumulative sum:\n",cumulative_sum)

# Output:
# Calculate the cumulative sum:
#  0     5
# 1    15
# 2    30
# 3    50
# 4    75
# dtype: int64
``````

Each element in the `cumulative_sum` Series is the sum of all the preceding elements in the original Series, including itself.
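Two properties of `cumsum()` are worth checking in a short sketch: the last running total equals the result of `sum()`, and with the default settings a NaN stays NaN in its own position while the running total continues after it. The second Series below is hypothetical data added for illustration.

```python
import pandas as pd

series = pd.Series([5, 10, 15, 20, 25])
running = series.cumsum()

# The final running total matches the overall sum
last_total = running.iloc[-1]
overall = series.sum()

# With a missing value, the NaN position stays NaN and the total carries on
gappy_running = pd.Series([5, None, 15]).cumsum()

print(last_total, overall)
print(gappy_running.tolist())

# Output:
# 75 75
# [5.0, nan, 20.0]
```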

## Get the Sum of a Series with a DateTime Index

When working with a Pandas Series that has a DateTime index, you can use the `sum()` function to calculate the sum of the values.

``````
import pandas as pd

# Create a Series with a DateTime index
date_index = pd.date_range(start='2022-01-01', end='2022-01-05', freq='D')
data = [2, 4, 6, 8, 10]
series = pd.Series(data, index=date_index)

# Calculate the sum of values in the Series
total_sum = series.sum()
print("Sum of values in the Series:",total_sum)

# Output:
# Sum of values in the Series: 30
``````

In this example, `total_sum` will be the sum of all values in the Series with a DateTime index. The output will be the sum of `[2, 4, 6, 8, 10]`, which is `30`. The DateTime index can provide additional context when working with time series data.
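One way the DateTime index adds context is by letting you sum over a date range. The following sketch, using the same sample data, selects three days with `.loc` and date strings before calling `sum()`. Label-based slicing includes both endpoints; the variable name `window_sum` is illustrative.

```python
import pandas as pd

date_index = pd.date_range(start='2022-01-01', end='2022-01-05', freq='D')
series = pd.Series([2, 4, 6, 8, 10], index=date_index)

# Select 2 January to 4 January (both ends included), then sum: 4 + 6 + 8
window_sum = series.loc['2022-01-02':'2022-01-04'].sum()
print("Sum for the selected dates:", window_sum)

# Output:
# Sum for the selected dates: 18
```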

## Get the Sum of the MultiIndex Series

When working with a Pandas Series that has a MultiIndex, you can sum the values within each group of a specific level, collapsing the other levels. For instance, `result` in the example below is a new Series containing the sum of values for each label in the ‘letters’ level of the MultiIndex.

``````
import pandas as pd

# Create a Series with a MultiIndex
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names = ('letters', 'numbers'))
data = [5, 10, 20, 30]
series = pd.Series(data, index=multi_index)

# Calculate the sum along the 'letters' level
# (pandas 2.0 removed sum(level=...); groupby(level=...) works in old and new versions)
result = series.groupby(level='letters').sum()
print(result)

# Output:
# letters
# A    15
# B    50
# dtype: int64
``````
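The same kind of per-level total can be written with `groupby(level=...)`, which works across pandas versions, including 2.0 and later where `sum()` no longer accepts a `level` argument. The sketch below rebuilds the Series from the example above and sums by each level in turn. The names `by_letters` and `by_numbers` are illustrative.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('letters', 'numbers'))
series = pd.Series([5, 10, 20, 30], index=multi_index)

# Sum within each label of the 'letters' level: A = 5 + 10, B = 20 + 30
by_letters = series.groupby(level='letters').sum()

# Sum within each label of the 'numbers' level: 1 = 5 + 20, 2 = 10 + 30
by_numbers = series.groupby(level='numbers').sum()

print(by_numbers)

# Output:
# numbers
# 1    25
# 2    40
# dtype: int64
```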

## Frequently Asked Questions on Pandas Series sum() Function

What is the purpose of the sum() function in Pandas Series?

The `sum()` function in Pandas Series is used to calculate the sum of all elements in the Series. It is particularly useful for numeric data, but it can also handle various data types.

How does the sum() function handle missing values (NaN) in a Series?

By default, the `sum()` function excludes missing values (NaN) during summation. This behavior can be modified using the `skipna` parameter, allowing the inclusion of NaN values if needed.

What is the difference between sum() and cumsum() in Pandas?

While `sum()` calculates the sum of all elements in a Series, `cumsum()` computes the cumulative sum, generating a new Series where each element represents the running total up to that point.

Is it possible to calculate the sum along specific levels in a MultiIndex Series?

Yes. In pandas versions before 2.0, `sum()` accepted a `level` parameter for this purpose. That parameter was removed in pandas 2.0, so group by the level first and then sum, for example `series.groupby(level='letters').sum()`.

What is the purpose of the skipna parameter, and how can I use it?

The `skipna` parameter in the Pandas `sum()` function controls whether to exclude or include missing values (NaN) during summation. By default, `skipna` is set to `True`, meaning that NaN values are excluded from the sum. If `skipna` is set to `False`, NaN values are considered, and the sum result becomes NaN if there are any NaN values in the Series.

Can the sum() function handle DateTime indexes in a Series?

Yes. A DateTime index does not change how `sum()` works: the function adds up all the values in the Series, regardless of the index labels. The DateTime index is useful for selecting a date range before summing.

## Conclusion

In this article, I have explained the Pandas Series `sum()` function, including its syntax and parameters, and shown with examples how to use it to compute the sum of the values in a Series.

Happy Learning!!

### Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles on Pandas, NumPy, and Python, and takes pride in his ability to bridge the gap between technical experts and end-users.