The sum() function in Pandas calculates the sum of all elements in a Series, which makes it a convenient way to obtain the total quickly. In this article, I will explain the sum() function, covering its syntax, parameters, and usage, and show how to return the sum of the values in a Series.
Key Points –
- The sum() function calculates the sum of all elements in a Pandas Series, providing a convenient way to obtain the total of numeric values in the Series.
- The sum() function includes a skipna parameter, which controls whether NaN (Not a Number) values in the Series are excluded (the default) or included during the summation. This parameter helps manage missing or undefined values in the data.
- Through boolean indexing, users can perform conditional summation using the sum() function. This enables the calculation of the sum for specific elements in the Series that satisfy certain conditions.
- For a Series with a MultiIndex (hierarchical index), sums can be computed per level. Older pandas versions accepted sum(level=...) for this; the level argument was deprecated in pandas 1.3 and removed in pandas 2.0, where groupby(level=...).sum() is used instead.
- The sum operation works across several data types. Numeric and boolean values are added arithmetically, while for non-numeric data the result depends on how the underlying values support addition.
Syntax of Pandas Series sum() Function
Following is the syntax of the Pandas Series sum() function.
# Syntax of Series sum() function (pandas 2.0 and later)
Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
# Older releases (before pandas 2.0) also accepted a level argument
Series.sum(axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
Parameters of the Series sum()
Following are the parameters of the Series sum() function.
- axis (int, optional) – Axis for the sum operation. A Series has a single axis (0, the index), so the sum is always calculated over all elements in the Series.
- skipna (bool, optional) – If True, exclude NA/null values during the sum. If False, include NA/null values, in which case any NA value makes the result NaN. The default is True.
- level (int or level name, optional) – Only in pandas versions before 2.0. If the index is a MultiIndex (hierarchical), sum along a particular level, collapsing into a Series with one entry per label of that level. Deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...).sum() instead.
- numeric_only (bool, optional) – Include only float, int, and boolean data. The pandas documentation notes that this option is not implemented for Series, so it is rarely needed here.
- min_count (int, optional) – The required number of valid values to perform the sum. If fewer than min_count non-NA values are present, the result will be NA. The default is 0.
- **kwargs – Additional keyword arguments to be passed to the function.
Return Value
The sum() function in Pandas returns the sum of the values in a Series as a scalar (single value). If the Series contains only numeric values, the result is a numeric value. If the Series contains non-numeric values, the result depends on how those values support addition; for example, strings stored with object dtype are concatenated.
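As a rough illustration of how the returned scalar follows the data, the sketch below sums a few small Series. The variable names are made up for this example, and the string case assumes the values are stored with object dtype, since behavior for dedicated string dtypes can differ between pandas versions.

```python
import pandas as pd

# Integer data: the result is an integer scalar
int_total = pd.Series([1, 2, 3]).sum()

# Float data: the result is a float scalar
float_total = pd.Series([1.5, 2.5]).sum()

# Boolean data: True counts as 1 and False as 0
bool_total = pd.Series([True, False, True]).sum()

# Object-dtype strings (assumption: dtype=object): values are concatenated
str_total = pd.Series(["a", "b", "c"], dtype=object).sum()

print(int_total, float_total, bool_total, str_total)
```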
Let’s create a Pandas Series that contains the values [10, 20, 30, 40, 50].
import pandas as pd
# Create a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Create Pandas Series:\n", series)
This prints a five-element Series of dtype int64 with the default integer index 0 through 4.
Get the Sum of Pandas Series
To calculate the sum of all elements in a Pandas Series, you can use the sum() function. For instance, calling sum() on the series object returns the sum of its elements.
import pandas as pd
# Create the Series
series = pd.Series([10, 20, 30, 40, 50])
# Calculate the sum of all elements in the Series
result = series.sum()
print("Sum of all elements in the Series:", result)
This program creates a Series with values [10, 20, 30, 40, 50], uses the sum() function to calculate the sum of all elements, and then prints the result. The output will be “Sum of all elements in the Series: 150”.
Handling NaN Values during Summation
To handle NaN (Not a Number) values during summation in a Pandas Series, you can use the skipna parameter of the sum() function. By default, skipna is set to True, which means NaN values are excluded from the summation.
import pandas as pd
# Create a Series with NaN values
data = [2, 4, 7, None, 9]
series = pd.Series(data)
# Calculate the sum, excluding NaN values
result = series.sum(skipna=True)
print("Calculate the sum:",result)
# Output:
# Calculate the sum: 22.0
In the above example, result is the sum of the non-NaN elements in the Series, [2, 4, 7, 9]. The output is 22.0, a float, because the missing value causes pandas to store the Series as float64. If you set skipna=False, NaN values are included in the summation, and the result will be nan.
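The following sketch shows the effect of skipna=False alongside the min_count parameter described earlier. It reuses the data from the example above; the variable names are illustrative only.

```python
import math
import pandas as pd

series = pd.Series([2, 4, 7, None, 9])

# skipna=False: the NaN propagates, so the total is NaN
total_with_nan = series.sum(skipna=False)

# min_count=5: only 4 valid values exist, so the result is NaN
total_min_count = series.sum(min_count=5)

# min_count=4: the requirement is met, so the usual sum is returned
total_enough = series.sum(min_count=4)

# An empty Series sums to 0 by default, or NaN when min_count=1
empty = pd.Series([], dtype="float64")
empty_default = empty.sum()
empty_min_count = empty.sum(min_count=1)

print(total_with_nan, total_min_count, total_enough)
print(empty_default, empty_min_count)
```

Setting min_count=1 can be useful when a total of 0 for an all-missing or empty Series would be misleading.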
Get the Sum of Selected Elements Based on the Condition
Similarly, to calculate the sum of selected elements in a Pandas Series based on a condition, you can use boolean indexing.
import pandas as pd
# Create a Pandas Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Calculate the sum of elements greater than 20
result = series[series > 20].sum()
print("Sum of elements greater than 20:",result)
# Output:
# Sum of elements greater than 20: 120
In the above example, result is the sum of the elements in the Series [10, 20, 30, 40, 50] that are greater than 20. The output will be 120. You can adjust the condition inside the square brackets based on your specific criteria.
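As a small extension of the same idea, the sketch below combines two conditions and also sums the boolean mask itself, which counts the matching elements instead of adding their values. The variable names are chosen for illustration.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# Combine conditions with & (and) or | (or); each condition needs parentheses
between_sum = series[(series > 10) & (series < 50)].sum()

# Summing the boolean mask counts how many elements satisfy the condition
count_gt_20 = (series > 20).sum()

print("Sum of elements between 10 and 50 (exclusive):", between_sum)
print("Number of elements greater than 20:", count_gt_20)
```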
Get the Cumulative Sum of a Pandas Series
To calculate the cumulative sum of a Pandas Series, you can use the cumsum() function. The cumulative sum is the running total of the elements in the Series. For instance, cumulative_sum will be a new Series containing the cumulative sum of the original Series.
import pandas as pd
# Create a Series
data = [5, 10, 15, 20, 25]
series = pd.Series(data)
# Calculate the cumulative sum
cumulative_sum = series.cumsum()
print("Calculate the cumulative sum:\n",cumulative_sum)
# Output:
# Calculate the cumulative sum:
#  0     5
# 1    15
# 2    30
# 3    50
# 4    75
# dtype: int64
Each element in the cumulative_sum Series is the sum of the corresponding element in the original Series and all the elements before it.
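To connect cumsum() back to sum(), the sketch below checks that the last running total equals the overall sum, and shows how a missing value is handled with the default skipna=True. The names with_gap and running_with_gap are illustrative.

```python
import pandas as pd

series = pd.Series([5, 10, 15, 20, 25])

# The final running total equals the plain sum of the Series
last_running_total = series.cumsum().iloc[-1]
total = series.sum()

# With a missing value, the NaN stays in place but the running total continues
with_gap = pd.Series([1.0, None, 3.0])
running_with_gap = with_gap.cumsum()

print(last_running_total, total)
print(running_with_gap)
```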
Get the Sum of a Series with a DateTime Index
When working with a Pandas Series that has a DateTime index, you can use the sum() function to calculate the sum of the values in the same way as for any other Series.
import pandas as pd
# Create a Series with a DateTime index
date_index = pd.date_range(start='2022-01-01', end='2022-01-05', freq='D')
data = [2, 4, 6, 8, 10]
series = pd.Series(data, index=date_index)
# Calculate the sum of values in the Series
total_sum = series.sum()
print("Sum of values in the Series:",total_sum)
# Output:
# Sum of values in the Series: 30
In this example, total_sum is the sum of all values in the Series with a DateTime index. The output is the sum of [2, 4, 6, 8, 10], which is 30. The DateTime index can provide additional context when working with time series data.
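One way the DateTime index adds context is by letting you sum over part of the time range. The sketch below, using the same data, sums a date slice and then sums 2-day bins with resample(); the bin size and variable names are arbitrary choices for illustration.

```python
import pandas as pd

date_index = pd.date_range(start="2022-01-01", end="2022-01-05", freq="D")
series = pd.Series([2, 4, 6, 8, 10], index=date_index)

# Label-based date slicing includes both endpoints
partial_sum = series.loc["2022-01-02":"2022-01-04"].sum()

# Group the values into 2-day bins and sum each bin
two_day_sums = series.resample("2D").sum()

print("Sum from Jan 2 to Jan 4:", partial_sum)
print(two_day_sums)
```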
Get the Sum of the MultiIndex Series
When working with a Pandas Series that has a MultiIndex, you can calculate the sum of values along a specific level. Older pandas versions supported this through the level parameter of sum(), but that parameter was deprecated in pandas 1.3 and removed in pandas 2.0, so the example below uses groupby(level=...) followed by sum(). For instance, result will be a new Series containing the sum of values for each label in the ‘letters’ level of the MultiIndex.
import pandas as pd
# Create a Series with a MultiIndex
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('letters', 'numbers'))
data = [5, 10, 20, 30]
series = pd.Series(data, index=multi_index)
# Calculate the sum along the 'letters' level
# (series.sum(level='letters') only works on pandas versions before 2.0)
result = series.groupby(level='letters').sum()
print(result)
# Output:
# letters
# A    15
# B    50
# dtype: int64
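The same grouping idea extends to the other level of the index. The sketch below reuses the example data and sums along the ‘numbers’ level with groupby(level=...), an approach that also works on pandas versions where sum() no longer accepts level. The names by_numbers and grand_total are only illustrative.

```python
import pandas as pd

arrays = [["A", "A", "B", "B"], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names=("letters", "numbers"))
series = pd.Series([5, 10, 20, 30], index=multi_index)

# Sum the values that share the same label in the 'numbers' level
by_numbers = series.groupby(level="numbers").sum()

# Calling sum() without grouping ignores the index and adds everything
grand_total = series.sum()

print(by_numbers)
print("Grand total:", grand_total)
```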
Frequently Asked Questions on Pandas Series sum() Function
What does the sum() function do in a Pandas Series?
The sum() function in a Pandas Series calculates the sum of all elements in the Series. It is particularly useful for numeric data, but it can also handle other data types that support addition.
How does sum() handle missing values?
By default, the sum() function excludes missing values (NaN) during summation. This behavior can be modified using the skipna parameter, allowing the inclusion of NaN values if needed.
What is the difference between sum() and cumsum()?
While sum() calculates the sum of all elements in a Series and returns a single value, cumsum() computes the cumulative sum, generating a new Series where each element represents the running total up to that point.
Can you calculate the sum along specific levels of a MultiIndex Series?
Yes. In pandas 2.0 and later, group the Series by the level with groupby(level=...) and then call sum(). In older versions, the sum() function also accepted a level parameter for the same purpose; it was deprecated in pandas 1.3 and removed in 2.0.
What does the skipna parameter do?
The skipna parameter in the Pandas sum() function controls whether to exclude or include missing values (NaN) during summation. By default, skipna is set to True, meaning that NaN values are excluded from the sum. If skipna is set to False, NaN values are considered, and the sum result becomes NaN if there are any NaN values in the Series.
Can sum() handle a Series with a DateTime index?
Yes. When you apply the sum() function to a Series with a DateTime index, it adds up all the values in the Series; the index itself does not take part in the calculation.
Conclusion
In this article, I have explained the sum() function, covering its syntax, parameters, and usage, and shown with examples how to compute the sum of a Pandas Series. It is a versatile tool for calculating the sum of the elements in a Series.
Happy Learning!!