In pandas, the sum() method is used to compute the sum of the values over the requested axis in a DataFrame. It can be applied down the columns or across the rows, and it works on numeric data as well as on non-numeric data that supports addition, such as strings.
In this article, I will explain the Pandas DataFrame sum() method, covering its syntax, parameters, and usage, and how it returns a Series containing the sum of the values for the specified axis.
Key Points –
- The sum() method can aggregate data along the specified axis, where axis=0 sums down the columns (default), and axis=1 sums across the rows.
- By default, the sum() method skips NaN values (skipna=True). If skipna=False, the result will be NaN for any row or column that contains a NaN value.
- Set the numeric_only parameter to True to include only float, int, and boolean columns in the summation, ignoring non-numeric data types.
- The min_count parameter specifies the minimum number of valid (non-NA) values required to perform the sum. If fewer than min_count non-NA values are present, the result is NaN.
Syntax of Pandas DataFrame sum() Method
Let’s look at the syntax of the Pandas DataFrame sum() method.
# Syntax of Pandas DataFrame sum() (pandas 2.0 and later)
# Older versions also accepted level=None and used numeric_only=None by default
DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)
Parameters of the DataFrame.sum()
Following are the parameters of the DataFrame.sum() method.
axis – {index (0), columns (1)}, default 0. The axis to sum along. If 0 or index, values are summed down the rows, giving one total per column. If 1 or columns, values are summed across the columns, giving one total per row.
skipna – bool, default True. Exclude NA/null values when computing the result. If an entire row or column is NA and skipna is True, the result will be 0 (with the default min_count=0).
level – int or level name, default None. If the axis is a MultiIndex (hierarchical), sum along a particular level, collapsing into a DataFrame. This parameter exists only in older pandas versions; it was removed in pandas 2.0, where df.groupby(level=...).sum() is the replacement.
numeric_only – bool, default False (None in pandas versions before 2.0). Include only float, int, and boolean columns.
min_count – int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
**kwargs – Additional keyword arguments.
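The interaction between skipna and min_count described above can be seen on a small frame. This is a minimal sketch; the scores DataFrame and its column names are made up for illustration and are not part of the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "b" is entirely missing
scores = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [np.nan, np.nan, np.nan],
})

# Default min_count=0: NaN is skipped, so the all-NaN column sums to 0.0
default_sum = scores.sum()

# min_count=1: at least one valid value is required, so "b" becomes NaN
strict_sum = scores.sum(min_count=1)

# min_count=3: "a" has only two valid values, so it becomes NaN as well
stricter_sum = scores.sum(min_count=3)

print(default_sum)
print(strict_sum)
print(stricter_sum)
```

Using min_count=1 is a common way to tell "nothing to add" apart from a genuine total of zero.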
Usage of Pandas DataFrame sum()
The sum() method in Pandas DataFrame is used to calculate the sum of the values along a given axis (rows or columns). This method returns a Series containing the sum of the values for the specified axis.
To run some examples of the Pandas DataFrame sum() method, let’s create a Pandas DataFrame using data from a dictionary.
# Create DataFrame
import pandas as pd
studentdetails = {
    "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
    "Mathematics": [80, 90, 85, 70, 95],
    "Science": [85, 95, 80, 90, 75],
    "English": [90, 85, 80, 70, 95],
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(studentdetails, index=index_labels)
print("Create DataFrame:\n", df)
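# Output:
# Create DataFrame:
#     Studentname  Mathematics  Science  English
# r1          Ram           80       85       90
# r2          Sam           90       95       85
# r3        Scott           85       80       80
# r4          Ann           70       90       70
# r5         John           95       75       95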
Sum of Each Column
To calculate the sum of each column (the default behavior) in the DataFrame, call the sum() method without arguments.
# Sum of each column
column_sum = df.sum()
print("Sum of each column:\n", column_sum)
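# Output:
# Sum of each column:
# Studentname RamSamScottAnnJohn
# Mathematics 420
# Science 425
# English 420
# dtype: object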
The Studentname column sums to a concatenation of the names because it contains strings, while the numeric columns sum to their respective totals. If you want to exclude the Studentname column from the sum operation, you can specify the numeric columns explicitly:
# Calculate the sum of each numeric column
numeric_column_sums = df[['Mathematics', 'Science', 'English']].sum()
print("Sum of Each Numeric Column:\n", numeric_column_sums)
# Output:
# Sum of Each Numeric Column:
# Mathematics 420
# Science 425
# English 420
# dtype: int64
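As an alternative to selecting the numeric columns by name, the numeric_only parameter described earlier can do the filtering. The sketch below rebuilds the same student data so that it runs on its own; the variable name numeric_sums is only illustrative.

```python
import pandas as pd

# Same student data as in the example above
df = pd.DataFrame(
    {
        "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
        "Mathematics": [80, 90, 85, 70, 95],
        "Science": [85, 95, 80, 90, 75],
        "English": [90, 85, 80, 70, 95],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# numeric_only=True leaves the string column out of the sum
numeric_sums = df.sum(numeric_only=True)
print(numeric_sums)
```

This avoids hard-coding column names, which helps when the set of numeric columns is not known in advance.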
Sum of Each Row
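To calculate the sum of each row in the DataFrame, you can use the sum() method with the axis=1 parameter. Because the Studentname column holds strings, also pass numeric_only=True so that only the numeric columns are added; in pandas 2.0 and later, adding strings and numbers row-wise raises a TypeError.
# Sum of each row (numeric columns only)
row_sum = df.sum(axis=1, numeric_only=True)
print("Sum of each row:\n", row_sum)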
# Output:
# Sum of each row:
# r1 255
# r2 270
# r3 245
# r4 230
# r5 265
# dtype: int64
Sum with Missing Values (Handling NaN)
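To calculate the sum of each row or column when the data contains missing values (NaN), you can use the sum() method, whose skipna parameter controls whether NaN values are skipped. First, create a DataFrame that contains NaN values.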
import pandas as pd
import numpy as np
# Create a DataFrame with missing values (NaN)
studentdetails_with_nan = {
    "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
    "Mathematics": [80, 90, 85, np.nan, 95],
    "Science": [85, 95, np.nan, 90, 75],
    "English": [90, np.nan, 80, 70, 95],
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(studentdetails_with_nan, index=index_labels)
print("DataFrame with NaN values:\n", df)
# Output:
# DataFrame with NaN values:
#     Studentname  Mathematics  Science  English
# r1          Ram         80.0     85.0     90.0
# r2          Sam         90.0     95.0      NaN
# r3        Scott         85.0      NaN     80.0
# r4          Ann          NaN     90.0     70.0
# r5         John         95.0     75.0     95.0
When using the sum() method in pandas DataFrames, skipna=True is the default behavior. This means that any missing values (NaN) are automatically excluded from the sum calculation.
# Sum of each column with skipna=True (default behavior)
df2 = df.sum(skipna=True)
print("Sum with skipna=True:\n", df2)
# Output:
# Sum with skipna=True:
# Studentname RamSamScottAnnJohn
# Mathematics 350.0
# Science 345.0
# English 335.0
# dtype: object
In the above example, the sum() method computes the sum of each column (Mathematics, Science, English) while skipping any NaN values. The resulting sums (350.0, 345.0, 335.0) represent the total of the non-NaN values in each respective column.
Sum with skipna=False
When skipna=False is specified with the sum() method in pandas DataFrames, NaN values are not ignored during the sum calculation. Instead, if any NaN values are present in a column, the sum for that column will be NaN.
# Sum of each column with skipna=False
df2 = df.sum(skipna=False)
print("\nSum with skipna=False:\n", df2)
# Output:
# Sum with skipna=False:
# Studentname RamSamScottAnnJohn
# Mathematics NaN
# Science NaN
# English NaN
# dtype: object
In the above example, the sum() method computes the sum of each column (Mathematics, Science, English) including NaN values. Since each column has at least one NaN value, the sum for each column is NaN.
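The same skipna behavior applies when summing across rows with axis=1. The sketch below uses only the three numeric columns from the NaN example; the names marks, row_skip, and row_strict are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Numeric columns from the NaN example above
marks = pd.DataFrame(
    {
        "Mathematics": [80, 90, 85, np.nan, 95],
        "Science": [85, 95, np.nan, 90, 75],
        "English": [90, np.nan, 80, 70, 95],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Default skipna=True: each row total uses only the values that are present
row_skip = marks.sum(axis=1)

# skipna=False: any row containing a NaN gets a NaN total
row_strict = marks.sum(axis=1, skipna=False)

print(row_skip)
print(row_strict)
```

Only r1 and r5 have all three marks, so they are the only rows with a numeric total when skipna=False.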
Frequently Asked Questions on Pandas DataFrame sum() Method
The sum() method calculates the sum of values along the specified axis of a DataFrame. By default, it sums the values in each column (axis=0).
By default, the sum() method skips NaN values when computing the sum (skipna=True).
Set skipna=False to include NaN values in the calculation. If any NaN values are present, the sum will be NaN.
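The sum() method primarily works on numeric data. In pandas 2.0 and later, numeric_only defaults to False, so non-numeric columns are included: a string column is concatenated, and a row-wise sum that mixes strings and numbers raises a TypeError. Before pandas 2.0, the default numeric_only=None attempted to sum all data and skipped the columns that failed. To sum only the numeric columns in any version, set numeric_only=True.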
To sum specific columns only in a pandas DataFrame, select those columns and then apply the sum() method on the resulting DataFrame.
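As a small sketch of that selection step, the example below contrasts summing one column with summing several. It uses a trimmed copy of the student data, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
        "Mathematics": [80, 90, 85, 70, 95],
        "Science": [85, 95, 80, 90, 75],
    }
)

# One label selects a Series, so sum() returns a single scalar
maths_total = df["Mathematics"].sum()

# A list of labels selects a DataFrame, so sum() returns a Series of totals
subject_totals = df[["Mathematics", "Science"]].sum()

print(maths_total)
print(subject_totals)
```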
Conclusion
In this article, you have learned the Pandas DataFrame sum() method, including its syntax, parameters, and usage, and how to calculate the total of the values along the specified axis.
Happy Learning!!