  • Post category:Pandas
  • Post last modified:June 24, 2024

In pandas, the sum() method computes the sum of the values over the requested axis of a DataFrame. It can be applied down the columns or across the rows. It is intended for numeric data; on string (object) columns it concatenates the values instead of adding them.


In this article, I will explain the Pandas DataFrame sum() function by using its syntax, parameters, usage, and how we can return a Series containing the sum of the values for the specified axis.

Key Points –

  • The sum() method can aggregate data along the specified axis, where axis=0 sums down the columns (default), and axis=1 sums across the rows.
  • By default, the sum() method skips NaN values (skipna=True). If skipna=False, the result is NaN for any row or column that contains a NaN value.
  • Use numeric_only parameter and set it to True to include only float, int, and boolean columns in the summation, ignoring non-numeric data types.
  • The min_count parameter specifies the minimum number of valid (non-NA) values required to perform the sum. If fewer than min_count non-NA values are present, the result is NaN.
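
As a minimal sketch of the min_count behavior described in the last point, the example below uses a small made-up DataFrame (the name df_demo and its values are illustrative assumptions, not part of the article's data).

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has two valid values, column "b" has none
df_demo = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [np.nan, np.nan, np.nan]})

# Default min_count=0: an all-NaN column sums to 0.0
default_sum = df_demo.sum()

# min_count=1: at least one non-NA value is required, so "b" becomes NaN
at_least_one = df_demo.sum(min_count=1)

# min_count=3: "a" has only two non-NA values, so it becomes NaN as well
at_least_three = df_demo.sum(min_count=3)

print(default_sum)
print(at_least_one)
print(at_least_three)
```

Setting min_count=1 is a common way to tell an empty column apart from one that genuinely sums to zero.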

Syntax of Pandas DataFrame sum() Method

Following is the syntax of the Pandas DataFrame sum() method.


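# Syntax of Pandas DataFrame sum() (pandas 2.0 and later)
DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)

# Before pandas 2.0, the signature also accepted level, and numeric_only defaulted to None
# DataFrame.sum(axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)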

Parameters of the DataFrame.sum()

Following are the parameters of the DataFrame.sum() method.

  • axis – {index (0), columns (1)}, default 0. The axis to sum over. If 0 or index, the rows are summed down each column, giving one total per column. If 1 or columns, the columns are summed across each row, giving one total per row.
  • skipna – bool, default True. Exclude NA/null values when computing the result. If an entire row/column is NA and skipna is True, the result will be 0 (unless min_count is set above 0).
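  • level – int or level name, default None. If the axis is a MultiIndex (hierarchical), sum along a particular level, collapsing into a DataFrame. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby(level=...).sum() instead.
  • numeric_only – bool. Include only float, int, and boolean columns. The default is False in pandas 2.0 and later. In earlier versions the default was None, which attempted to use everything and then fell back to only numeric data.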
  • min_count – int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
  • kwargs – Additional keyword arguments.
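
For the level parameter listed above, the following is a rough sketch of the groupby-based equivalent on current pandas versions. The MultiIndex, the level names class and student, and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical MultiIndex with two levels: class and student
index = pd.MultiIndex.from_tuples(
    [("A", "Ram"), ("A", "Sam"), ("B", "Scott"), ("B", "Ann")],
    names=["class", "student"],
)
scores = pd.DataFrame(
    {"Mathematics": [80, 90, 85, 70], "Science": [85, 95, 80, 90]},
    index=index,
)

# Sum the rows that share the same value in the "class" level
per_class = scores.groupby(level="class").sum()
print(per_class)
```

The result is a DataFrame with one row per class, which is what sum(level=...) used to return in older versions.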

Usage of Pandas DataFrame sum()

The sum() method in Pandas DataFrame is used to calculate the sum of the values along a given axis (rows or columns). This method returns a Series containing the sum of the values for the specified axis.

To run some examples of the Pandas DataFrame sum() method, let’s create a Pandas DataFrame using data from a dictionary.


# Create DataFrame
import pandas as pd
studentdetails = {
       "Studentname":["Ram", "Sam", "Scott", "Ann", "John"],
       "Mathematics" :[80,90,85,70,95],
       "Science" :[85,95,80,90,75],
       "English" :[90,85,80,70,95]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(studentdetails ,index=index_labels)
print("Create DataFrame:\n", df)

This creates a DataFrame with five rows (r1 to r5) and four columns: Studentname, Mathematics, Science, and English.

Sum of Each Column

To calculate the sum of each column (default behavior) in the DataFrame, you can use the sum() function provided by pandas.


# Sum of each column
column_sum = df.sum()
print("Sum of each column:\n", column_sum)

This returns a Series with one entry per column.

The Studentname column sums to a concatenation of the names because it contains strings, while the numeric columns sum to their respective totals. If you want to exclude the Studentname column from the sum operation, you can select the numeric columns explicitly:


# Calculate the sum of each numeric column
numeric_column_sums = df[['Mathematics', 'Science', 'English']].sum()
print("Sum of Each Numeric Column:\n", numeric_column_sums)

# Output:
# Sum of Each Numeric Column:
#  Mathematics    420
# Science        425
# English        420
# dtype: int64
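
An alternative to listing the numeric columns by hand is the numeric_only parameter. The sketch below rebuilds the same student data under the assumed name df_scores so that it runs on its own.

```python
import pandas as pd

# Same student data as above, rebuilt so this snippet is self-contained
df_scores = pd.DataFrame(
    {
        "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
        "Mathematics": [80, 90, 85, 70, 95],
        "Science": [85, 95, 80, 90, 75],
        "English": [90, 85, 80, 70, 95],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# numeric_only=True drops the string column before summing
numeric_sums = df_scores.sum(numeric_only=True)
print(numeric_sums)
```

This avoids hard-coding column names, which helps when the set of numeric columns may change.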

Sum of Each Row

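To calculate the sum of each row in the DataFrame, call sum() with axis=1. Because the DataFrame also contains the string column Studentname, pass numeric_only=True so that only the numeric columns are added; in pandas 2.0 and later, mixing strings and numbers in a row-wise sum raises a TypeError.


# Sum of each row (numeric columns only)
row_sum = df.sum(axis=1, numeric_only=True)
print("Sum of each row:\n", row_sum)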

# Output:
# Sum of each row:
# r1    255
# r2    270
# r3    245
# r4    230
# r5    265
# dtype: int64
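
A common follow-up is to store the row totals back in the DataFrame. The sketch below assumes a new column named Total and a DataFrame named df_scores; both names are illustrative.

```python
import pandas as pd

# Same student data as above, rebuilt so this snippet is self-contained
df_scores = pd.DataFrame(
    {
        "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
        "Mathematics": [80, 90, 85, 70, 95],
        "Science": [85, 95, 80, 90, 75],
        "English": [90, 85, 80, 70, 95],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Add the per-student total as a new column; numeric_only=True skips Studentname
df_scores["Total"] = df_scores.sum(axis=1, numeric_only=True)
print(df_scores)
```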

Sum with Missing Values (Handling NaN)

The sum() function also works when the DataFrame contains missing values (NaN). To see how they are handled, first create a DataFrame with some missing values.


import pandas as pd
import numpy as np

# Create a DataFrame with missing values (NaN)
studentdetails_with_nan = {
    "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
    "Mathematics": [80, 90, 85, np.nan, 95],
    "Science": [85, 95, np.nan, 90, 75],
    "English": [90, np.nan, 80, 70, 95]
}

index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(studentdetails_with_nan, index=index_labels)
print("DataFrame with NaN values:\n", df)

# Output:
# DataFrame with NaN values:
#     Studentname  Mathematics  Science  English
# r1         Ram         80.0     85.0     90.0
# r2         Sam         90.0     95.0      NaN
# r3       Scott         85.0      NaN     80.0
# r4         Ann          NaN     90.0     70.0
# r5        John         95.0     75.0     95.0

When using the sum() method in pandas DataFrames, skipna=True is the default behavior. This means that any missing values (NaN) are automatically excluded from the sum calculation.


# Sum of each column with skipna=True (default behavior)
df2 = df.sum(skipna=True)
print("Sum with skipna=True:\n", df2)

# Output:
# Sum with skipna=True:
#  Studentname    RamSamScottAnnJohn
# Mathematics                 350.0
# Science                     345.0
# English                     335.0
# dtype: object

In the above example, the sum() method computes the sum of each column (Mathematics, Science, English) while skipping any NaN values. The resulting sums (350.0, 345.0, 335.0) represent the total of the non-NaN values in each respective column.

Sum with skipna=False

When skipna=False is specified with the sum() method in pandas DataFrames, NaN values are not ignored during the sum calculation. Instead, if any NaN values are present in a column, the sum for that column will be NaN.


# Sum of each column with skipna=False
df2 = df.sum(skipna=False)
print("\nSum with skipna=False:\n", df2)

# Output:
# Sum with skipna=False:
#  Studentname    RamSamScottAnnJohn
# Mathematics                   NaN
# Science                       NaN
# English                       NaN
# dtype: object

In the above example, the sum() method computes the sum of each column (Mathematics, Science, English) including NaN values. Since each column has at least one NaN value, the sum for each column is NaN.
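
The same skipna rules apply when summing across rows. As a small sketch, the example below uses only the numeric columns of the data above (the names df_nan, row_sums, and strict_row_sums are assumptions made for illustration).

```python
import numpy as np
import pandas as pd

# Numeric columns from the example above, including the missing values
df_nan = pd.DataFrame(
    {
        "Mathematics": [80, 90, 85, np.nan, 95],
        "Science": [85, 95, np.nan, 90, 75],
        "English": [90, np.nan, 80, 70, 95],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Default skipna=True: NaN values are ignored within each row
row_sums = df_nan.sum(axis=1)

# skipna=False: any row containing a NaN sums to NaN
strict_row_sums = df_nan.sum(axis=1, skipna=False)

print(row_sums)
print(strict_row_sums)
```

Only r1 and r5 have complete data, so they are the only rows that keep a numeric total when skipna=False.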

Frequently Asked Questions on Pandas DataFrame sum() Method

What does the sum() method do in a pandas DataFrame?

The sum() method calculates the sum of values along the specified axis of a DataFrame. By default, it sums the values in each column (axis=0).

How does the sum() method handle missing values (NaN) by default?

By default, the sum() method skips NaN values when computing the sum (skipna=True).

What if I want the sum to be NaN if there are any NaN values in the data?

Set skipna=False so that NaN values are not skipped. If any NaN value is present in a row or column, the sum for that row or column will be NaN.

Does the sum() method work on non-numeric data?

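The sum() method primarily works on numeric data. When summing down the columns, string columns are concatenated rather than skipped, as with the Studentname column in the examples above. To restrict the sum to float, int, and boolean columns, set numeric_only=True. In pandas versions before 2.0, the default numeric_only=None attempted to sum all data and fell back to the numeric columns only if that failed.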

How do I sum specific columns only?

To sum specific columns only in a pandas DataFrame, you need to select those columns and then apply the sum() method on the resulting DataFrame.
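
A minimal sketch of this selection, assuming the student data from earlier in the article under the illustrative name df_scores:

```python
import pandas as pd

# Same student data as in the article, rebuilt so this snippet is self-contained
df_scores = pd.DataFrame(
    {
        "Studentname": ["Ram", "Sam", "Scott", "Ann", "John"],
        "Mathematics": [80, 90, 85, 70, 95],
        "Science": [85, 95, 80, 90, 75],
        "English": [90, 85, 80, 70, 95],
    }
)

# A list of column names selects a DataFrame; sum() then returns a Series
selected_sums = df_scores[["Mathematics", "Science"]].sum()

# A single column name selects a Series; sum() then returns a scalar
english_total = df_scores["English"].sum()

print(selected_sums)
print(english_total)
```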

Conclusion

In this article, you have learned the Pandas DataFrame sum() function by using its syntax, parameters, usage, and how you can calculate the total of the values along the specified axis.

Happy Learning!!
