Pandas Get Count of Each Row of DataFrame

  • Post category:Pandas
  • Post last modified:February 23, 2024

In Pandas, you can get the count of each row of a DataFrame using the DataFrame.count() method. To get the row count, pass axis='columns' (or equivalently axis=1) to the count() method. Note that count() ignores all None and NaN values.

Key Points –

  • Use the axis parameter with a value of 1 to count along the rows (horizontally).
  • Computing row counts with Pandas’ count(axis=1) method is efficient, especially for large datasets, as it leverages vectorized operations.
  • Counting non-null values in each row provides a quick integrity check, helping identify missing or incomplete data within the DataFrame.
  • Pandas automatically handles NaN (Not a Number) values in the DataFrame.
  • The result of count(axis=1) is a Pandas Series containing the counts for each row.

Syntax of df.count()

Following is the syntax of df.count().


# Syntax of df.count()
df.count(axis='columns')
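By default, count() works down each column (axis=0); passing axis='columns' or axis=1 switches it to counting along each row. Here is a minimal sketch using a small hypothetical two-column frame to show the difference:

```python
import pandas as pd
import numpy as np

# Small hypothetical frame with some missing values
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, np.nan]})

print(df.count())                # default axis=0: non-null values per column -> A 2, B 1
print(df.count(axis='columns'))  # per row -> 2, 1, 0
```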

Now let’s create a DataFrame, run these examples, and explore the output. Our DataFrame contains four columns: Courses, Courses Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Courses Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses  Courses Fee Duration  Discount
0    Spark        22000   30days      1000
1  PySpark        25000   50days      2300
2   Hadoop        23000   30days      1000
3   Python        24000     None      1200
4   Pandas        26000      NaN      2500

Pandas Get Count of Each DataFrame Row

Now, let’s run DataFrame.count() to get the count of each row, ignoring None and NaN values. The count() method in Pandas counts the number of non-null values along a specified axis. To count the non-null values in each row, use axis=1 or axis='columns'; the two are equivalent, as 'columns' is simply the string alias for axis 1.


# Get count of each dataframe row 
df2 = df.count(axis='columns')
print(df2)

Yields below output. Note that the counts for rows 3 and 4 are 3, as these two rows contain a None or NaN value.


# Output:
0    4
1    4
2    4
3    3
4    3
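If you prefer a more explicit formulation, the same per-row counts can be computed with notna() followed by a row-wise sum. This is an equivalent sketch of what count(axis=1) does, not a different result:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Courses Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500],
})

# notna() marks each non-null cell True; summing along axis=1 counts them per row
row_counts = df.notna().sum(axis=1)
print(row_counts.tolist())  # [4, 4, 4, 3, 3]
```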

Similarly, you can get the count of non-null values in each row of a DataFrame using Pandas. This will give you a Series containing the count of non-null values in each row of the DataFrame df.


# Get count of each DataFrame row
row_counts = df.count(axis=1)
print(row_counts)

In the above example, df.count(axis=1) counts the number of non-null values in each row of the DataFrame df, and the resulting counts are stored in the row_counts Series. This yields the same output as above.
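One common use of these row counts is a quick integrity check: comparing each row’s count against the number of columns flags rows with missing data. The following is a sketch of that idea (the incomplete name is just illustrative):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Duration': ['30days', '50days', '30days', None, np.nan],
})

# Rows whose non-null count is below the column count have missing values
incomplete = df[df.count(axis=1) < df.shape[1]]
print(incomplete.index.tolist())  # [3, 4]
```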

Frequently Asked Questions on Get Count of Each Row of DataFrame

What does count(axis=1) do in Pandas?

The count(axis=1) method in Pandas counts the number of non-null values in each row of a DataFrame.

How do I count non-null values in each row of a DataFrame?

You can use the count(axis=1) method in Pandas. It returns a Series containing the count of non-null values for each row.

How can I handle missing values while counting each row in a DataFrame?

Pandas automatically handles missing values (NaN) when counting non-null values in each row using the count(axis=1) method. It ignores NaN values during the count.
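Both Python’s None and NumPy’s np.nan are treated as missing and excluded from the count, as a minimal illustration shows:

```python
import pandas as pd
import numpy as np

# Each row has one real value and one missing value (None or np.nan)
df = pd.DataFrame({'a': [1, None], 'b': [np.nan, 2]})
print(df.count(axis=1).tolist())  # [1, 1]
```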

What is the performance impact of using count(axis=1) on large DataFrames?

The count(axis=1) method in Pandas is designed to be efficient, especially for large datasets, as it leverages vectorized operations, making it suitable for performance-critical tasks.

Can I customize the counting process for specific requirements?

While count(axis=1) provides a straightforward way to count non-null values in each row, you can customize the counting process further by combining it with other Pandas methods or functions based on specific requirements.
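For example, you can restrict the count to numeric columns with select_dtypes, or count how many values in each row satisfy a condition by summing a boolean mask. Both are sketches of possible customizations layered on top of Pandas, not features of count() itself, and the threshold below is purely illustrative:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Courses Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500],
})

# Non-null count per row, numeric columns only
numeric_counts = df.select_dtypes(include='number').count(axis=1)

# Count of values above an illustrative threshold in each row
above_23000 = (df[['Courses Fee', 'Discount']] > 23000).sum(axis=1)

print(numeric_counts.tolist())  # [2, 2, 2, 2, 2]
print(above_23000.tolist())     # [0, 1, 0, 1, 1]
```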


Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
