Pandas Count Rows with Condition

How do you count rows with a condition in pandas? To count the rows of a pandas DataFrame that satisfy one or more conditions, filter the DataFrame with a boolean condition and count the result using len(), the shape attribute, or df.index, or flag the matching rows with apply() and a lambda function. In this article, I will explain how to count the number of rows with conditions in a DataFrame by using these approaches with examples.

Quick Examples of Count Rows with Condition

If you are in a hurry, below are some quick examples of how to count rows with conditions in pandas.


# Quick examples of count rows with condition

# Example 1: Use len() function 
# To count rows with a single condition
df2 = len(df[df["Courses"]=="Pandas"])

# Example 2: Use len() function 
# To count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") & 
             (df["Fee"]==35000)])

# Example 3: Count rows with three conditions
# Note: "Duration" holds strings, so >= compares them alphabetically
df2 = len(df[(df["Courses"]=="Pandas") &
             (df["Fee"]==35000) &
             (df["Duration"]>= "35days")])

# Example 4: Use Dataframe.apply() & lambda function
df2 = df.apply(lambda x : True
            if x['Courses'] == "Spark" else False, axis = 1)
df3 = len(df2[df2 == True].index)

Now, let's create a pandas DataFrame from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.


import pandas as pd
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark", "Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)

This creates a DataFrame with seven rows (labeled r1 through r7) and four columns.

Pandas len() Function to Count Rows by Condition

To count the number of rows that satisfy a condition, first use df[] to filter the rows and then use len() to count the rows that remain. Here, the condition selects the "Courses" column and checks which of its values are equal to "Pandas". The comparison returns a boolean Series, df[] keeps only the rows where it is True, and len() returns how many rows matched.


# Use len() function 
# To count rows with single condition
df2 = len(df[df["Courses"]=="Pandas"])
print("Get the rows count with condition:\n", df2)

# Output:
# Get the rows count with condition:
# 2
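The other counting approaches mentioned in the introduction can be sketched the same way. This is a minimal illustration on a trimmed copy of the DataFrame, not the only way to do it; the variable names (mask, count_len, and so on) are my own. Note that shape is an attribute, not a method, so it is used without parentheses.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (only the columns needed here)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
    "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
})

# Boolean Series: True where the condition holds
mask = df["Courses"] == "Pandas"

count_len = len(df[mask])            # filter, then count the rows
count_shape = df[mask].shape[0]      # shape is a (rows, columns) tuple
count_index = len(df[mask].index)    # length of the filtered index
count_sum = int(mask.sum())          # True counts as 1, so the sum is the row count

print(count_len, count_shape, count_index, count_sum)
```

Summing the boolean mask skips building the filtered DataFrame, which may matter on large data.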

Use len() Function to Count Rows with Multiple Conditions

You can also use the len() function to count the rows after filtering by multiple conditions. Here, I combine a condition on the "Courses" column with a condition on the "Fee" column using the & operator and then get the count of the filtered rows. The first condition checks that "Courses" is equal to "Pandas", and the second checks that "Fee" is equal to 35000. Each condition must be wrapped in parentheses.


# Use len() function to count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") &
         (df["Fee"]==35000)])
print("Get the rows count with multiple conditions:\n", df2)

# Output:
# Get the rows count with multiple conditions:
# 2

# Count rows with three conditions
# Note: "Duration" holds strings, so >= compares them alphabetically
df2 = len(df[(df["Courses"]=="Pandas") &
         (df["Fee"]==35000) &
         (df["Duration"]>= "35days")])
print("Get the rows count with multiple conditions:\n", df2)

# Output:
# Get the rows count with multiple conditions:
# 2
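The Duration condition above compares strings, which works for this data but can give surprising results when the number of digits differs. The sketch below shows the difference, assuming the values always start with the number of days. The "100days" row and the names by_string, by_number, and days are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Pandas", "Pandas", "Pandas"],
    "Fee": [35000, 35000, 35000],
    # "100days" is a hypothetical value added to show the pitfall
    "Duration": ["35days", "60days", "100days"],
})

# Strings are compared character by character, so "100days" sorts before "35days"
by_string = len(df[df["Duration"] >= "35days"])

# Extract the leading digits and compare them as integers instead
days = df["Duration"].str.extract(r"(\d+)", expand=False).astype(int)
by_number = len(df[days >= 35])

print(by_string, by_number)
```

The string comparison misses the 100-day row, while the numeric comparison counts all three.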

Use Dataframe.apply() & Lambda Function

Similarly, you can pass a lambda expression with the condition to the DataFrame.apply() function with axis=1 to flag each row as True or False, and then apply len() to the flagged rows to get the count.


# Use DataFrame.apply() & lambda function
# df2 is a boolean Series with one True/False flag per row
df2 = df.apply(lambda x : True
            if x['Courses'] == "Spark" else False, axis = 1)
# df3 is the number of rows flagged True
df3 = len(df2[df2 == True].index)
print("Get the rows count with condition:\n", df3)

# Output:
# Get the rows count with condition:
# 2
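The apply() approach can be written more compactly, since the comparison inside the lambda already evaluates to True or False. The sketch below is one possible variation on a trimmed copy of the DataFrame; the variable names and the Fee threshold in the last example are assumptions for illustration.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
    "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
})

# Row-wise: the lambda returns the comparison result directly
flags = df.apply(lambda row: row["Courses"] == "Spark", axis=1)
count_apply = int(flags.sum())

# Column-wise: the same condition without apply()
count_vectorized = int((df["Courses"] == "Spark").sum())

# A row-wise condition that reads two columns of the same row
count_combined = int(
    df.apply(lambda row: row["Courses"] == "Spark" and row["Fee"] < 25000, axis=1).sum()
)

print(count_apply, count_vectorized, count_combined)
```

For simple comparisons the column-wise form is usually faster, because apply() with axis=1 calls the lambda once per row.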

Complete Example For Count Rows with Condition


import pandas as pd
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark", "Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Count all rows using DataFrame.index
df2 = len(df.index)
print(df2)

# Pandas count rows using len()
df2 = len(df)
print(df2)

# Use len() function to count rows with single condition
df2 = len(df[df["Courses"]=="Pandas"])
print(df2)

# Use len() function to count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") &
          (df["Fee"]==35000)])
print(df2)

# Count rows with three conditions
# Note: "Duration" holds strings, so >= compares them alphabetically
df2 = len(df[(df["Courses"]=="Pandas") &
         (df["Fee"]==35000) &
         (df["Duration"]>= "35days")])
print(df2)

# Use Dataframe.apply() & Lambda Function
df2 = df.apply(lambda x : True
            if x['Courses'] == "Spark" else False, axis = 1)
df3 = len(df2[df2 == True].index)
print(df3)

Frequently Asked Questions on Pandas Count Rows with Condition

How do I count rows that meet multiple conditions in Pandas?

You can use the logical AND (&) or logical OR (|) operators to combine multiple conditions, wrapping each condition in parentheses, and then call sum() on the resulting boolean Series to count the matching rows. For example, row_count = ((df['col1'] > col1_value) & (df['col2'] == 'col2_value')).sum()
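As a rough illustration of the answer above, the sketch below applies both operators to a trimmed copy of the article's DataFrame. The variable names are my own, and DataFrame.query() is included only as an alternative spelling of the AND case that the article itself does not cover.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
    "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
})

# AND: both conditions must hold for a row to be counted
both = int(((df["Courses"] == "Pandas") & (df["Fee"] == 35000)).sum())

# OR: a row is counted if at least one condition holds
either = int(((df["Courses"] == "Spark") | (df["Fee"] > 30000)).sum())

# The AND case written as a query string
via_query = len(df.query('Courses == "Pandas" and Fee == 35000'))

print(both, either, via_query)
```

The parentheses around each condition are required, because & and | bind more tightly than == and > in Python.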

Can I count missing values in a specific column?

You can count missing (NaN) values in a specific column using the isna() function. For example, row_count = df['column'].isna().sum()
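A small sketch of the missing-value count follows. The DataFrame here is made up for illustration, since the article's data has no missing values, and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", None, "Pandas"],
    "Fee": [22000, np.nan, 30000, np.nan],
})

# Number of missing values in one column
missing_fee = int(df["Fee"].isna().sum())

# count() returns the number of non-missing values
present_fee = int(df["Fee"].count())

# Missing values per column, as a Series indexed by column name
missing_per_column = df.isna().sum()

print(missing_fee, present_fee)
print(missing_per_column)
```

The missing and non-missing counts for a column always add up to the total number of rows.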

Conclusion

In this article, I have explained how to count the number of rows that meet single or multiple conditions in a pandas DataFrame using the DataFrame.shape attribute, len(), DataFrame.index, and DataFrame.apply() with a lambda function, with examples.

Happy Learning !!
