Pandas Count Rows with Condition

  • Post category:Pandas / Python
  • Post last modified:December 18, 2022

How do you count the rows that match a condition in pandas? Filter the DataFrame with a boolean condition (a single condition or several combined), then count the remaining rows with len(), the DataFrame.shape attribute, or len(df.index). Alternatively, use DataFrame.apply() with a lambda function to flag the matching rows. In this article, I will explain how to count the number of rows that satisfy one or more conditions in a DataFrame, with examples.

1. Quick Examples of Count Rows with Condition

If you are in a hurry, below are some quick examples of how to count rows with conditions in a pandas DataFrame.


# Below are some quick examples

# Example 1: Use len() function 
# to count rows with a single condition
df2 = len(df[df["Courses"]=="Pandas"])

# Example 2: Use len() function 
# to count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") & 
             (df["Fee"]==35000)])

# Example 3: Count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") & 
             (df["Fee"]==35000) & 
             (df["Duration"]>= "35days")])

# Example 4: Use DataFrame.apply() & lambda function
df2 = df.apply(lambda x : True
            if x['Courses'] == "Spark" else False, axis = 1)
df3 = len(df2[df2 == True].index)

Now, let’s create a pandas DataFrame from a Python dictionary with the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark", "Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

This yields the output below.


    Courses    Fee Duration  Discount
r1    Spark  22000   30days      1000
r2  PySpark  25000   50days      2000
r3   Hadoop  30000   40days      2500
r4   Pandas  35000   35days      1500
r5    Spark  22000   30days      1000
r6  PySpark  25000   50days      2000
r7   Pandas  35000   60days      1500

2. Pandas len() Function to Count Rows by Condition

To count the rows that match a condition, first filter the DataFrame with df[], passing it a boolean condition, and then call len() on the filtered result. In the example below, df["Courses"]=="Pandas" checks every value of the "Courses" column and returns True for the rows equal to "Pandas". df[...] keeps only those rows, and len() returns how many remain.


# Use len() function to count rows with single condition
df2 = len(df[df["Courses"]=="Pandas"])
print(df2)

# Output
# 2
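The introduction also mentions the shape attribute. As a minimal sketch, assuming the same sample data (only the columns needed here), the filtered rows can also be counted with shape[0], which is the first element of the (rows, columns) tuple. Because True counts as 1 and False as 0, summing the boolean mask gives the same number without building a filtered DataFrame. The names mask, count_shape, and count_sum are illustrative only.

```python
import pandas as pd

# Same sample data as above, reduced to the columns this sketch needs
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
        "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# Boolean mask: True where Courses equals "Pandas"
mask = df["Courses"] == "Pandas"

# Option 1: filter, then read the row count from the shape tuple (rows, columns)
count_shape = df[mask].shape[0]

# Option 2: sum the mask directly; each True adds 1
count_sum = int(mask.sum())

print(count_shape, count_sum)  # 2 2
```

Summing the mask is usually the lighter option when you only need the number and not the matching rows themselves.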

3. Use len() Function to Count Rows with Multiple Conditions

Similarly, you can use len() to count the rows after filtering by multiple conditions. Combine the conditions with the & (and) operator and wrap each condition in parentheses. Here, the condition on the "Courses" column checks whether the value equals "Pandas", and the condition on the "Fee" column checks whether the value equals 35000; only the rows that satisfy both are kept and counted.


# Use len() function to count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") & 
         (df["Fee"]==35000)])
print(df2)

# Output
# 2
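The same pattern works with the other boolean operators. The sketch below, assuming the same sample data, counts rows with | (or), ~ (not), and Series.isin(), which is a shorthand for several == checks on one column joined with |. Each comparison still needs its own parentheses because Python's &, |, and ~ bind more tightly than == and >. The Fee threshold of 28000 and the variable names are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same sample data as above, reduced to the columns this sketch needs
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
        "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# OR: a row is kept if either condition is True
count_or = len(df[(df["Courses"] == "Pandas") | (df["Fee"] > 28000)])

# NOT: ~ inverts the mask, so this counts every row whose course is not "Spark"
count_not = len(df[~(df["Courses"] == "Spark")])

# isin(): True where the value is any of the listed courses
count_isin = len(df[df["Courses"].isin(["Spark", "Pandas"])])

print(count_or, count_not, count_isin)  # 3 5 4
```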

# Count rows with three conditions; Duration holds strings,
# so >= compares them lexicographically, not numerically
df2 = len(df[(df["Courses"]=="Pandas") & 
         (df["Fee"]==35000) & 
         (df["Duration"]>= "35days")])
print(df2)

# Output
# 2
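The Duration column stores strings such as "35days", so >= compares them character by character. That happens to give the expected result for this data, but it breaks for values like "100days", which sorts before "35days" because "1" comes before "3". A minimal sketch, assuming every Duration value contains a single whole number of days, extracts the digits with Series.str.extract() and compares them as integers instead.

```python
import pandas as pd

# Same sample data as above, reduced to the columns this sketch needs
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
        "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
        "Duration": ["30days", "50days", "40days", "35days", "30days", "50days", "60days"],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# Lexicographic pitfall: "1" sorts before "3", so this prints False
print("100days" >= "35days")

# Pull the digits out of each value and convert them to integers;
# the result keeps df's index, so it aligns with the other masks
days = df["Duration"].str.extract(r"(\d+)", expand=False).astype(int)

count = len(df[(df["Courses"] == "Pandas") & (df["Fee"] == 35000) & (days >= 35)])
print(count)  # 2
```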

4. Use DataFrame.apply() & Lambda Function

Pass a lambda function that tests the condition to DataFrame.apply() with axis=1, so that it runs once per row and returns True for the rows that match and False otherwise. Then keep only the True values and use len() to count them.


# Use DataFrame.apply() & lambda function
df2 = df.apply(lambda x : True
            if x['Courses'] == "Spark" else False, axis = 1)
df3 = len(df2[df2 == True].index)
print(df3)

# Output
# 2
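Because apply() already returns a boolean Series, the count can also be taken by summing it, without filtering it again. For a simple column comparison like this one, a vectorized comparison on the column gives the same result without calling a Python function per row, which is generally faster on large DataFrames. This is a sketch on the same sample data; the variable names are illustrative.

```python
import pandas as pd

# Same sample data as above, reduced to the columns this sketch needs
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
        "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# apply() with axis=1 calls the lambda once per row and returns a boolean Series
flags = df.apply(lambda row: row["Courses"] == "Spark", axis=1)

# True sums as 1 and False as 0, so the sum is the number of matching rows
count_apply = int(flags.sum())

# Vectorized equivalent: one comparison over the whole column
count_vectorized = int((df["Courses"] == "Spark").sum())

print(count_apply, count_vectorized)  # 2 2
```

apply() remains useful when the per-row logic cannot be expressed as column operations; for plain comparisons, the vectorized form is the simpler choice.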

5. Complete Example of Counting Rows with Conditions


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark", "Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Count all rows using len() on DataFrame.index
df2 = len(df.index)
print(df2)

# Pandas count rows using len()
df2 = len(df)
print(df2)

# Use len() function to count rows with single condition
df2 = len(df[df["Courses"]=="Pandas"])
print(df2)

# Use len() function to count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") & 
          (df["Fee"]==35000)])
print(df2)

# Count rows with multiple conditions
df2 = len(df[(df["Courses"]=="Pandas") & 
         (df["Fee"]==35000) & 
         (df["Duration"]>= "35days")])
print(df2)

# Use DataFrame.apply() & lambda function
df2 = df.apply(lambda x : True
            if x['Courses'] == "Spark" else False, axis = 1)
df3 = len(df2[df2 == True].index)
print(df3)
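As an alternative to chaining boolean masks with &, DataFrame.query() accepts the whole condition as a string, where and, or, and not can be used, and local Python variables can be referenced with the @ prefix. This is a minimal sketch on the same sample data; the variable course is a hypothetical example value, and query() is a swapped-in technique rather than one used elsewhere in this article.

```python
import pandas as pd

# Same sample data as above, reduced to the columns this sketch needs
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas", "Spark", "PySpark", "Pandas"],
        "Fee": [22000, 25000, 30000, 35000, 22000, 25000, 35000],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# Both conditions in one string; "and" replaces & and no extra parentheses are needed
count_query = len(df.query('Courses == "Pandas" and Fee == 35000'))

# @course refers to the local Python variable named course
course = "Spark"
count_var = len(df.query("Courses == @course"))

print(count_query, count_var)  # 2 2
```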

6. Conclusion

In this article, I have explained how to count the rows that match single and multiple conditions in a pandas DataFrame using len(), the DataFrame.shape attribute, DataFrame.index, and DataFrame.apply() with a lambda function, with examples.

Happy Learning !!
