Pandas loc[] Multiple Conditions

When you want to select rows based on multiple conditions, use pandas loc[]. It is a DataFrame property used to select rows and columns by label or by a boolean mask. A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e., rows and columns. Selecting columns from a DataFrame returns a new DataFrame containing only the specified columns.

In this article, I will explain how to select rows using pandas loc with multiple conditions.

1. Quick Examples of pandas loc[] with Multiple Conditions

Below are some quick examples of using pandas.DataFrame.loc[] to select rows by checking multiple conditions.


# Example 1 - Using loc[] with multiple conditions combined by & (and)
df2 = df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
print(df2)

# Example 2 - Using loc[] with multiple conditions combined by | (or)
df2 = df.loc[(df['Discount'] >= 1200) | (df['Fee'] >= 23000)]
print(df2)

Let’s create a DataFrame and explore how to use pandas loc[].


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Output:
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500
#r5   pandas  24000   60days      2000

2. Using loc[] by Multiple Conditions

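By using loc[] you can apply multiple conditions, combined with & (and) or | (or). Make sure you surround each condition with parentheses. Otherwise, because & and | take precedence over comparison operators such as >= and <=, the expression raises an error or returns incorrect results.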


df2=df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
print(df2)

This yields the output below.


   Courses    Fee Duration  Discount
r1   Spark  20000    30day      1000
r3  Hadoop  26000   35days      1200
r5  pandas  24000   60days      2000
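
To see why the parentheses matter, here is a minimal sketch that runs the same filter with and without them. It rebuilds a smaller version of the DataFrame above (only the Fee and Discount columns) so it can run on its own; the variable names with_parens and raised are illustrative only.

```python
import pandas as pd

# Reduced copy of the article's DataFrame (assumption: only two columns are needed here)
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000],
     "Discount": [1000, 2300, 1200, 2500, 2000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# With parentheses: each comparison is evaluated first, then combined with &
with_parens = df.loc[(df["Discount"] >= 1000) & (df["Discount"] <= 2000)]
print(list(with_parens.index))

# Without parentheses: Python evaluates 1000 & df["Discount"] first and then
# treats the rest as a chained comparison, which needs a single True/False value
try:
    df.loc[df["Discount"] >= 1000 & df["Discount"] <= 2000]
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

In this case the unparenthesized version fails loudly with a ValueError about the ambiguous truth value of a Series, which is easier to spot than a silently wrong result.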

Let’s look at another example, this time using the or operator (|).


df2=df.loc[(df['Discount'] >= 1200) | (df['Fee'] >= 23000 )]
print(df2)

This yields the output below.


    Courses    Fee Duration  Discount
r2  PySpark  25000   40days      2300
r3   Hadoop  26000   35days      1200
r4   Python  22000   40days      2500
r5   pandas  24000   60days      2000
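
Since loc[] selects columns as well as rows, the conditions can be combined with a column list, and a condition can be negated with ~. The following is a sketch under the assumption that you want courses with a fee of at least 23000 whose duration is not '40days'; this particular filter is not part of the examples above.

```python
import pandas as pd

# Same data as the article's DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000],
     "Duration": ["30day", "40days", "35days", "40days", "60days"],
     "Discount": [1000, 2300, 1200, 2500, 2000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Row condition: Fee >= 23000 and Duration is NOT '40days'
# Column selection: keep only Courses and Fee
subset = df.loc[(df["Fee"] >= 23000) & ~(df["Duration"] == "40days"),
                ["Courses", "Fee"]]
print(subset)
```

The first argument to loc[] is the row mask and the second is the list of column labels to keep.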

3. Complete Examples of pandas loc[] With Multiple Conditions


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Example 1 - Using loc[] with multiple conditions
df2=df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
print(df2)

# Example 2
df2=df.loc[(df['Discount'] >= 1200) | (df['Fee'] >= 23000 )]
print(df2)
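
As an optional variation, the conditions can be stored in named boolean masks before being passed to loc[], which may be easier to read when there are many of them. This sketch also uses Series.between(), which is inclusive on both ends by default, as a stand-in for the two Discount comparisons in Example 1. The mask names in_range and high_fee are illustrative only.

```python
import pandas as pd

# Reduced copy of the article's DataFrame (assumption: three columns are enough here)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000],
     "Discount": [1000, 2300, 1200, 2500, 2000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Named boolean masks; each one is a Series of True/False values
in_range = df["Discount"].between(1000, 2000)  # 1000 <= Discount <= 2000
high_fee = df["Fee"] >= 23000

# Same rows as Example 1
discount_rows = df.loc[in_range]
print(discount_rows)

# Both conditions at once; no extra parentheses needed around named masks
both_rows = df.loc[in_range & high_fee]
print(both_rows)
```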

Conclusion

In this article, you have learned how to use the loc[] property to filter or select DataFrame rows with multiple conditions.

Happy Learning !!

NNK
