Pandas Select Rows Based on Column Values

You can select rows from a Pandas DataFrame based on column values or on multiple conditions using the DataFrame.loc[] attribute, DataFrame.query(), or the DataFrame.apply() method with a lambda function. In this article, I will explain how to select rows based on single or multiple column values (values from a list), and also how to select rows that have no None or NaN values.

1. Quick Examples of Select Rows Based on Column Values

If you are in a hurry, below are some examples of how to select rows based on column values in a pandas DataFrame.


# Select Rows Based on column Values
df[df["Courses"] == 'Spark'] 
df.loc[df['Courses'] == value]
df.query("Courses == 'Spark'")
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]

# Select rows using multiple conditions on multiple columns
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]

# Using lambda function
df.apply(lambda row: row[df['Courses'].isin(['Spark','PySpark'])])

# Select rows that have no None & NaN values
df.dropna()

# Other examples
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]

Let's see these examples in action with sample data and explore the output to understand them better. First, create a pandas DataFrame from a Python dictionary.


import pandas as pd
import numpy as np

technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee':[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days',None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)
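
This example yields the below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500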

Note that the above DataFrame contains None and NaN values in the Duration column. I will use these in the examples below to select rows that have None or NaN values, or to select rows while ignoring them.

2. Select Rows Based on Column Values

You can use df[df["Courses"] == 'Spark'] to select rows. Not that this expression returns a new DataFrame with selected rows.


# Select Rows Based on Column Values
df2=df[df["Courses"] == 'Spark'] 
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000

You can also write the above statement with a variable.


value="Spark"
df2=df[df["Courses"] == value] 

If you want to select rows where the column value does not equal a given value, use the != operator.


df[df["Courses"] != 'Spark']

3. Using DataFrame.query()

Using the query() method, you can filter Pandas DataFrame rows with an expression; below is a simple example. You can use query() to run pretty much any example explained in this article.


# Using DataFrame.query() 
df2=df.query("Courses == 'Spark'")
print(df2)

Yields the same output as above. You can also try the other examples explained above with this approach.
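
For example, query() also supports referencing Python variables by prefixing them with @. Below is a minimal sketch that reuses the value variable from the earlier example.


# Referencing a Python variable inside query() with @
value = "Spark"
df2 = df.query("Courses == @value")
print(df2)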

4. Using DataFrame.loc to select based on Column Values

You can achieve the same result by using the DataFrame.loc[] attribute with a boolean condition, as shown below.


# Using DataFrame.loc to select based on Column Values
df2=df.loc[df['Courses'] == "Spark"]
print(df2)

Yields the same output as above. You can also try the other examples explained above with this approach.
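
DataFrame.loc[] also accepts a column selection alongside the row condition, which plain df[...] indexing does not. A minimal sketch, using an illustrative column list:


# Select matching rows and only specific columns with loc
df2 = df.loc[df['Courses'] == "Spark", ['Courses', 'Fee']]
print(df2)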

5. Select Rows Based on List of Column Values

If you have values in a list and want to select rows based on that list of values, use the isin() method.


# Select Rows Based on List of Column Values
values=["Spark","PySpark"]
print(df[df["Courses"].isin(values)] )
# Using df.loc
print(df.loc[df['Courses'].isin(values)]) 

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300

To select rows whose column value is not in a list of values, negate the condition with the ~ operator.


df2=df.loc[~df['Courses'].isin(values)]
print(df2)
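
If you prefer query() for this, it supports list membership directly with the in and not in keywords. A minimal sketch:


# List membership with query() using in / not in
values = ["Spark", "PySpark"]
print(df.query("Courses in @values"))
print(df.query("Courses not in @values"))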

6. Using Multiple Column Conditions

Most of the time you need to select rows based on multiple conditions applied to multiple columns; you can do that in Pandas as shown below. Note that parentheses are required around each condition because of Python's operator precedence rules: the & operator binds more tightly than comparison operators such as <= and >=, so omitting the parentheses raises an error or gives unexpected results.


# Select Rows based on multiple conditions
print(df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)])
print(df[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)])
print(df.query("Discount >= 1000 & Discount <= 2000"))

Yields below output.


# Output:
  Courses    Fee Duration  Discount
0   Spark  22000   30days      1000
2  Hadoop  23000   30days      1000
3  Python  24000     None      1200

You can also use multiple columns to select Pandas DataFrame rows.


# Using Multiple Column Conditions 
df2=df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]
print(df2)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
1  PySpark  25000   50days      2300
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500
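
For a range check on a single column like the first example in this section, Series.between() is a concise alternative to combining two conditions; note that it includes both bounds by default. A minimal sketch:


# Equivalent range check using Series.between()
df2 = df[df['Discount'].between(1000, 2000)]
print(df2)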

7. Select Rows Using DataFrame.apply()

You can also select rows with the DataFrame.apply() method and a lambda function. Note that with the default axis=0, apply() passes each column to the lambda, which then filters it with a boolean mask built from the Courses values, returning only the matching rows.


# By using lambda function
print(df.apply(lambda row: row[df['Courses'].isin(['Spark','PySpark'])]))

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
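
Note that apply() is rarely needed for a simple filter like this; a plain boolean mask selects the same rows more directly and efficiently:


# Equivalent selection with a plain boolean mask
df2 = df[df['Courses'].isin(['Spark', 'PySpark'])]
print(df2)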

8. Working With None & NaN Data

In case you want to drop rows that have None or NaN column values, use the DataFrame.dropna() method.


# Select rows by dropping those that have None & NaN values
print(df.dropna())

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000

In case you want to drop columns whose values contain None or NaN instead, pass axis='columns'. I have covered more examples of how to drop Pandas DataFrame columns in a separate article.


print(df.dropna(axis='columns'))

Yields below output.


# Output:
   Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500
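
If you want to drop rows based on missing values in specific columns only, dropna() accepts a subset parameter. A minimal sketch:


# Drop rows only when the Duration column has None/NaN
print(df.dropna(subset=['Duration']))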

9. Other Examples of Select Rows Based On Column Values


# Select rows where the value contains a string
print(df[df['Courses'].str.contains("Spark")])

# Select after converting values to lowercase
print(df[df['Courses'].str.lower().str.contains("spark")])

# Select rows where the value starts with a string
print(df[df['Courses'].str.startswith("P")])
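
Note that the str accessor propagates missing values; if the filtered column contains None or NaN (like the Duration column in our DataFrame), pass na=False to str.contains() so those rows are treated as non-matches. A minimal sketch:


# Treat None/NaN values as non-matches
print(df[df['Duration'].str.contains("30", na=False)])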

Conclusion

In this article, I have explained several different ways to select Pandas rows based on column values, with examples. Remember that selecting DataFrame rows always returns a new DataFrame with the selected rows. I hope this article helps you learn Pandas.

Happy Learning !!
