pandas DataFrame filter() – Usage & Examples

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:January 28, 2023
Spread the love

Pandas filter() function in Python is used the filters the DataFame rows and columns. The returned DataFrame contains only rows and columns that are specified with the function. It doesn’t update the existing DataFrame instead it always returns a new one.

In this article, I will explain how to filter multiple rows and columns from pandas DataFrame and also explain how to filter using regex (regular expression). DataFrame.loc[] is another way to select a group of rows and columns by indices and label(s) respectively.

Related:

1. filter() Syntax

Following is the syntax of pandas.DataFrame.filter()


# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)
  • item – Takes list of axis labels that you wanted to filter.
  • like – Takes axis string label that you wanted to filter
  • regex – regular expression
  • axis{0 or ‘index’, 1 or ‘columns’, None}, default None. When not specified it used columns.

Let’s create a pandas DataFrame from Dict and understand usage with examples.


import pandas as pd
technologies= {
    'Courses':["Spark","PySpark","Spark","Java","PySpark","PHP"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print(df)

#Outputs
#   Courses    Fee Duration
#0    Spark  22000   30days
#1  PySpark  25000   50days
#2    Spark  23000   30days
#3     Java  24000   60days
#4  PySpark  26000   35days
#5      PHP  27000   30days

2. Pandas filter() Columns by Labels

By default pandas.DataFrame.filter() select the columns by labels you specified using item, like, and regex parameters. You can also explicitly specify axis=1 to select columns.


# Filter columns
df2=df.filter(items=['Courses','Fee'])
print(df2)

# Outputs
#   Courses    Fee
#0    Spark  22000
#1  PySpark  25000
#2    Spark  23000
#3     Java  24000
#4  PySpark  26000
#5      PHP  27000

Note that items param is used to match on exact values. Use like param to match substring.


# Filter Columns using like
df2 = df.filter(like='ration', axis=1)
print(df2)

# Outputs
#  Duration
#0   30days
#1   50days
#2   30days
#3   60days
#4   35days
#5   30days

To filter columns with regular expressions, use regex param. The below example filters column that ends with the character e.


# Filr column names by regex
df2 = df.filter(regex='e$', axis=1)
print(df2)

# Outputs
#     Fee
#0  22000
#1  25000
#2  23000
#3  24000
#4  26000
#5  27000

3. Pandas filter() Rows by Index

Use axis=0 on filter() function to filter rows by index (indices). The below example filters rows by index 3 and 5.


# Filter rows
df2=df.filter(items=[3,5], axis=0)
print(df2)

# Outputs
#  Courses    Fee Duration
#3    Java  24000   60days
#5     PHP  27000   30days

Use like param to filter rows that match with substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# Outputs
#   Courses    Fee Duration
#4  PySpark  26000   35days

Conclusion

In this article, you have learned how to filter rows by indices and columns by labels using DataFrame.filter(). Also learned how to filter on exact match, similar match, and regular expression.

References

Leave a Reply

This Post Has One Comment

  1. Ivo

    Thanks for your help.

You are currently viewing pandas DataFrame filter() – Usage & Examples