pandas DataFrame filter() – Usage & Examples

pandas filter() function filters the DataFame for rows and columns. The returned DataFrame contains only rows and columns that are specified with the function. It doesn’t update the existing DataFrame instead it always returns a new one.

In this article, I will explain how to filter multiple rows and columns from pandas DataFrame and also explain how to filter using regex (regular expression). DataFrame.loc[] is another way to select a group of rows and columns by indices and label(s) respectively.

Related:

1. pandas filter() Syntax

Following is the syntax of pandas.DataFrame.filter()


# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)
  • item – Takes list of axis labels that you wanted to filter.
  • like – Takes axis string label that you wanted to filter
  • regex – regular expression
  • axis{0 or ‘index’, 1 or ‘columns’, None}, default None. When not specified it used columns.

Let’s create a pandas DataFrame from Dict and understand usage with examples.


import pandas as pd
technologies= {
    'Courses':["Spark","PySpark","Spark","Java","PySpark","PHP"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print(df)

#Outputs
#   Courses    Fee Duration
#0    Spark  22000   30days
#1  PySpark  25000   50days
#2    Spark  23000   30days
#3     Java  24000   60days
#4  PySpark  26000   35days
#5      PHP  27000   30days

2. filter() Columns by Labels

By default pandas.DataFrame.filter() select the columns by labels you specified using item, like, and regex parameters. You can also explicitly specify axis=1 to select columns.


# Filter columns
df2=df.filter(items=['Courses','Fee'])
print(df2)

# Outputs
#   Courses    Fee
#0    Spark  22000
#1  PySpark  25000
#2    Spark  23000
#3     Java  24000
#4  PySpark  26000
#5      PHP  27000

Note that items param is used to match on exact values. Use like param to match substring.


# Filter Columns using like
df2 = df.filter(like='ration', axis=1)
print(df2)

# Outputs
#  Duration
#0   30days
#1   50days
#2   30days
#3   60days
#4   35days
#5   30days

To filter columns with regular expressions, use regex param. The below example filters column that ends with the character e.


# Filr column names by regex
df2 = df.filter(regex='e$', axis=1)
print(df2)

# Outputs
#     Fee
#0  22000
#1  25000
#2  23000
#3  24000
#4  26000
#5  27000

3. pandas filter() by Index

Use axis=0 on filter() function to filter rows by index (indices). The below example filters rows by index 3 and 5.


# Filter rows
df2=df.filter(items=[3,5], axis=0)
print(df2)

# Outputs
#  Courses    Fee Duration
#3    Java  24000   60days
#5     PHP  27000   30days

Use like param to filter rows that match with substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# Outputs
#   Courses    Fee Duration
#4  PySpark  26000   35days

Conclusion

In this article, you have learned how to filter rows by indices and columns by labels using DataFrame.filter(). Also learned how to filter on exact match, similar match, and regular expression.

References

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing pandas DataFrame filter() – Usage & Examples