pandas DataFrame filter() – Usage & Examples

  • Post author:
  • Post category:Pandas
  • Post last modified:January 9, 2024
  • Reading time:16 mins read

Pandas filter() function in Python is used the filters the DataFame rows and columns. The returned DataFrame contains only rows and columns that are specified with the function. It doesn’t update the existing DataFrame instead it always returns a new one.

In this article, I will explain how to filter multiple rows and columns from Pandas DataFrame and also explain how to filter using regex (regular expression). DataFrame.loc[] is another way to select a group of rows and columns by indices and label(s) respectively.

Related:

1. Quick Examples of pandas DataFrame filter() Function

If you are in a hurry, below are some quick examples of how to pandas DataFrame filter() function.


# Quick examples of pandas DataFrame filter() 

# Example 1: Filter columns
df2=df.filter(items=['Courses','Fee'])

# Example 2: Filter Columns using like param
df2 = df.filter(like='ration', axis=1)

# Example 3: Filter column names by regex
df2 = df.filter(regex='e$', axis=1)

# Example 4: Filter rows by index
df2=df.filter(items=[3,5], axis=0)

# Example 5: Filter row using like
df2 = df.filter(like='4', axis=0)

2. Syntax of filter()

Following is the syntax of pandas.DataFrame.filter()


# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)

2.1 Parameters of filter()

Following are the parameters of the filter() function.

  • item – A list of labels to include. This parameter is mutually exclusive with the ‘like’ and ‘regex’ parameters.
  • like – Takes axis string label that you wanted to filter.
  • regex – A regular expression specifying a match. If specified, only columns with names matching the regular expression will be included.
  • axis{0 or ‘index’, 1 or ‘columns’, None}, default None. When not specified it used columns.

Let’s create a pandas DataFrame from Dict and understand usage with examples.


# Create DataFrame
import pandas as pd
technologies= {
    'Courses':["Spark","PySpark","Spark","Java","PySpark","PHP"],
    'Fee' :[22000,25000,23000,24000,26000,27000],
    'Duration':['30days','50days','30days','60days','35days','30days']
          }
df = pd.DataFrame(technologies)
print("Create DataFrame\n",df)

Yields below output.

pandas filter

3. Pandas filter() Columns by Labels

By default pandas.DataFrame.filter() select the columns by labels you specified using item, like, and regex parameters. You can also explicitly specify axis=1 to select columns.

In the below example, the filter(items=['Courses','Fee']) part of the code filters the DataFrame df to include only the ‘Courses’ and ‘Fee’ columns. The resulting DataFrame, df2, contains only these two columns.


# Filter columns
df2=df.filter(items=['Courses','Fee'])
print("Filter columns by labels:\n",df2)

Yields below output.

pandas filter

Note that items param is used to match on exact values. Using the filter() method with the like parameter to filter columns based on partial matching. Specifically, you are keeping columns that contain the substring ‘ration’.

In the below example, only the column with the label ‘Duration’ contains the substring ‘ration’, so it is the only column retained in the filtered DataFrame df2. The axis=1 parameter indicates that the filtering is applied to columns.


# Filter Columns using like param
df2 = df.filter(like='ration', axis=1)
print(df2)

# Output:
#  Duration
# 0   30days
# 1   50days
# 2   30days
# 3   60days
# 4   35days
# 5   30days

To use the filter() method with the regex parameter to filter columns based on a regular expression. Specifically, you are keeping columns whose names end with the letter ‘e’.

In the below example, only the column with the label ‘Fee’ matches the regular expression ‘e$’ (indicating that the column name ends with ‘e’), so it is the only column retained in the filtered DataFrame df2. The axis=1 parameter indicates that the filtering is applied to columns.


# Filter column names by regex
df2 = df.filter(regex='e$', axis=1)
print("Filter column names by regex:\n",df2)

# Output:
# Filter column names by regex:
#     Fee
# 0  22000
# 1  25000
# 2  23000
# 3  24000
# 4  26000
# 5  27000

4. Pandas filter() Rows by Index

Use axis=0 on filter() function to filter rows by index (indices). The below example filters rows by index 3 and 5.

However, it’s important to note that the filter() method is primarily designed for filtering columns based on labels, and it doesn’t directly filter rows by index labels.


# Filter rows by index
df2=df.filter(items=[3,5], axis=0)
print("Filter rows by index:\n",df2)

# Output:
# Filter rows by index:
#  Courses    Fee Duration
# 3    Java  24000   60days
# 5     PHP  27000   30days

Use like param to filter rows that match with substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# Output:
#   Courses    Fee Duration
# 4  PySpark  26000   35days

Frequently Asked Questions on Pandas DataFrame filter() Function

What is the purpose of the filter() function in Pandas?

The filter() function in Pandas is used to subset rows or columns of a DataFrame based on labels. It allows you to filter and select specific columns or rows using criteria such as column names, partial matching, or regular expressions.

How does the filter() function work?

The filter() function can be applied to a Pandas DataFrame to selectively include or exclude columns or rows based on specified labels, partial matches, or regular expressions. It provides flexibility in choosing subsets of data.

How do I filter columns based on partial matching using filter()?

Use the like parameter of the filter() method in Pandas to filter columns based on partial matching. This allows you to select columns that contain a specific substring in their names.

Can I filter columns based on regular expressions with filter()?

You can use the regex parameter of the filter() method in Pandas to filter columns based on regular expressions. This allows you to select columns whose names match a specific regular expression pattern.

How can I use filter() to select specific columns in a DataFrame?

You can use the filter() method in Pandas to select specific columns in a DataFrame. The items parameter of the filter() method allows you to specify a list of column names that you want to include in the resulting DataFrame.

Are there any alternatives to the filter() function in Pandas?

Alternatives include using boolean indexing, the loc method, or directly indexing the DataFrame with column or row labels. The choice of method depends on the specific use case and preference.

Conclusion

In this article, you have learned how to filter rows by indices and columns by labels using DataFrame.filter(). Also learned how to filter on exact match, similar match, and regular expression.

References

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply

This Post Has One Comment

  1. Ivo

    Thanks for your help.