Pandas filter() function in Python is used the filters the DataFame rows and columns. The returned DataFrame contains only rows and columns that are specified with the function. It doesn’t update the existing DataFrame instead it always returns a new one.
In this article, I will explain how to filter multiple rows and columns from Pandas DataFrame and also explain how to filter using regex (regular expression). DataFrame.loc[]
is another way to select a group of rows and columns by indices and label(s) respectively.
Key Points –
filter()
is used to subset a DataFrame based on labels, focusing on rows, columns, or both.- It provides three main parameters:
items
,like
, andregex
for selecting columns or rows. - The
items
parameter allows you to specify an exact list of column or row labels to include. - Use the
like
parameter to filter columns or rows containing a specific substring in their labels. - The
regex
parameter supports pattern matching, enabling selection based on complex label patterns. - By default,
filter()
works on columns, but you can setaxis=0
to apply filtering to row labels. - Combining
filter()
with other methods allows you to refine selections based on both labels and values.
Related:
- pandas.DataFrame.query() – To query rows based on column value (condition).
- pandas.DataFrame.loc[] – To select rows by indices label and column by name.
- pandas.DataFrame.iloc[] – To select rows by index and column by position.
- pandas.DataFrame.apply() – To custom select using lambda function.
1. Quick Examples of pandas DataFrame filter() Function
If you are in a hurry, below are some quick examples of how to pandas DataFrame filter() function.
# Quick examples of pandas DataFrame filter()
# Example 1: Filter columns
df2=df.filter(items=['Courses','Fee'])
# Example 2: Filter Columns using like param
df2 = df.filter(like='ration', axis=1)
# Example 3: Filter column names by regex
df2 = df.filter(regex='e$', axis=1)
# Example 4: Filter rows by index
df2=df.filter(items=[3,5], axis=0)
# Example 5: Filter row using like
df2 = df.filter(like='4', axis=0)
2. Syntax of filter()
Following is the syntax of pandas.DataFrame.filter()
# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)
2.1 Parameters of filter()
Following are the parameters of the filter() function.
item
– A list of labels to include. This parameter is mutually exclusive with the ‘like’ and ‘regex’ parameters.like
– Takes axis string label that you wanted to filter.regex
– A regular expression specifying a match. If specified, only columns with names matching the regular expression will be included.axis
– {0 or ‘index’, 1 or ‘columns’, None}, default None. When not specified it usedcolumns
.
Let’s create a pandas DataFrame from Dict and understand usage with examples.
# Create DataFrame
import pandas as pd
technologies= {
'Courses':["Spark","PySpark","Spark","Java","PySpark","PHP"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame\n",df)
Yields below output.
3. Pandas filter() Columns by Labels
By default pandas.DataFrame.filter()
select the columns by labels you specified using item
, like
, and regex
parameters. You can also explicitly specify axis=1
to select columns.
In the below example, the filter(items=['Courses','Fee'])
part of the code filters the DataFrame df
to include only the ‘Courses’ and ‘Fee’ columns. The resulting DataFrame, df2
, contains only these two columns.
# Filter columns
df2=df.filter(items=['Courses','Fee'])
print("Filter columns by labels:\n",df2)
Yields below output.
Note that items
param is used to match on exact values. Using the filter()
method with the like
parameter to filter columns based on partial matching. Specifically, you are keeping columns that contain the substring ‘ration’.
In the below example, only the column with the label ‘Duration’ contains the substring ‘ration’, so it is the only column retained in the filtered DataFrame df2
. The axis=1
parameter indicates that the filtering is applied to columns.
# Filter Columns using like param
df2 = df.filter(like='ration', axis=1)
print(df2)
# Output:
# Duration
# 0 30days
# 1 50days
# 2 30days
# 3 60days
# 4 35days
# 5 30days
To use the filter()
method with the regex
parameter to filter columns based on a regular expression. Specifically, you are keeping columns whose names end with the letter ‘e’.
In the below example, only the column with the label ‘Fee’ matches the regular expression ‘e$’ (indicating that the column name ends with ‘e’), so it is the only column retained in the filtered DataFrame df2
. The axis=1
parameter indicates that the filtering is applied to columns.
# Filter column names by regex
df2 = df.filter(regex='e$', axis=1)
print("Filter column names by regex:\n",df2)
# Output:
# Filter column names by regex:
# Fee
# 0 22000
# 1 25000
# 2 23000
# 3 24000
# 4 26000
# 5 27000
4. Pandas filter() Rows by Index
Use axis=0
on filter()
function to filter rows by index (indices). The below example filters rows by index 3
and 5
.
However, it’s important to note that the filter()
method is primarily designed for filtering columns based on labels, and it doesn’t directly filter rows by index labels.
# Filter rows by index
df2=df.filter(items=[3,5], axis=0)
print("Filter rows by index:\n",df2)
# Output:
# Filter rows by index:
# Courses Fee Duration
# 3 Java 24000 60days
# 5 PHP 27000 30days
Use like
param to filter rows that match with substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like
param.
# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)
# Output:
# Courses Fee Duration
# 4 PySpark 26000 35days
Frequently Asked Questions on Pandas DataFrame filter() Function
The filter()
function in Pandas is used to subset rows or columns of a DataFrame based on labels. It allows you to filter and select specific columns or rows using criteria such as column names, partial matching, or regular expressions.
The filter()
function can be applied to a Pandas DataFrame to selectively include or exclude columns or rows based on specified labels, partial matches, or regular expressions. It provides flexibility in choosing subsets of data.
Use the like
parameter of the filter()
method in Pandas to filter columns based on partial matching. This allows you to select columns that contain a specific substring in their names.
You can use the regex
parameter of the filter()
method in Pandas to filter columns based on regular expressions. This allows you to select columns whose names match a specific regular expression pattern.
You can use the filter()
method in Pandas to select specific columns in a DataFrame. The items
parameter of the filter()
method allows you to specify a list of column names that you want to include in the resulting DataFrame.
Alternatives include using boolean indexing, the loc
method, or directly indexing the DataFrame with column or row labels. The choice of method depends on the specific use case and preference.
Conclusion
In this article, you have learned how to filter rows by indices and columns by labels using DataFrame.filter()
. Also learned how to filter on exact match, similar match, and regular expression.
Related Articles
- Pandas Series filter() Function
- How to Rename a Pandas Series
- How to Convert List to Pandas Series
- Create a Set From a Series in Pandas
- Pandas Series apply() Function Usage
- Check Values of Pandas Series is Unique
- Pandas DataFrame isna() Function
- Pandas DataFrame insert() Function
- Pandas Add Column with Default Value
- Compare Two DataFrames Row by Row
- Pandas Select Rows Based on List Index
- Pandas Get First Column of DataFrame as Series?
Thanks for your help.