Pandas filter() function in Python is used the filters the DataFame rows and columns. The returned DataFrame contains only rows and columns that are specified with the function. It doesn’t update the existing DataFrame instead it always returns a new one.
In this article, I will explain how to filter multiple rows and columns from pandas DataFrame and also explain how to filter using regex (regular expression). DataFrame.loc[] is another way to select a group of rows and columns by indices and label(s) respectively.
Related:
- pandas.DataFrame.query() – To query rows based on column value (condition).
- pandas.DataFrame.loc[] – To select rows by indices label and column by name.
- pandas.DataFrame.iloc[] – To select rows by index and column by position.
- pandas.DataFrame.apply() – To custom select using lambda function.
1. filter() Syntax
Following is the syntax of pandas.DataFrame.filter()
# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)
item
– Takes list of axis labels that you wanted to filter.like
– Takes axis string label that you wanted to filterregex
– regular expressionaxis
– {0 or ‘index’, 1 or ‘columns’, None}, default None. When not specified it usedcolumns
.
Let’s create a pandas DataFrame from Dict and understand usage with examples.
import pandas as pd
technologies= {
'Courses':["Spark","PySpark","Spark","Java","PySpark","PHP"],
'Fee' :[22000,25000,23000,24000,26000,27000],
'Duration':['30days','50days','30days','60days','35days','30days']
}
df = pd.DataFrame(technologies)
print(df)
#Outputs
# Courses Fee Duration
#0 Spark 22000 30days
#1 PySpark 25000 50days
#2 Spark 23000 30days
#3 Java 24000 60days
#4 PySpark 26000 35days
#5 PHP 27000 30days
2. Pandas filter() Columns by Labels
By default pandas.DataFrame.filter() select the columns by labels you specified using item
, like
, and regex
parameters. You can also explicitly specify axis=1
to select columns.
# Filter columns
df2=df.filter(items=['Courses','Fee'])
print(df2)
# Outputs
# Courses Fee
#0 Spark 22000
#1 PySpark 25000
#2 Spark 23000
#3 Java 24000
#4 PySpark 26000
#5 PHP 27000
Note that items
param is used to match on exact values. Use like param to match substring.
# Filter Columns using like
df2 = df.filter(like='ration', axis=1)
print(df2)
# Outputs
# Duration
#0 30days
#1 50days
#2 30days
#3 60days
#4 35days
#5 30days
To filter columns with regular expressions, use regex
param. The below example filters column that ends with the character e
.
# Filr column names by regex
df2 = df.filter(regex='e$', axis=1)
print(df2)
# Outputs
# Fee
#0 22000
#1 25000
#2 23000
#3 24000
#4 26000
#5 27000
3. Pandas filter() Rows by Index
Use axis=0
on filter() function to filter rows by index (indices). The below example filters rows by index 3
and 5
.
# Filter rows
df2=df.filter(items=[3,5], axis=0)
print(df2)
# Outputs
# Courses Fee Duration
#3 Java 24000 60days
#5 PHP 27000 30days
Use like
param to filter rows that match with substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like
param.
# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)
# Outputs
# Courses Fee Duration
#4 PySpark 26000 35days
Conclusion
In this article, you have learned how to filter rows by indices and columns by labels using DataFrame.filter(). Also learned how to filter on exact match, similar match, and regular expression.
Related Articles
- Pandas Series filter() Function
- Pandas Series apply() Function Usage
- Check Values of Pandas Series is Unique
- How to Rename a Pandas Series
- Pandas Get First Column of DataFrame as Series?
- How to Convert List to Pandas Series
- Create a Set From a Series in Pandas
Thanks for your help.