To filter Pandas Dataframe rows by Index use filter()
function. Use axis=0
as a param to the function to filter rows by index (indices). This function filter() is used to Subset rows of the Dataframe according to labels in the specified index. It doesn’t update the existing DataFrame instead it always returns a new one.
In this article, I will explain pandas filter by index and how we can get a DataFrame containing only rows and columns that are specified with the function. and also explain how to filter using items
, like
expression. DataFrame.index.isin()
is another way to filter the rows of a pandas DataFrame based on index values.
1. Quick Examples of Filter by Index
If you are in a hurry, below are some quick examples of how to get filtered by index in Pandas DataFrame.
# Quick examples of filter by index
# Example 1: Pandas filter() by index
df2= df.filter(items = [2], axis=0)
# Example 2: Use filter() by index along axis=0
df2= df.filter(items=[3, 5], axis=0)
# Example 3: Filter row using like
df2 = df.filter(like='4', axis=0)
# Example 4: filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
# Example 5: pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
# Example 6: pandas filter() by Non-numeric two indexes
df2= df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
# Example 7: Filter by non-numeric index
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
# Example 8: Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
2. pandas filter() Syntax
Following is the syntax of pandas.DataFrame.filter(). It returns the same type of object as the input object.
# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)
Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are Courses
, Fee
, Duration
and Discount
.
# Create pandas DataFrame
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark","Pandas"],
'Fee': [22000,25000,30000,35000,22000,25000,35000],
'Duration':['30days','50days','40days','35days','30days','50days','60days'],
'Discount':[1000,2000,2500,1500,1000,2000,1500]
})
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)
Yields below output.
3. Pandas filter() by Index
Use axis=0
on Pandas DataFrame filter()
function to filter rows by index (indices). The below example filters rows by index 2
.
This program filters rows from the DataFrame df
based on the specified index value 2
. The resulting DataFrame df2
will contain only the row with index value 2
.
# Pandas filter() by index
df2= df.filter(items = [2], axis=0)
print(df2)
# Output:
# Courses Fee Duration Discount
# 2 Hadoop 30000 40days 2500
4. Pandas Filter by Multiple Indexes
Use items param with list of indexes to Pandas filter()
function to filter by multiple indexes. The below example filters rows by index 3
and 5
.
# Use filter() by index along axis=0
df2= df.filter(items=[3, 5], axis=0)
print(df2)
# Output:
# Courses Fee Duration Discount
# 3 Pandas 35000 35days 1500
# 5 PySpark 25000 50days 2000
5. Filter Row Using Like Param
Use like
param to filter rows that match with the substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like
param.
# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)
# Output:
# Courses Fee Duration Discount
# 4 Spark 22000 30days 1000
6. Filters by List of Multiple Index Values
If you have values in a list and wanted to filter the DataFrame with these values, use isin() function. For each index you will apply isin()
function to check whether this value is present in the list which you will pass inside isin()
function as an argument. Suppose we would like to filter for rows where the index value is equal to 1, 3, or 6 it will return the same index values.
# Filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
print(df2)
# Output:
# Courses Fee Duration Discount
# 1 PySpark 25000 50days 2000
# 3 Pandas 35000 35days 1500
# 6 Pandas 35000 60days 1500
7. Pandas filter() by Non-numeric Index
You can also use filter()
function to filter Dataframe for indexes that contain a specific character. Use filter()
function along with axis=0
the specific character. it will return the same type of object as the input object.
# Pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
print(df2)
# Output:
# Courses Fee Duration Discount
# Inx_A Spark 22000 30days 1000
# Inx_AA Pandas 35000 35days 1500
# Inx_AC PySpark 25000 50days 2000
Alternatively, you can also axis=0
on DataFrame.filter() function to filter rows by non-numeric value indexes that contain a specific character. The below example filters rows by index 'Inx_B
‘, and 'Inx_BB'
.
# Pandas filter() by Non-numeric two indexes
df2= df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
print(df2)
# Output:
# Courses Fee Duration Discount
# Inx_B PySpark 25000 50days 2000
# Inx_BB Spark 22000 30days 1000
8. Filter by isin() with Non-numeric Index
Similarly, If you have values in a list and wanted to filter the DataFrame with these values, use isin()
function. Suppose you would like to filter for rows where the non-numeric index value is equal to 'Inx_A'
, 'Inx_B'
, 'Inx_C'
, or 'Inx_AC'
it will return the same index values.
# Filter by non-numeric index
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
print(df2)
# Output:
# Courses Fee Duration Discount
# Inx_A Spark 22000 30days 1000
# Inx_B PySpark 25000 50days 2000
# Inx_C Hadoop 30000 40days 2500
# Inx_AC PySpark 25000 50days 2000
9. Filter Non-numeric Index Using Like
Use like
param to filter rows that match with the substring. For our example, this doesn’t make sense as we have a Non_numeric
index. however, below is an example that demonstrates the usage of like
param.
# Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
print(df2)
# Output:
# Courses Fee Duration Discount
# Inx_BB Spark 22000 30days 1000
10. Complete Example For Filter by Index
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark","Pandas"],
'Fee': [22000,25000,30000,35000,22000,25000,35000],
'Duration':['30days','50days','40days','35days','30days','50days','60days'],
'Discount':[1000,2000,2500,1500,1000,2000,1500]
})
df = pd.DataFrame(technologies)
print(df)
# Pandas filter() by index
df2= df.filter(items = [2], axis=0)
print(df2)
# Use filter() by index along axis=0
df2= df.filter(items=[3, 5], axis=0)
print(df2)
# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)
# Filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
print(df2)
# Pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
print(df2)
# Pandas filter() by Non-numeric two indexes
df2= df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
print(df2)
# Filter by non-numeric index
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
print(df2)
# Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
print(df2)
Frequently Asked Questions on Pandas Filter by Index
Filtering by index in Pandas involves selecting specific rows or columns from a DataFrame based on the index values. The index is a label or sequence of labels that uniquely identifies each row or column in a DataFrame.
You can filter rows based on a range of index values in Pandas using the .loc[]
accessor. For example, start_index
and end_index
represent the range of index values you want to include in the filtered DataFrame. The loc
accessor is used with the slicing notation (:
) to select rows within the specified range.
To filter columns based on index values in Pandas, you can use the .loc[]
accessor. For example, index_values_to_keep
is a list containing the index values of the columns you want to include in the filtered DataFrame. The loc
accessor is used with a colon :
for the rows (indicating all rows) and the list of index values for columns.
If you have a DataFrame with a multi-level index and you want to filter based on that multi-level index, you can use the .loc[]
accessor with a tuple representing the levels.
You can filter rows based on a condition using index values in Pandas. You can use boolean indexing for this purpose. For example, the condition
is a boolean expression that is applied to the index values. It creates a boolean mask where True
indicates the rows that satisfy the condition, and False
indicates the rows that do not. The resulting DataFrame, df_filtered
, contains only the rows that meet the specified condition.
You can reset the index after filtering in Pandas using the reset_index
method. For example, the reset_index
method is called on the filtered DataFrame (df_filtered
). The drop=True
argument is used to discard the old index and replace it with the default integer index. If you omit drop=True
, the old index will be added as a new column in the DataFrame.
Conclusion
In this article, I have explained how to filter the Pandas DataFrame by index and also learned how to filter by multiple indexes by using filter()
with different options.
Happy Learning !!
Related Articles
- Pandas filter by column value
- Pandas Set Column as Index in DataFrame
- Pandas Set Index to Column in DataFrame
- Get Count of Each Row of Pandas DataFrame
- Pandas.Index.drop_duplicates() Explained
- How to Rename Column by Index in Pandas
- Pandas set index name to DataFrame
- Pandas Set Index to Column in DataFrame
- How to Use NOT IN Filter in Pandas
- Pandas Filter DataFrame Rows on Dates
- Pandas Filter Rows Using IN Like SQL
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Pandas Filter DataFrame by Substring criteria
- Apply Multiple Filters to Pandas DataFrame or Series
- Pandas Operator Chaining to Filter DataFrame Rows
- Pandas Filter Rows with NAN Value from DataFrame Column
- Pandas Set Value to Particular Cell in DataFrame Using Index
Hi Kaulgud, Yes, using the isin method on a pandas DataFrame with an index can often provide better performance compared to using it without an index.
Hello, thanks for detail explanation on how to filter dataframe rows
One question, we can get rows filtered by using filter method and using .isin
Can we say if we have index defined, and we use .isin method, we will get better performance?
To filter rows based on non index columns , we have to use filter method only
Is this understanding correct?