Pandas Filter by Index

  • Post author:
  • Post category:Pandas
  • Post last modified:January 10, 2024
  • Reading time:21 mins read

To filter Pandas Dataframe rows by Index use filter() function. Use axis=0 as a param to the function to filter rows by index (indices). This function filter() is used to Subset rows of the Dataframe according to labels in the specified index. It doesn’t update the existing DataFrame instead it always returns a new one.

In this article, I will explain pandas filter by index and how we can get a DataFrame containing only rows and columns that are specified with the function. and also explain how to filter using items, like expression. DataFrame.index.isin() is another way to filter the rows of a pandas DataFrame based on index values.

1. Quick Examples of Filter by Index

If you are in a hurry, below are some quick examples of how to get filtered by index in Pandas DataFrame.


# Quick examples of filter by index

# Example 1: Pandas filter() by index
df2= df.filter(items = [2], axis=0)

# Example 2: Use filter() by index along axis=0
df2=  df.filter(items=[3, 5], axis=0)

# Example 3: Filter row using like
df2 = df.filter(like='4', axis=0)

# Example 4: filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]

# Example 5: pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)

# Example 6: pandas filter() by Non-numeric two indexes 
df2=  df.filter(items=['Inx_B', 'Inx_BB'], axis=0)

# Example 7: Filter by non-numeric index 
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]

# Example 8: Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)

2. pandas filter() Syntax

Following is the syntax of pandas.DataFrame.filter(). It returns the same type of object as the input object.


# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)

Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are CoursesFeeDuration and Discount.


# Create pandas DataFrame 
import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark","Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
df = pd.DataFrame(technologies)
print("Create DataFrame:\n",df)

Yields below output.

pandas filter index

3. Pandas filter() by Index

Use axis=0 on Pandas DataFrame filter() function to filter rows by index (indices). The below example filters rows by index 2.

This program filters rows from the DataFrame df based on the specified index value 2. The resulting DataFrame df2 will contain only the row with index value 2.


# Pandas filter() by index
df2= df.filter(items = [2], axis=0)
print(df2)

# Output:
#  Courses    Fee Duration  Discount
# 2  Hadoop  30000   40days      2500

4. Pandas Filter by Multiple Indexes

Use items param with list of indexes to Pandas filter() function to filter by multiple indexes. The below example filters rows by index 3 and 5.


# Use filter() by index along axis=0
df2=  df.filter(items=[3, 5], axis=0)
print(df2)

# Output:
#   Courses    Fee Duration  Discount
# 3   Pandas  35000   35days      1500
# 5  PySpark  25000   50days      2000

5. Filter Row Using Like Param

Use like param to filter rows that match with the substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# Output:
#   Courses    Fee Duration  Discount
# 4   Spark  22000   30days      1000

6. Filters by List of Multiple Index Values

If you have values in a list and wanted to filter the DataFrame with these values, use isin() function. For each index you will apply isin() function to check whether this value is present in the list which you will pass inside isin() function as an argument. Suppose we would like to filter for rows where the index value is equal to 1, 3, or 6 it will return the same index values.


# Filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
print(df2)

# Output:
#   Courses    Fee Duration  Discount
# 1  PySpark  25000   50days      2000
# 3   Pandas  35000   35days      1500
# 6   Pandas  35000   60days      1500

7. Pandas filter() by Non-numeric Index

You can also use filter() function to filter Dataframe for indexes that contain a specific character. Use filter() function along with axis=0 the specific character. it will return the same type of object as the input object.


# Pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
print(df2)

# Output:
#        Courses    Fee Duration  Discount
# Inx_A     Spark  22000   30days      1000
# Inx_AA   Pandas  35000   35days      1500
# Inx_AC  PySpark  25000   50days      2000

Alternatively, you can also axis=0 on DataFrame.filter() function to filter rows by non-numeric value indexes that contain a specific character. The below example filters rows by index 'Inx_B‘, and 'Inx_BB'.


# Pandas filter() by Non-numeric two indexes 
df2=  df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
print(df2)

# Output:
#        Courses    Fee Duration  Discount
# Inx_B   PySpark  25000   50days      2000
# Inx_BB    Spark  22000   30days      1000

8. Filter by isin() with Non-numeric Index

Similarly, If you have values in a list and wanted to filter the DataFrame with these values, use isin() function. Suppose you would like to filter for rows where the non-numeric index value is equal to 'Inx_A', 'Inx_B', 'Inx_C', or 'Inx_AC' it will return the same index values.


# Filter by non-numeric index 
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
print(df2) 

# Output:
#        Courses    Fee Duration  Discount
# Inx_A     Spark  22000   30days      1000
# Inx_B   PySpark  25000   50days      2000
# Inx_C    Hadoop  30000   40days      2500
# Inx_AC  PySpark  25000   50days      2000

9. Filter Non-numeric Index Using Like

Use like param to filter rows that match with the substring. For our example, this doesn’t make sense as we have a Non_numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
print(df2)

# Output:
#         Courses    Fee Duration  Discount
# Inx_BB    Spark  22000   30days      1000

10. Complete Example For Filter by Index


import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark","Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
df = pd.DataFrame(technologies)
print(df)

# Pandas filter() by index
df2= df.filter(items = [2], axis=0)
print(df2)

# Use filter() by index along axis=0
df2=  df.filter(items=[3, 5], axis=0)
print(df2)

# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# Filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
print(df2)

# Pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
print(df2)

# Pandas filter() by Non-numeric two indexes 
df2=  df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
print(df2)

# Filter by non-numeric index 
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
print(df2) 

# Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
print(df2)

Frequently Asked Questions on Pandas Filter by Index

What is meant by filtering by index in Pandas?

Filtering by index in Pandas involves selecting specific rows or columns from a DataFrame based on the index values. The index is a label or sequence of labels that uniquely identifies each row or column in a DataFrame.

Can I filter rows based on a range of index values?

You can filter rows based on a range of index values in Pandas using the .loc[] accessor. For example, start_index and end_index represent the range of index values you want to include in the filtered DataFrame. The loc accessor is used with the slicing notation (:) to select rows within the specified range.

How do I filter columns based on index values?

To filter columns based on index values in Pandas, you can use the .loc[] accessor. For example, index_values_to_keep is a list containing the index values of the columns you want to include in the filtered DataFrame. The loc accessor is used with a colon : for the rows (indicating all rows) and the list of index values for columns.

What if I want to filter based on a multi-level index?

If you have a DataFrame with a multi-level index and you want to filter based on that multi-level index, you can use the .loc[] accessor with a tuple representing the levels.

Is there a way to filter rows based on a condition using index values?

You can filter rows based on a condition using index values in Pandas. You can use boolean indexing for this purpose. For example, the condition is a boolean expression that is applied to the index values. It creates a boolean mask where True indicates the rows that satisfy the condition, and False indicates the rows that do not. The resulting DataFrame, df_filtered, contains only the rows that meet the specified condition.

Can I reset the index after filtering?

You can reset the index after filtering in Pandas using the reset_index method. For example, the reset_index method is called on the filtered DataFrame (df_filtered). The drop=True argument is used to discard the old index and replace it with the default integer index. If you omit drop=True, the old index will be added as a new column in the DataFrame.

Conclusion

In this article, I have explained how to filter the Pandas DataFrame by index and also learned how to filter by multiple indexes by using filter() with different options.

Happy Learning !!

Related Articles

References

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply

This Post Has 2 Comments

  1. Atul Kaulgud

    Hello, thanks for detail explanation on how to filter dataframe rows

    One question, we can get rows filtered by using filter method and using .isin

    Can we say if we have index defined, and we use .isin method, we will get better performance?

    To filter rows based on non index columns , we have to use filter method only

    Is this understanding correct?

    1. Naveen (NNK)

      Hi Kaulgud, Yes, using the isin method on a pandas DataFrame with an index can often provide better performance compared to using it without an index.