Pandas Filter by Index

To filter Pandas Dataframe rows by Index use filter() function. Use axis=0 as a param to the function to filter rows by index (indices). This function filter() is used to Subset rows of the Dataframe according to labels in the specified index. It doesn’t update the existing DataFrame instead it always returns a new one.

In this article, I will explain pandas filter by index and how we can get a DataFrame containing only rows and columns that are specified with the function. and also explain how to filter using items, like expression. DataFrame.index.isin() is another way to filter the rows of a pandas DataFrame based on index values.

1. Quick Examples of Filter by Index

If you are in a hurry, below are some quick examples of how to get filtered by index in pandas DataFrame.


# Below are quick example

# Example 1: Pandas filter() by index
df2= df.filter(items = [2], axis=0)

# Example 2: Use filter() by index along axis=0
df2=  df.filter(items=[3, 5], axis=0)

# Example 3: Filter row using like
df2 = df.filter(like='4', axis=0)

# Example 4: filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]

# Example 5: pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)

# Example 6: pandas filter() by Non-numeric two indexes 
df2=  df.filter(items=['Inx_B', 'Inx_BB'], axis=0)

# Example 7: Filter by non-numeric index 
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]

# Example 8: Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)

2. pandas filter() Syntax

Following is the syntax of pandas.DataFrame.filter(). It returns the same type of object as the input object.


# Syntax of DataFrame.filter()
DataFrame.filter(items=None, like=None, regex=None, axis=None)

Now, Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are CoursesFeeDuration and Discount.


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark","Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output
   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2000
2   Hadoop  30000   40days      2500
3   Pandas  35000   35days      1500
4    Spark  22000   30days      1000
5  PySpark  25000   50days      2000
6   Pandas  35000   60days      1500

3. Pandas filter() by Index

Use axis=0 on Pandas DataFrame filter() function to filter rows by index (indices). The below example filters rows by index 2.


# Pandas filter() by index
df2= df.filter(items = [2], axis=0)
print(df2)

# Output
#  Courses    Fee Duration  Discount
# 2  Hadoop  30000   40days      2500

4. Pandas Filter by Multiple Indexes

Use items param with list of indexes to Pandas filter() function to filter by multiple indexes. The below example filters rows by index 3 and 5.


# Use filter() by index along axis=0
df2=  df.filter(items=[3, 5], axis=0)
print(df2)

# Output
#   Courses    Fee Duration  Discount
# 3   Pandas  35000   35days      1500
# 5  PySpark  25000   50days      2000

5. Filter Row Using Like Param

Use like param to filter rows that match with the substring. For our example, this doesn’t make sense as we have a numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# Output
#   Courses    Fee Duration  Discount
# 4   Spark  22000   30days      1000

6. Filters by List of Multiple Index Values

If you have values in a list and wanted to filter the DataFrame with these values, use isin() function. For each index you will apply isin() function to check whether this value is present in the list which you will pass inside isin() function as an argument. Suppose we would like to filter for rows where the index value is equal to 1, 3, or 6 it will return the same index values.


# filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
print(df2)

# Output
#   Courses    Fee Duration  Discount
#1  PySpark  25000   50days      2000
#3   Pandas  35000   35days      1500
#6   Pandas  35000   60days      1500

7. Pandas filter() by Non-numeric Index

You can also use filter() function to filter Dataframe for indexes that contain a specific character. Use filter() function along with axis=0 the specific character. it will return the same type of object as the input object.


# pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
print(df2)

# Output
#        Courses    Fee Duration  Discount
#Inx_A     Spark  22000   30days      1000
#Inx_AA   Pandas  35000   35days      1500
Inx_AC  PySpark  25000   50days      2000

Alternatively, you can also axis=0 on DataFrame.filter() function to filter rows by non-numeric value indexes that contain a specific character. The below example filters rows by index 'Inx_B‘, and 'Inx_BB'.


# pandas filter() by Non-numeric two indexes 
df2=  df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
print(df2)

# Output
#        Courses    Fee Duration  Discount
#Inx_B   PySpark  25000   50days      2000
#Inx_BB    Spark  22000   30days      1000

8. Filter by isin() with Non-numeric Index

Similarly, If you have values in a list and wanted to filter the DataFrame with these values, use isin() function. Suppose you would like to filter for rows where the non-numeric index value is equal to 'Inx_A', 'Inx_B', 'Inx_C', or 'Inx_AC' it will return the same index values.


# Filter by non-numeric index 
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
print(df2) 

# Output
#        Courses    Fee Duration  Discount
#Inx_A     Spark  22000   30days      1000
#Inx_B   PySpark  25000   50days      2000
#Inx_C    Hadoop  30000   40days      2500
#Inx_AC  PySpark  25000   50days      2000

9. Filter Non-numeric Index Using Like

Use like param to filter rows that match with the substring. For our example, this doesn’t make sense as we have a Non_numeric index. however, below is an example that demonstrates the usage of like param.


# Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
print(df2)

# Output
#         Courses    Fee Duration  Discount
#Inx_BB    Spark  22000   30days      1000

10. Complete Example For Filter by Index


import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas","Spark","PySpark","Pandas"],
    'Fee': [22000,25000,30000,35000,22000,25000,35000],
    'Duration':['30days','50days','40days','35days','30days','50days','60days'],
    'Discount':[1000,2000,2500,1500,1000,2000,1500]
              })
df = pd.DataFrame(technologies)
print(df)

# Pandas filter() by index
df2= df.filter(items = [2], axis=0)
print(df2)

# Use filter() by index along axis=0
df2=  df.filter(items=[3, 5], axis=0)
print(df2)

# Filter row using like
df2 = df.filter(like='4', axis=0)
print(df2)

# filter for rows in list
# Use DataFrme.index.isin() function
list = [1, 3, 6]
df2 = df[df.index.isin(list)]
print(df2)

# pandas filter() by Non-numeric index
index_labels=['Inx_A','Inx_B','Inx_C','Inx_AA','Inx_BB','Inx_AC','Inx_CB']
df = pd.DataFrame(technologies,index=index_labels)
df2 = df.filter(like = 'Inx_A', axis=0)
print(df2)

# pandas filter() by Non-numeric two indexes 
df2=  df.filter(items=['Inx_B', 'Inx_BB'], axis=0)
print(df2)

# Filter by non-numeric index 
# Use DataFrme.index.isin() function
list = ['Inx_A', 'Inx_B', 'Inx_C', 'Inx_AC']
df2 = df[df.index.isin(list)]
print(df2) 

# Filter row using like
df2 = df.filter(like='Inx_BB', axis=0)
print(df2)

10. Conclusion

In this article, I have explained how to filter the Pandas DataFrame by index and also learned how to filter by multiple indexes by using filter() with different options.

Happy Learning !!

Related Articles

References

Leave a Reply