Pandas Series filter() Method

The Pandas Series.filter() method returns the subset of a Series whose index labels match the labels, substring, or regular expression you specify. Note that filter() works on the index labels only, not on the values. To filter a Series by its values, use boolean indexing, loc[], where(), or isin(), all of which are covered in this article.

Quick Examples of Series filter() Method

Below are quick examples of the Pandas Series filter() method.


# Quick examples of series filter() function

# Example 1: use Series.filter() function 
# To filter a pandas series
ser2 = ser.filter(regex = '. .')

# Example 2: filter() index by labels
ser2 = ser.filter(items = ['Spark', 'Python'])

# Example 3 : use loc[] and lambda 
# To filter a pandas series
ser2 = ser.loc[lambda x : x == 23000]

# Example 4: use loc[] property & OR condition
ser2 = ser.loc[lambda x : (x < 23000) | (x > 28000)]

# Example 5: use where() function to filter series
ser2 = ser.where(ser < 25000).dropna()

# Example 6: use isin() function to filter series
ser2 = ser[ser.isin([23000,28000])]

Syntax of Series.filter() Function

Following is the syntax of the Series.filter() function.


# Syntax of Series.filter() function
Series.filter(items=None, like=None, regex=None, axis=None)

Parameters of filter()

Following are the parameters of filter(). Exactly one of items, like, or regex must be passed.

items – A list of labels to filter on the specified axis.
like – A string that is used to filter labels based on a substring match.
regex – A regular expression (regex) to filter labels based on pattern matching.
axis – {0 or ‘index’, 1 or ‘columns’, None}, default None. The axis along which the filtering is applied. When not specified, it defaults to the info axis, which is the index for a Series and the columns for a DataFrame. Since a Series has only one axis, filtering on a Series is always done on the index.
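
The three label selectors can be compared side by side. The following is a minimal sketch, assuming a small Series named demo (a hypothetical name and hypothetical data, used only for this illustration), that shows which labels each parameter keeps.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
demo = pd.Series([1, 2, 3], index=['Spark', 'PySpark', 'Pandas'])

# items: keep only the labels listed exactly
by_items = demo.filter(items=['Spark', 'Pandas'])

# like: keep labels that contain the given substring
by_like = demo.filter(like='Spark')

# regex: keep labels in which the pattern is found
by_regex = demo.filter(regex='^P')

print(by_items.index.tolist())  # ['Spark', 'Pandas']
print(by_like.index.tolist())   # ['Spark', 'PySpark']
print(by_regex.index.tolist())  # ['PySpark', 'Pandas']
```

In every case the selection is made on the labels; the values 1, 2, and 3 play no part in it.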

Return Value of filter()

It returns an object of the same type as the input, so calling filter() on a Series returns a new Series.

Create Pandas Series

Pandas Series is a fundamental data structure in the Pandas library, representing a one-dimensional array with labeled indices. It’s versatile, capable of storing various data types including strings, integers, floats, and even other Python objects. Accessing elements within a Series is intuitive, as you can use the corresponding default indices or the labels themselves for retrieval.

Note: A Series is similar to a one-dimensional NumPy array, with one key difference: array indices are always integers starting at 0, whereas a Series index can hold any labels, including strings. The labels do not need to be unique, but they must be of a hashable type.
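
To illustrate the point about non-unique labels, here is a small sketch. The Series dup and its data are hypothetical and serve only this example.

```python
import pandas as pd

# Hypothetical data in which the label 'a' appears twice
dup = pd.Series([10, 20, 30], index=['a', 'b', 'a'])

# Selecting a repeated label returns every value stored under it
print(dup['a'].tolist())    # [10, 30]

# The index reports that its labels are not unique
print(dup.index.is_unique)  # False
```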

To run some examples of the Pandas Series filter() method, let’s create a Pandas Series from a list and assign string labels to its index.


import pandas as pd
  
# Create the Series
ser = pd.Series([20000,25000,23000,28000,55000,23000])
  
# Create the Index
index = ['Java','Spark','PySpark','Pandas','python NumPy','Python']
  
# Set the index
ser.index = index
print(ser)

# Output:
# Java            20000
# Spark           25000
# PySpark         23000
# Pandas          28000
# python NumPy    55000
# Python          23000
# dtype: int64

Use Series.filter() Function To Filter a Pandas Series

The Series.filter() function filters a Series by its index labels; it does not look at the values. To match labels against a pattern, pass a regular expression through the regex parameter. The following example returns the entries of the given Series whose index label contains a space (the pattern '. .' matches any character, a space, and any character).


# Use Series.filter() function to filter a pandas series
ser2 = ser.filter(regex = '. .')
print(ser2)

# Output:
# python NumPy    55000
# dtype: int64
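
Other regular expressions work the same way. The sketch below rebuilds the article's Series so that it runs on its own, then anchors a pattern to the start of the label and uses the inline (?i) flag for a case-insensitive match. The variable names starts_py and any_case are illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# Labels that start with 'Py' (the match is case-sensitive)
starts_py = ser.filter(regex='^Py')
print(starts_py.index.tolist())  # ['PySpark', 'Python']

# Labels that start with 'python' in any letter case
any_case = ser.filter(regex='(?i)^python')
print(any_case.index.tolist())   # ['python NumPy', 'Python']
```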

Filter Series by Index Labels

pandas.Series.filter() selects entries by the index labels you specify through the items, like, or regex parameter. The following example filters the Series with the list of index labels Spark and Python.


# Filter() index by labels
ser2 = ser.filter(items = ['Spark', 'Python'])
print(ser2)

# Output:
# Spark     25000
# Python    23000
# dtype: int64
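
The like parameter is not demonstrated above, so here is a short sketch of it. It rebuilds the article's Series so that it runs on its own; the name ser_like is illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# like: keep every label that contains the substring 'Spark'
ser_like = ser.filter(like='Spark')
print(ser_like)

# Output:
# Spark      25000
# PySpark    23000
# dtype: int64
```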

Use loc[] & Lambda to Filter a Pandas Series

Alternatively, you can filter a Pandas Series by its values using loc[] along with a lambda function. The following example retrieves the values equal to 23000.


# Use loc[] and lambda to filter a pandas series
ser2 = ser.loc[lambda x : x == 23000]
print(ser2)

# Output:
# PySpark    23000
# Python     23000
# dtype: int64

When applying logical OR conditions, it’s important to use the bitwise OR operator | instead of the or keyword, and to wrap each comparison in parentheses. In this program, (x < 23000) | (x > 28000) creates a boolean mask that is True for values less than 23000 or greater than 28000. Then, ser.loc[] filters the Series based on this boolean mask, returning only the elements that satisfy the condition.


# Use loc[] property & OR condition
ser2 = ser.loc[lambda x : (x < 23000) | (x > 28000)]
print(ser2)

# Output:
# Java            20000
# python NumPy    55000
# dtype: int64
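
The reason the or keyword does not work can be seen directly. The sketch below rebuilds the article's Series, shows the error that or raises, and then applies an AND condition with the & operator. The names error_message and in_range are illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# The or keyword asks Python for a single truth value of a whole Series,
# which pandas refuses to guess, so a ValueError is raised
error_message = None
try:
    ser.loc[lambda x: x < 23000 or x > 28000]
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# Element-wise AND uses & with each comparison in parentheses
in_range = ser.loc[lambda x: (x >= 23000) & (x <= 28000)]
print(in_range)

# Output of the last print:
# Spark      25000
# PySpark    23000
# Pandas     28000
# Python     23000
# dtype: int64
```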

Use where() Function To Filter Series

Similarly, we can use the where() function to filter a Series by its values. The following example keeps the values of ser that are less than 25000, replaces the rest with NaN, and then drops the NaN values using dropna().


# Use where() function to filter series
ser2 = ser.where(ser < 25000).dropna()
print(ser2)

# Output:
# Java       20000.0
# PySpark    23000.0
# Python     23000.0
# dtype: float64

In the above program, where() keeps the values less than 25000 and replaces the values greater than or equal to 25000 with NaN, which is why the result is converted to float64. The dropna() call then removes those NaN entries. If you omit dropna(), the result has the same length as the original Series, with NaN in place of the values that did not satisfy the condition.
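
The effect of dropna() and the dtype change are easy to check. The sketch below rebuilds the article's Series and compares where() with plain boolean indexing, which selects the same rows without introducing NaN. The names masked and direct are illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# where() keeps the original length and fills non-matching rows with NaN
masked = ser.where(ser < 25000)
print(len(masked))            # 6
print(masked.isna().sum())    # 3
print(masked.dropna().dtype)  # float64

# Boolean indexing returns only the matching rows and keeps the int dtype
direct = ser[ser < 25000]
print(direct.index.tolist())  # ['Java', 'PySpark', 'Python']
print(direct.dtype)           # int64
```

Boolean indexing is usually the simpler choice when you only want the matching rows; where() is useful when the result must stay aligned with the original index.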

Use isin() Function To Filter Series

The isin() function is used to get the values from the Series that are present in a given list of values.


# Use isin() function to filter series
ser2 = ser[ser.isin([23000,28000])]
print(ser2)

# Output:
# PySpark    23000
# Pandas     28000
# Python     23000
# dtype: int64

In the above example, ser.isin([23000, 28000]) creates a boolean mask indicating whether each element in the original Series (ser) is in the specified list [23000, 28000]. The boolean mask is then used to filter the original Series using boolean indexing, resulting in ser2 containing only the elements that match the specified values.
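
The same mask can be inverted with the ~ operator to keep the values that are not in the list. This is a small sketch on the article's data; the name not_listed is illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# ~ flips the boolean mask, keeping values NOT in the list
not_listed = ser[~ser.isin([23000, 28000])]
print(not_listed)

# Output:
# Java            20000
# Spark           25000
# python NumPy    55000
# dtype: int64
```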

Complete Example


import pandas as pd
  
# Create the Series
ser = pd.Series([20000,25000,23000,28000,55000,23000])
  
# Create the Index
index_ = ['Java','Spark','PySpark','Pandas','python NumPy','Python']
  
# Set the index
ser.index = index_
print(ser)

# Use Series.filter() function to filter a pandas series
ser2 = ser.filter(regex = '. .')
print(ser2)

# Filter() index by labels
ser2 = ser.filter(items = ['Spark', 'Python'])
print(ser2)

# Use loc[] and lambda to filter a pandas series
ser2 = ser.loc[lambda x : x == 23000]
print(ser2)

# Use loc[] property & OR condition
ser2 = ser.loc[lambda x : (x < 23000) | (x > 28000)]
print(ser2)

# Use where() function to filter series
ser2 = ser.where(ser < 25000).dropna()
print(ser2)

# Use isin() function to filter series
ser2 = ser[ser.isin([23000,28000])]
print(ser2)

FAQ on Series filter() Method

How does the filter() function work in Pandas Series?

The filter() function compares each index label against the criterion you pass (items, like, or regex) and returns a new Series containing only the entries whose labels match. It does not evaluate the values of the Series.

Can the filter() function be applied to both the index and values of a Series?

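No. The filter() function operates only on the index labels of a Pandas Series, never on its values. A Series has a single axis (the index), so axis=1 is not valid for a Series; it applies only to DataFrames, where it selects columns. To filter a Series by its values, use boolean indexing, loc[], where(), or isin().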

What is the typical use case for the filter() function in Pandas Series?

The filter() function is commonly used in data analysis when you need to extract a subset of a Series by label, for example all entries whose labels share a substring or match a naming pattern, so that you can focus on the relevant entries.

Are the filtering conditions in filter() mutually exclusive?

The filtering conditions in filter() are mutually exclusive. Users can choose to use either items, like, or regex to specify the filtering criteria, and only one of them should be used at a time.
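
A short sketch shows what happens when two of these parameters are passed together: pandas raises a TypeError. The data is the article's Series; the name message is illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# Passing both items and like in one call is rejected
message = None
try:
    ser.filter(items=['Spark'], like='Py')
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```

To combine criteria, chain two calls instead, for example ser.filter(like='Py').filter(regex='n$').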

How does the filter() function compare to other filtering methods in Pandas, such as boolean indexing?

The filter() function and boolean indexing solve different problems. filter() selects by index label, with built-in options for exact labels, substring matching, and regular expressions. Boolean indexing selects by any condition you can express as a boolean mask, including conditions on the values. The choice between them depends on whether you are selecting by label or by value.
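
As a rough comparison, the sketch below performs the same label selection both ways and then combines a label condition with a value condition, which needs a boolean mask. The names by_filter, by_mask, and combined are illustrative.

```python
import pandas as pd

# Same data as the article's example Series
ser = pd.Series(
    [20000, 25000, 23000, 28000, 55000, 23000],
    index=['Java', 'Spark', 'PySpark', 'Pandas', 'python NumPy', 'Python'],
)

# Label selection with filter()
by_filter = ser.filter(like='Spark')

# The same selection with a boolean mask built from the index
label_mask = ser.index.str.contains('Spark', regex=False)
by_mask = ser[label_mask]
print(by_filter.equals(by_mask))  # True

# A mask can also mix a value condition with the label condition
combined = ser[(ser > 23000) & label_mask]
print(combined.index.tolist())    # ['Spark']
```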

Conclusion

In this article, I have explained how to filter the Pandas Series by using filter(), where(), isin(), and loc[] with lambda function with examples.

Happy Learning !!

References

https://pandas.pydata.org/docs/reference/api/pandas.Series.filter.html