R dplyr filter() – Subset DataFrame Rows

The filter() function from the dplyr package subsets the rows of a data frame in R. Note that filter() does not remove the rows that match the condition; it retains them and drops all other rows, including rows for which the condition evaluates to NA.

dplyr is an R package that offers a grammar for data manipulation and includes a widely used set of verbs that help data analysts with common data manipulation tasks. To use it, install it once with install.packages('dplyr') and load it in each session with library(dplyr).

1. Create DataFrame

Let’s create a data frame to use in the examples.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

This yields the output below.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH

2. dplyr filter() Syntax

The following is the syntax of the filter() function from the dplyr package.


# Syntax of filter()
filter(x, condition,...)

Parameters

  • x – The object to filter; in our case, a data frame. (In dplyr’s own documentation this argument is named .data.)
  • condition – One or more logical expressions, written in terms of the column names, that a row must satisfy to be kept.
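
As a small sketch of this syntax, the example below passes the data frame directly, then through the pipe, and finally supplies two conditions separated by a comma, which filter() combines with AND. The people data frame is made up purely for illustration and is not the article's df.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the call shapes
people <- data.frame(
  id = c(1, 2, 3, 4),
  gender = c('M', 'F', 'F', 'M')
)

# Data frame passed as the first argument
res_direct <- filter(people, gender == 'F')

# The same call written with the pipe operator
res_piped <- people %>% filter(gender == 'F')

# Conditions separated by commas are combined with AND
res_both <- filter(people, gender == 'F', id > 2)
res_both
```

The piped form is often preferred when filter() is one step in a longer chain of dplyr verbs.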

3. Filter Data Frame Rows Based on Row Name

If your data frame has row names, you can filter rows by name using the approach below. By default, the row names of an R data frame are sequential numbers assigned at creation. You can assign custom row names when creating the data frame (as we did with row.names above) or later with the rownames() function. Column names, by contrast, are set with colnames().


# filter() by row name
library('dplyr')
filter(df, rownames(df) == 'r3')

This yields the output below. The example returns the row whose row name is 'r3'.


# Output
   id    name gender        dob state
r3 12 deepika      F 1987-06-14  <NA>
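
The same idea extends to several row names at once. The sketch below uses a trimmed copy of the data frame and shows two variants: matching rownames(df) with %in%, and first copying the row names into a regular column with tibble::rownames_to_column(). The column name row_id is an arbitrary choice for this example, and the second variant assumes the tibble package is installed (it normally is, as a dplyr dependency).

```r
library(dplyr)

# Trimmed copy of the article's data frame (only the columns used here)
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Keep the rows whose row name is one of several values
several <- filter(df, rownames(df) %in% c('r3', 'r5'))

# Alternative: turn the row names into an ordinary column, then filter on it
with_col <- tibble::rownames_to_column(df, var = 'row_id')
same_rows <- filter(with_col, row_id %in% c('r3', 'r5'))
same_rows
```

Storing the row names in a column makes them available to every dplyr verb, not only to filter().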

4. Filter by Column Value

You can also filter a data frame by column value by specifying a condition on that column. The following example retains the rows where gender is equal to 'M'.


# filter() by column Value
library('dplyr')
filter(df, gender == 'M')

This yields the output below.


# Output
   id  name gender        dob state
r1 10   sai      M 1990-10-02    CA
r2 11   ram      M 1981-03-24    NY
r5 14 kumar      M 1995-03-02    DC
r6 15 scott      M 1991-06-21    DW
r7 16   Don      M 1986-03-24    AZ
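
Conditions are not limited to character columns. As a minimal sketch on a trimmed copy of the data frame, the example below filters on the numeric id column and on the Date column dob. The cut-off values are arbitrary and chosen only for illustration.

```r
library(dplyr)

# Trimmed copy of the article's data frame (only the columns used here)
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  dob = as.Date(c('1990-10-02','1981-03-24','1987-06-14','1985-08-16',
                  '1995-03-02','1991-06-21','1986-03-24','1990-08-26'))
)

# Numeric column: rows with an id of 15 or more
by_id <- filter(df, id >= 15)

# Date column: rows with a dob after 1 January 1990
by_dob <- filter(df, dob > as.Date('1990-01-01'))
by_dob
```

Wrapping the cut-off in as.Date() keeps the comparison between two Date values rather than a Date and a string.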

5. Filter Rows by List of Values

To keep the rows whose column value matches any value in a vector of values, use the %in% operator in the condition. The following example retains all rows where the state is one of the listed values.


# filter() by list of values
filter(df, state %in% c("CA", "AZ", "PH"))

This yields the output below.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r7 16  Don      M 1986-03-24    AZ
r8 17  Lin      F 1990-08-26    PH
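
The opposite selection can be sketched by negating %in% with !. One detail worth noticing is how missing values behave: NA %in% wanted is FALSE, so the negated condition keeps the NA rows, whereas a != comparison returns NA for them and filter() drops those rows. The vector name wanted is just a label used in this example.

```r
library(dplyr)

# Trimmed copy of the article's data frame (only the columns used here)
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

wanted <- c('CA', 'AZ', 'PH')

# Rows whose state is NOT in the vector; rows with NA state are kept
not_in <- filter(df, !state %in% wanted)

# By contrast, != yields NA for missing values, so those rows are dropped
not_ca <- filter(df, state != 'CA')

not_in
not_ca
```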

6. Filter Rows by Multiple Conditions

You can also filter data frame rows by multiple conditions in R by combining the conditions with logical operators.

The expressions can use comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for ranges and near() for tolerance-based comparison of floating-point numbers, and NA checks with is.na().


# filter() by multiple conditions
library('dplyr')
filter(df, gender == 'M' & id > 15)

This yields the output below.


# Output
   id name gender        dob state
r7 16  Don      M 1986-03-24    AZ
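
The other operators mentioned above work the same way. The following sketch, again on a trimmed copy of the data frame, shows an OR condition, a range check with between(), an NA check with is.na(), and two conditions separated by a comma (which filter() treats as AND). The specific thresholds are arbitrary.

```r
library(dplyr)

# Trimmed copy of the article's data frame (only the columns used here)
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# OR: rows that are female, or have an id below 12
either <- filter(df, gender == 'F' | id < 12)

# Range: rows with id from 12 to 14, both ends included
in_range <- filter(df, between(id, 12, 14))

# NA check: rows where state is missing
missing_state <- filter(df, is.na(state))

# Comma-separated conditions are combined with AND
females_known <- filter(df, gender == 'F', !is.na(state))
females_known
```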

7. Filter Data Frame Rows by Row Number

To select data frame rows by row number (position) in R, use the slice() function rather than filter(). It takes the data frame as the first argument and the row number or numbers to keep as the second.


# Select a row by position with slice()
library('dplyr')
slice(df, 2)

This yields the output below.


# Output
   id name gender        dob state
r2 11  ram      M 1981-03-24    NY
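
slice() also accepts several positions at once. The sketch below, on a trimmed copy of the data frame, selects a range, a set of specific positions, everything except the first row (negative index), and the first two rows with slice_head(), which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Trimmed copy of the article's data frame (only the columns used here)
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin')
)

# A range of positions: rows 2 through 4
rows_2_to_4 <- slice(df, 2:4)

# Specific positions: the first and the eighth row
first_and_last <- slice(df, c(1, 8))

# Negative positions drop rows: everything except row 1
all_but_first <- slice(df, -1)

# First two rows (slice_head() assumes dplyr >= 1.0.0)
top_two <- slice_head(df, n = 2)
top_two
```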

Frequently Asked Questions on dplyr filter() Function in R

What does the filter() function in dplyr do?

The filter() function from dplyr is used to subset or filter rows from a data frame based on specified conditions. It retains only the rows that satisfy specified conditions.

What is the basic syntax of the filter() function?

The basic syntax of the filter() function is filter(x, condition, ...), where x is the original data frame and condition is the logical expression that rows must satisfy to be kept.

How can I use multiple conditions with filter()?

Combine the conditions with logical operators such as &, |, and !. For example, filter(df, gender == 'M' & id > 15) keeps only the rows that satisfy both conditions.

How do I filter rows based on a specific column’s values?

Use the column name within the condition to filter the rows based on a specific column value. For example, filter(df, column_name == 'column_value').

What are some common conditions used with filter()?

Common conditions use comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for ranges and near() for tolerance-based comparison of floating-point numbers, and NA checks with is.na().

8. Conclusion

In this article, you have learned the syntax and usage of the filter() function from the dplyr package, which filters data frame rows by column value, row name, or multiple conditions, and how to select rows by row number with slice().
