R dplyr filter() – Subset DataFrame Rows

The filter() function from the dplyr package subsets the rows of a data frame in R. Note that filter() does not modify the original data frame; it returns a new data frame that retains only the rows satisfying the specified condition. Rows for which the condition evaluates to NA are dropped.

dplyr is an R package that provides a grammar of data manipulation: a consistent set of verbs that help data analysts solve the most common data manipulation tasks. To use it, first install the package with install.packages('dplyr') and then load it with library(dplyr).
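As a minimal sketch of the setup step, the snippet below installs dplyr only when it is not already available and then loads it. The requireNamespace() guard is an addition for illustration and is not part of the original instructions.

```r
# Install dplyr only if it is missing, then attach it
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
```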

1. Dataset Preparation

Let’s create an R data frame, run these examples against it, and explore the output. If you already have data in a CSV file, you can import it into an R data frame instead. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

Yields below output.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH

2. dplyr filter() Syntax

Following is the syntax of the filter() function from the dplyr package.


# Syntax of filter()
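filter(.data, ...)

Parameters

  • .data – The object you want to filter. In our case, it is a data frame.
  • ... – One or more conditions, written as logical expressions on the columns. When several conditions are given, they are combined with & and only rows where all of them are TRUE are kept.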
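The syntax above can be sketched as follows, assuming a trimmed copy of the article's sample data (the dob column and the row names are left out for brevity). The names res_args and res_pipe are illustrative only.

```r
library(dplyr)

# Trimmed copy of the sample data used in this article
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# Conditions passed as separate arguments are combined with AND
res_args <- filter(df, gender == 'M', id > 15)

# The same call written with the pipe; df becomes the first argument
res_pipe <- df %>% filter(gender == 'M', id > 15)

# filter() returns a new data frame; df itself still has all of its rows
nrow(df)
res_pipe
```

Both calls should give the same single row; which form to use is mostly a matter of style.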

3. Filter Data Frame Rows by Row Name

If your data frame has row names and you want to filter rows by row name, use the approach below. By default, row names are the sequential numbers assigned when the data frame is created. R also lets you assign custom row names while creating the data frame, or set them on an existing one with the rownames() function. To set the column names, use the colnames() function.


# filter() by row name
library('dplyr')
filter(df, rownames(df) == 'r3')

Yields the below output. This example returns the row whose row name is 'r3'.


# Output
   id    name gender        dob state
r3 12 deepika      F 1987-06-14  <NA>
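To match several row names at once, the same idea can be combined with %in%. The sketch below uses a smaller, hypothetical version of the sample data. It also shows an alternative that first copies the row names into a regular column with tibble::rownames_to_column(); the column name row_id is an assumption chosen for illustration. This alternative may be useful because tibbles do not carry row names.

```r
library(dplyr)

# Smaller version of the sample data, with custom row names
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai','ram','deepika','sahithi'),
  row.names = c('r1','r2','r3','r4')
)

# Keep the rows whose row name is one of the given values
res <- filter(df, rownames(df) %in% c('r1', 'r3'))
res

# Alternative: turn the row names into an ordinary column, then filter on it
df_keyed <- tibble::rownames_to_column(df, var = 'row_id')
res_keyed <- filter(df_keyed, row_id == 'r3')
res_keyed
```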

4. Filter by Column Value

You can also filter a data frame based on column values by specifying a condition on the column. The following example retains the rows where gender is equal to 'M'.


# filter() by column Value
library('dplyr')
filter(df, gender == 'M')

Yields below output.


# Output
   id  name gender        dob state
r1 10   sai      M 1990-10-02    CA
r2 11   ram      M 1981-03-24    NY
r5 14 kumar      M 1995-03-02    DC
r6 15 scott      M 1991-06-21    DW
r7 16   Don      M 1986-03-24    AZ
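Column conditions are not limited to equality. As a rough sketch on a trimmed copy of the sample data, the example below uses != on a character column and >= on the date column. The cutoff date 1990-01-01 and the result names are chosen only for illustration.

```r
library(dplyr)

# Trimmed copy of the sample data used in this article
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-03-24','1987-06-14','1985-08-16',
                  '1995-03-02','1991-06-21','1986-03-24','1990-08-26'))
)

# Rows where gender is not 'M'
not_male <- filter(df, gender != 'M')
not_male

# Rows where the date of birth is on or after 1 January 1990
born_1990_on <- filter(df, dob >= as.Date('1990-01-01'))
born_1990_on
```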

5. Filter Rows by List of Values

If you want to select the rows whose column value matches any value in a list, use the %in% operator in the condition. The following example retains all rows where state is one of the listed values.


# filter() by list of values
filter(df, state %in% c("CA", "AZ", "PH"))

Yields below output.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r7 16  Don      M 1986-03-24    AZ
r8 17  Lin      F 1990-08-26    PH
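A negated %in% excludes a list of values. One detail worth checking on your own data is how missing values behave: NA %in% c(...) is FALSE, so the negation keeps rows with NA, whereas a comparison such as state != 'CA' evaluates to NA for those rows and filter() drops them. The sketch below illustrates this on a trimmed copy of the sample data; the result names are illustrative.

```r
library(dplyr)

# Trimmed copy of the sample data used in this article
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# Exclude a list of values; rows with a missing state are kept
not_in <- filter(df, !(state %in% c('CA', 'AZ', 'PH')))
not_in

# Exclude a single value with !=; rows with a missing state are dropped
not_ca <- filter(df, state != 'CA')
not_ca
```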

6. Filter Rows by Multiple Conditions

You can also filter data frame rows by multiple conditions in R. All you need to do is use logical operators between the conditions in the expression.

The expressions can include comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for range checks and near() for comparing floating-point numbers, as well as NA checks with is.na() against the column values.


# filter() by multiple conditions
library('dplyr')
filter(df, gender == 'M' & id > 15)

Yields below output.


# Output
   id name gender        dob state
r7 16  Don      M 1986-03-24    AZ
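The other operators mentioned above can be sketched in the same way. The example below, on a trimmed copy of the sample data, combines conditions with |, uses between() for an inclusive range, and checks for missing values with is.na(). The result names are illustrative only.

```r
library(dplyr)

# Trimmed copy of the sample data used in this article
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# OR: rows where gender is 'F' or id is greater than 15
either <- filter(df, gender == 'F' | id > 15)

# Range check: between() includes both endpoints
mid_ids <- filter(df, between(id, 12, 14))

# NA checks: rows with a missing state, and female rows with a known state
missing_state <- filter(df, is.na(state))
known_state_f <- filter(df, gender == 'F', !is.na(state))

either
mid_ids
missing_state
known_state_f
```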

7. Filter Data Frame Rows by Row Number

To select data frame rows by row number (position) in R, use the slice() function rather than filter(). This function takes the data frame as the first argument and the row number(s) you want to keep as the second argument.


# slice() by row number
library('dplyr')
slice(df, 2)

Yields below output.


# Output
   id name gender        dob state
r2 11  ram      M 1981-03-24    NY
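slice() also accepts several positions or negative positions, and dplyr (version 1.0.0 or later) provides slice_head() and slice_tail() for the first and last rows. The following is a small sketch on a trimmed copy of the sample data; the result names are illustrative.

```r
library(dplyr)

# Trimmed copy of the sample data used in this article
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin')
)

# Several positions at once: rows 2 and 4
rows_2_4 <- slice(df, c(2, 4))

# Negative positions drop rows: everything except the first row
all_but_first <- slice(df, -1)

# First three rows and last two rows
first_three <- slice_head(df, n = 3)
last_two <- slice_tail(df, n = 2)

rows_2_4
first_three
last_two
```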

Frequently Asked Questions on dplyr filter() Function in R

What does the filter() function in dplyr do?

The filter() function from dplyr is used to subset or filter rows from a data frame based on specified conditions. It retains only the rows that satisfy the specified condition.

What is the basic syntax of the filter() function?

The basic syntax of the filter() function is filter(.data, ...), where .data is the original data frame and ... is one or more conditions that decide which rows are kept.

How can I use multiple conditions with filter()?

You can use the filter() function to filter data frame rows by multiple conditions in R. To do this, use logical operators such as &, |, and ! between the conditions in the expression.

How do I filter rows based on a specific column’s values?

Use the column name within the condition to filter the rows based on a specific column value. For example, filter(df, column_name == 'column_value').

What are some common conditions used with filter()?

The common conditions include comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for range checks and near() for comparing floating-point numbers, as well as NA checks with is.na() against the column values.

8. Conclusion

In this article, you have learned the syntax and usage of the filter() function from the dplyr package, which is used to filter data frame rows by column value, row name, and multiple conditions, along with slice() for selecting rows by row number.

Naveen Nelamali
