The filter() function from the dplyr package is used to filter data frame rows in R. Note that filter() does not remove the rows that match the condition; it retains the rows that satisfy the specified condition and drops the rest, including rows where the condition evaluates to NA.
dplyr is an R package that offers a grammar for data manipulation and includes a widely used set of verbs that help data analysts with common data manipulation tasks. To use it, install it first using install.packages('dplyr') and load it using library(dplyr).
- dplyr filter() Syntax
- Filter by Row Name
- Filter by Column Value
- Filter by Multiple Conditions
- Filter by Row Number
1. Create DataFrame
Let’s create a data frame to use in the examples.
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
This yields the following output.
# Output
id name gender dob state
r1 10 sai M 1990-10-02 CA
r2 11 ram M 1981-03-24 NY
r3 12 deepika F 1987-06-14 <NA>
r4 13 sahithi F 1985-08-16 <NA>
r5 14 kumar M 1995-03-02 DC
r6 15 scott M 1991-06-21 DW
r7 16 Don M 1986-03-24 AZ
r8 17 Lin F 1990-08-26 PH
2. dplyr filter() Syntax
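Following is the syntax of the filter() function from the dplyr package.
# Syntax of filter()
filter(x, condition, ...)
Parameters
x – The object you want to filter. In our case, it is a data frame (the dplyr documentation names this argument .data).
condition – A logical expression, evaluated against the columns, that decides which rows to keep.
... – Optional additional conditions. Conditions separated by commas are combined with AND.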
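As a minimal sketch of the syntax, the snippet below passes two conditions separated by a comma and compares the result with the equivalent & expression and with the pipe form. It rebuilds the df from section 1 so it runs on its own; the result names (res_comma, res_and, res_pipe) are only illustrative.

```r
library(dplyr)

# Same data frame as in section 1
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Conditions separated by a comma are combined with AND
res_comma <- filter(df, gender == 'M', id > 12)

# The same filter written with an explicit & operator
res_and <- filter(df, gender == 'M' & id > 12)

# The same filter written with the pipe operator
res_pipe <- df %>% filter(gender == 'M', id > 12)

res_comma
```

All three calls should keep the same rows, namely the male records with an id above 12.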
3. Filter Data Frame Rows based on Row Name
If your data frame has row names and you want to filter rows by row name, use the approach below. By default, row names in an R data frame are sequential numbers assigned at creation. You can assign custom row names when creating the data frame or by using the rownames() function on an existing data frame. To set column names, use the colnames() function.
# filter() by row name
library('dplyr')
filter(df, rownames(df) == 'r3')
This yields the following output. The example returns the row whose row name is 'r3'.
# Output
id name gender dob state
r3 12 deepika F 1987-06-14 <NA>
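The same idea extends to several row names at once. The sketch below, which rebuilds df so it is self-contained, matches row names with %in% and also shows an alternative that first copies the row names into an ordinary column using rownames_to_column() from the tibble package. That alternative and the column name row_id are assumptions for illustration and not part of the original example; tibble is installed together with dplyr.

```r
library(dplyr)

# Same data frame as in section 1
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Keep the rows whose row name is in a set of names
res_rows <- filter(df, rownames(df) %in% c('r2', 'r5'))

# Alternative: turn the row names into a regular column ('row_id' is a
# hypothetical name), then filter on that column like any other
df_named <- tibble::rownames_to_column(df, var = 'row_id')
res_named <- filter(df_named, row_id == 'r3')

res_rows
res_named
```

Keeping the identifier in a regular column can be convenient because tibbles do not carry row names.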
4. Filter by Column Value
You can also filter a data frame based on a column value by specifying a condition on that column. The following example retains the rows where gender is equal to 'M'.
# filter() by column Value
library('dplyr')
filter(df, gender == 'M')
This yields the following output.
# Output
id name gender dob state
r1 10 sai M 1990-10-02 CA
r2 11 ram M 1981-03-24 NY
r5 14 kumar M 1995-03-02 DC
r6 15 scott M 1991-06-21 DW
r7 16 Don M 1986-03-24 AZ
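Column conditions are not limited to equality on character columns. As a small sketch on the same data (rebuilt here so the snippet runs alone), the following filters on a numeric column, a Date column, and an inequality; the result names are illustrative only.

```r
library(dplyr)

# Same data frame as in section 1
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Numeric comparison: ids of 15 and above
res_id <- filter(df, id >= 15)

# Date comparison: born after 1 January 1990
res_dob <- filter(df, dob > as.Date('1990-01-01'))

# Inequality: every row except the one named 'sai'
res_name <- filter(df, name != 'sai')

res_dob
```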
5. Filter Rows by List of Values
If you want to select the rows that match any value in a list, use the %in% operator in the condition. The following example retains all rows where state is one of the listed values.
# filter() by list of values
filter(df, state %in% c("CA", "AZ", "PH"))
This yields the following output.
# Output
id name gender dob state
r1 10 sai M 1990-10-02 CA
r7 16 Don M 1986-03-24 AZ
r8 17 Lin F 1990-08-26 PH
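To exclude a list of values instead, the %in% test can be negated with !. One detail worth knowing is how missing values behave: NA %in% c(...) evaluates to FALSE, so the negated test keeps rows with a missing state, while a comparison such as state != 'CA' evaluates to NA for those rows and filter() drops them. The sketch below, which rebuilds df to stay self-contained, shows both cases.

```r
library(dplyr)

# Same data frame as in section 1
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Negated %in%: rows whose state is NOT in the list.
# Rows with a missing state are kept, because NA %in% x is FALSE.
res_not_in <- filter(df, !state %in% c('CA', 'AZ', 'PH'))

# Plain inequality: rows with a missing state are dropped,
# because NA != 'CA' is NA and filter() discards NA conditions.
res_not_ca <- filter(df, state != 'CA')

res_not_in
res_not_ca
```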
6. Filter Rows by Multiple Conditions
You can also filter data frame rows by multiple conditions in R. All you need to do is combine the conditions with logical operators in the expression.
The expressions can include comparison operators (==, >, >=), logical operators (&, |, !, xor()), the helper functions between() (inclusive range check) and near() (floating-point tolerant equality), as well as NA checks with is.na() against the column values.
# filter() by multiple conditions
library('dplyr')
filter(df, gender == 'M' & id > 15)
This yields the following output.
# Output
id name gender dob state
r7 16 Don M 1986-03-24 AZ
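The other operators mentioned above work the same way. The following sketch, again rebuilding df so it can be run in isolation, illustrates OR, between(), and is.na(); the result names are illustrative only.

```r
library(dplyr)

# Same data frame as in section 1
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# OR: female rows, or any row with id above 15
res_or <- filter(df, gender == 'F' | id > 15)

# between(): id from 12 to 14, both ends included
res_between <- filter(df, between(id, 12, 14))

# is.na(): rows where state is missing
res_na <- filter(df, is.na(state))

# Negated is.na() combined with another condition:
# male rows that have a state recorded
res_not_na <- filter(df, !is.na(state), gender == 'M')

res_or
```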
7. Filter Data Frame Rows by Row Number
To select data frame rows by row number or position in R, use the slice() function rather than filter(). This function takes the data frame object as the first argument and the row number that you want to keep as the second argument.
# slice() by row number
library('dplyr')
slice(df, 2)
This yields the following output.
# Output
id name gender dob state
r2 11 ram M 1981-03-24 NY
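slice() also accepts ranges and negative positions. The sketch below shows a few variants on the same data (rebuilt here so it runs alone). It assumes dplyr 1.0.0 or later for slice_head(), and the last line shows one possible way to express a position test inside filter() using row_number().

```r
library(dplyr)

# Same data frame as in section 1
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# A range of positions: rows 2 to 4
res_range <- slice(df, 2:4)

# A negative position drops that row and keeps the rest
res_drop <- slice(df, -1)

# First three rows (assumes dplyr >= 1.0.0)
res_head <- slice_head(df, n = 3)

# Position test inside filter(), using row_number()
res_pos <- filter(df, row_number() %in% c(1, 8))

res_range
```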
Frequently Asked Questions on dplyr filter() Function in R
The filter() function from dplyr is used to subset rows of a data frame based on specified conditions. It retains only the rows that satisfy those conditions.
The basic syntax of the filter() function is filter(x, condition, ...), where x is the original data frame and condition specifies which rows to keep.
To filter data frame rows by multiple conditions in R, combine the conditions with logical operators such as &, |, and ! in the expression.
To filter rows based on a specific column value, use the column name within the condition. For example, filter(df, column_name == 'column_value').
Common conditions used with filter() include comparison operators (==, >, >=), logical operators (&, |, !, xor()), the helper functions between() and near(), as well as NA checks with is.na() against the column values.
8. Conclusion
In this article, you have learned the syntax and usage of the filter() function from the dplyr package, which is used to filter data frame rows by column value, row name, a list of values, and multiple conditions, and how to select rows by row number with slice().