In R, you can subset a data frame by multiple conditions using the df[] notation, the subset() function from base R, or the filter() function from the dplyr package. This article walks through each of these approaches with examples.
Create Dataframe
Let's create an R data frame to run the examples against and compare the results.
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
Yields below output.
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH
Subset Rows by Multiple Conditions
subset() is a base R function that extracts observations (rows) and variables (columns) from a data frame based on one or more conditions. It also works on vectors and matrices.
Its syntax is subset(x, subset, select, drop = FALSE, ...), where the first argument is the input object, the second is the logical expression used to choose rows, and the third specifies the columns to keep.
# subset by multiple conditions using |
subset(df, gender == 'M' | state == 'PH')
# subset by multiple conditions using &
subset(df, gender == 'M' & state %in% c('CA','NY'))
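Because subset() treats a condition that evaluates to NA as FALSE, the first call returns rows r1, r2, r5, r6, r7, and r8, and the second call returns rows r1 and r2.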
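The select argument described above can be combined with a row condition to return only some columns. The following is a minimal sketch, not the only way to do it; it rebuilds the example data frame so it runs on its own, and the name res is just illustrative.

```r
# Same example data frame as in the "Create Dataframe" section
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Rows: gender is 'M' AND state is CA or NY
# Columns: keep only id and name (illustrative choice)
res <- subset(df, gender == 'M' & state %in% c('CA','NY'),
              select = c(id, name))
res
```

Here the row condition and the column selection are evaluated in one call, so there is no need to index the result a second time.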
Using df[] Notation
You can also select rows by multiple conditions using bracket notation df[] on the data frame.
# Select Rows by Checking multiple conditions
df[df$gender == 'M' | df$state == 'PH',]
df[df$gender == 'M' & df$state %in% c('CA','NY'),]
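The second expression returns the same rows as subset(). The first does not: where the condition evaluates to NA (rows r3 and r4 here), bracket indexing returns a row filled with NA instead of dropping it.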
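Bracket indexing handles missing values differently from subset(): an NA in the logical index produces a row of NAs. One common way to skip those rows is to wrap the condition in which(), which keeps only the positions that are TRUE. The sketch below rebuilds the example data frame so it runs on its own; the names cond, with_na, and no_na are illustrative.

```r
# Same example data frame as in the "Create Dataframe" section
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# The condition is NA for r3 (NA | NA) and r4 (FALSE | NA)
cond <- df$gender == 'M' | df$state == 'PH'

# Plain logical indexing keeps an all-NA row for each NA in cond
with_na <- df[cond, ]
nrow(with_na)

# which() returns only the TRUE positions, so NA conditions are skipped
no_na <- df[which(cond), ]
rownames(no_na)
```

An alternative with the same effect is df[!is.na(cond) & cond, ], which stays a logical index rather than converting to positions.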
Using filter() Function
Similarly, you can subset the data frame by multiple conditions using the filter() function from the dplyr package. To use it, first install the package with install.packages('dplyr') and then load it with library(dplyr).
library(dplyr)
# Using dplyr::filter
df %>% filter(gender == 'M' | state == 'PH')
df %>% filter(gender == 'M' & state %in% c('CA','NY') )
Like subset(), filter() drops rows where the condition evaluates to NA, so it returns the same rows as the subset() examples.
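filter() also accepts several conditions separated by commas and combines them with a logical AND, which can read more cleanly than chaining &. A minimal sketch, assuming dplyr is installed; it rebuilds the example data frame so it runs on its own, and the names and_comma and and_amp are illustrative.

```r
library(dplyr)

# Same example data frame as in the "Create Dataframe" section
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Comma-separated conditions are combined with AND
and_comma <- df %>% filter(gender == 'M', state %in% c('CA','NY'))

# Equivalent form using an explicit &
and_amp <- df %>% filter(gender == 'M' & state %in% c('CA','NY'))

and_comma
```

An OR between conditions still has to be written explicitly with |, since the comma form only expresses AND.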
Complete Example
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
# subset by multiple conditions using |
subset(df, gender == 'M' | state == 'PH')
# subset by multiple conditions using &
subset(df, gender == 'M' & state %in% c('CA','NY'))
# Select Rows by Checking multiple conditions
df[df$gender == 'M' | df$state == 'PH',]
df[df$gender == 'M' & df$state %in% c('CA','NY'),]
library(dplyr)
# Using dplyr::filter
df %>% filter(gender == 'M' | state == 'PH')
df %>% filter(gender == 'M' & state %in% c('CA','NY') )
Conclusion
In this article, I have explained how to subset a data frame by multiple conditions in R using the subset() function, filter() from the dplyr package, and df[] notation.