R subset Multiple Conditions

To subset with multiple conditions in R, you can use df[] bracket notation, the subset() function from base R, or filter() from the dplyr package.

In this article, I will explain different ways to subset the R DataFrame by multiple conditions.

1. Create DataFrame

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV or Excel file, you can import it into an R data frame instead.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

This yields the output below.


   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH

2. Subset Rows by Multiple Conditions

subset() is a base R function that returns the observations (rows) and variables (columns) of a data frame that satisfy one or more conditions. It can also be used to subset vectors and matrices.

The subset() function has the signature subset(x, subset, select, drop = FALSE, …), where the first argument is the input object, the second is the logical expression that selects rows, and the third specifies which variables (columns) to keep.


# subset by multiple conditions using |
subset(df, gender == 'M' | state == 'PH')

# subset by multiple conditions using &
subset(df, gender == 'M' & state %in% c('CA','NY'))

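The first call returns rows r1, r2, r5, r6, r7, and r8; the second returns rows r1 and r2. subset() drops rows where the condition evaluates to NA, which is why r3 and r4 do not appear in the first result.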
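The select argument described in the syntax above can be combined with a row condition. The following is a minimal sketch, assuming a trimmed copy of the example data frame (the dob column is left out for brevity) and a hypothetical result name res:

```r
# Trimmed copy of the example data frame so this snippet runs on its own
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Filter rows by two conditions and keep only the id and name columns
res <- subset(df, gender == 'M' & state %in% c('CA','NY'), select = c(id, name))
res
```

Here res should contain only rows r1 and r2 and only the id and name columns.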

3. Using df[] Notation

You can also subset a data frame by multiple conditions using bracket notation, df[].


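# Select rows by checking multiple conditions.
# which() keeps only rows where the condition is TRUE and skips NA results
df[which(df$gender == 'M' | df$state == 'PH'),]

df[df$gender == 'M' & df$state %in% c('CA','NY'),]

Yields the same output as above. Without which(), the first expression would also return all-NA rows for r3 and r4, because the condition evaluates to NA for those rows.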
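Missing values deserve some care with bracket notation, because a logical index that contains NA produces all-NA rows rather than dropping them. The sketch below illustrates this on a trimmed copy of the example data frame; the names cond, raw, and clean are hypothetical and used only for illustration:

```r
# Trimmed copy of the example data frame so this snippet runs on its own
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  gender = c('M','M',NA,'F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# The condition is NA for r3 (NA | NA) and r4 (FALSE | NA)
cond <- df$gender == 'M' | df$state == 'PH'
cond

# Indexing with the raw logical vector keeps a placeholder NA row for each NA
raw <- df[cond, ]

# which() returns only the positions that are TRUE, so NA rows are skipped
clean <- df[which(cond), ]

nrow(raw)
nrow(clean)
```

An alternative with the same effect is to add explicit !is.na() checks to the condition; which() is simply the shorter option here.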

4. Using filter() Function

Similarly, you can subset the data frame by multiple conditions using the filter() function from the dplyr package. To use it, first install the package with install.packages('dplyr') and then load it with library(dplyr).


library(dplyr)
# Using dplyr::filter
df %>% filter(gender == 'M' | state == 'PH')
df %>% filter(gender == 'M' & state %in% c('CA','NY') )

Yields the same output as above.
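filter() also accepts several conditions separated by commas, which dplyr combines with a logical AND. The following sketch, which assumes dplyr is installed and uses a trimmed copy of the example data frame with hypothetical result names a and b, compares the two spellings:

```r
library(dplyr)

# Trimmed copy of the example data frame so this snippet runs on its own
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# Conditions joined explicitly with &
a <- df %>% filter(gender == 'M' & state %in% c('CA','NY'))

# The same conditions passed as separate arguments
b <- df %>% filter(gender == 'M', state %in% c('CA','NY'))

# Both forms should select the same rows
identical(a, b)
```

The comma form can be easier to read when there are many AND conditions; an OR condition still has to be written with | inside a single argument.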

5. Complete Example


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

# subset by multiple conditions using |
subset(df, gender == 'M' | state == 'PH')

# subset by multiple conditions using &
subset(df, gender == 'M' & state %in% c('CA','NY'))

# Select rows by checking multiple conditions
# which() skips rows where the condition evaluates to NA
df[which(df$gender == 'M' | df$state == 'PH'),]

df[df$gender == 'M' & df$state %in% c('CA','NY'),]

library(dplyr)
# Using dplyr::filter
df %>% filter(gender == 'M' | state == 'PH')
df %>% filter(gender == 'M' & state %in% c('CA','NY') )

6. Conclusion

In this article, you have learned how to subset a data frame by multiple conditions in R using the subset() function, df[] bracket notation, and filter() from the dplyr package.


Naveen Nelamali
