How do you filter a data frame by multiple conditions in R? You can use the df[] notation, optionally together with the which() function, to filter a data frame based on multiple conditions. Filtering a data frame typically means selecting a subset of rows or columns from a larger data frame based on specific criteria. This can involve selecting rows where a column meets certain conditions (e.g., values greater than a threshold) or selecting columns based on their names or data types.
You can also use the filter() function from the dplyr package and the subset() function from base R to filter data frames based on certain conditions. In this article, I will explain different ways to filter an R data frame by multiple conditions.
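To make the three kinds of selection mentioned above concrete (rows by value, columns by name, columns by data type), here is a minimal base R sketch. The data frame `demo` and its columns are hypothetical and are used only for this illustration; they are not the data used in the rest of the article.

```r
# Hypothetical data frame used only for this illustration
demo <- data.frame(
  id    = c(1, 2, 3),
  name  = c("a", "b", "c"),
  score = c(55.5, 80.0, 91.2)
)

# Rows where a column exceeds a threshold
high <- demo[demo$score > 60, ]

# Columns selected by name
by_name <- demo[, c("id", "score")]

# Columns selected by data type (numeric columns only);
# drop = FALSE keeps the result a data frame even for one column
num_cols <- demo[, sapply(demo, is.numeric), drop = FALSE]

print(high)
print(by_name)
print(num_cols)
```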
Key Points –
- The filtered data frame keeps its rows in the same order as in the original data.
- After filtering, the columns remain unchanged.
- With a grouped data frame in dplyr, the number of groups may be reduced if no rows in a group meet the conditions.
- You can use the logical operators AND (&) and OR (|) to combine multiple conditions and filter the rows of the data frame.
- You can refer to a column by its name (df$col_name) or by its index (df[[i]]).
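As a quick illustration of how & and | behave, the sketch below applies them to a plain numeric vector. Both operators work element by element, which is what makes them suitable for building a row filter. The vector `x` is a made-up example.

```r
# Made-up vector used only for this illustration
x <- c(5, 12, 20)

# Element-wise AND: TRUE only where both comparisons hold
both <- x > 10 & x < 15

# Element-wise OR: TRUE where at least one comparison holds
either <- x < 8 | x > 15

print(both)
print(either)
print(x[both])    # values that satisfy both conditions
print(x[either])  # values that satisfy at least one condition
```

Note that the doubled forms && and || work on single values rather than whole vectors, so they are not the ones to use for filtering rows.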
Create Data Frame
To run some examples of filtering a data frame, let’s create an R data frame. If your data is in a CSV file, you can easily import it into an R data frame.
# Create DataFrame
df <- data.frame(
id = c(10,11,12,13,14,15,16,17),
name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
gender = c('M','M',NA,'F','M','M','M','F'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
'1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
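This creates a data frame with 8 rows and 5 columns (id, name, gender, dob, and state). Note that the gender and state columns contain NA values.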
Using df[] to Filter by Multiple Conditions
You can use the df[] notation without which() to filter a data frame by multiple conditions. To keep only the rows that satisfy every condition, combine the conditions with the logical AND operator (&), which returns TRUE only if both conditions are TRUE.
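# Filter the data frame by multiple conditions
# using df[] without which()
# !is.na(df$gender) guards against the NA in gender, which would
# otherwise add an all-NA row to the result
fil_df <- df[!is.na(df$gender) & df$gender == 'F' & df$state %in% c('PH', NA),]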
print("After filtering the data frame:")
fil_df
The above code returns a new data frame containing the rows where the gender column is F and the state column is either PH or NA. Unlike ==, the %in% operator can match NA values.
Yields below output.
# Output:
[1] "After filtering the data frame:"
> fil_df
id name gender dob state
r4 13 sahithi F 1985-08-16 <NA>
r8 17 Lin F 1990-08-26 PH
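One thing to watch with plain logical indexing: comparing NA with == gives NA rather than FALSE, and df[] returns an all-NA row for every NA in the logical index. The sketch below shows the effect and two possible guards, using a small hypothetical data frame `demo` that is not part of the article's data.

```r
# Hypothetical data frame used only for this illustration
demo <- data.frame(
  gender = c("F", NA, "M"),
  state  = c("PH", NA, "CA")
)

# The second element is NA because NA == "F" is NA
cond <- demo$gender == "F" & demo$state %in% c("PH", NA)
print(cond)

# Plain logical indexing keeps an all-NA row for the NA entry
with_na_row <- demo[cond, ]
print(with_na_row)

# Guard 1: explicitly exclude NA entries of the condition
via_guard <- demo[!is.na(cond) & cond, ]

# Guard 2: use %in% for the equality test, which never returns NA
via_in <- demo[demo$gender %in% "F" & demo$state %in% c("PH", NA), ]

print(via_guard)
print(via_in)
```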
Filter Rows by Multiple Conditions using df[] with which()
Alternatively, you can use the df[] notation along with the which() function. which() returns the indices of the rows where the condition is TRUE, and df[] then selects those rows.
To keep the rows that satisfy at least one of several conditions, use the logical OR operator (|), which returns TRUE if at least one of the conditions is TRUE.
# Filter the data frame by multiple conditions
# using df[] with which()
fil_df <- df[which(df$gender == 'F' | df$state != 'CA'),]
print("After filtering the data frame:")
fil_df
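The above code returns a new data frame containing the rows where the gender column is F or the state column is not CA (California). Rows where the combined condition evaluates to NA are dropped by which(). The result contains rows r2, r4, r5, r6, r7, and r8.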
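To see what which() contributes, it can help to look at its return value on its own. The following is a small sketch with a made-up logical vector; which() returns the integer positions of the TRUE values and skips both FALSE and NA.

```r
# Made-up logical vector used only for this illustration
cond <- c(TRUE, NA, FALSE, TRUE)

# Integer positions of the TRUE values; NA and FALSE are skipped
idx <- which(cond)
print(idx)

# The positions can then be used for ordinary integer indexing
values <- c("a", "b", "c", "d")
print(values[idx])
```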
Using the filter() Function to Filter by Multiple Conditions
Similarly, you can use the filter() function from the dplyr package to filter a data frame based on multiple conditions. Before using filter(), install the dplyr package with install.packages('dplyr'), then load it with library(dplyr).
Let’s pass multiple conditions on specified column values using logical operators to filter the data frame rows.
# Using dplyr::filter
# Load dplyr package
library(dplyr)
# Keep rows where gender is F or state is not CA
fil_df <- df %>% filter(gender == 'F' | state != 'CA')
print("After filtering the data frame:")
fil_df
# Keep rows where gender is F and state is PH or NA
fil_df <- df %>% filter(gender == 'F' & state %in% c('PH', NA))
print("After filtering the data frame:")
fil_df
Yields below output.
# Output:
[1] "After filtering the data frame:"
> fil_df
id name gender dob state
r2 11 ram M 1981-03-24 NY
r4 13 sahithi F 1985-08-16 <NA>
r5 14 kumar M 1995-03-02 DC
r6 15 scott M 1991-06-21 DW
r7 16 Don M 1986-03-24 AZ
r8 17 Lin F 1990-08-26 PH
[1] "After filtering the data frame:"
> fil_df
id name gender dob state
r4 13 sahithi F 1985-08-16 <NA>
r8 17 Lin F 1990-08-26 PH
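filter() also accepts several conditions separated by commas, which it combines with AND. The sketch below compares that form with an explicit &, assuming dplyr is installed; the data frame `demo` is hypothetical and used only here.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
demo <- data.frame(
  gender = c("F", "M", "F", NA),
  state  = c("PH", "NY", "CA", "PH")
)

# Conditions joined explicitly with &
with_amp <- demo %>% filter(gender == "F" & state == "PH")

# The same conditions passed as separate arguments
with_comma <- demo %>% filter(gender == "F", state == "PH")

print(with_amp)
print(with_comma)
```

In both forms, rows where a condition evaluates to NA are dropped, so the row with a missing gender does not appear in either result.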
Use Base Function to Filter the Data Frame
Finally, you can use the subset() function from base R to filter the data frame based on multiple conditions. This function takes the data frame as its first argument and a logical expression that selects the rows as its second argument.
# subset by multiple conditions using |
fil_df <- subset(df, gender == 'F' | state != 'CA')
print("After filtering the data frame:")
fil_df
# subset by multiple conditions using &
fil_df <- subset(df, gender == 'F' & state %in% c('PH',NA))
print("After filtering the data frame:")
fil_df
This yields the same output as the filter() examples above.
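subset() can also limit the columns in the same call through its select argument. The following is a minimal sketch on a hypothetical data frame `demo` (not the article's data) that filters rows by two conditions and keeps only two columns.

```r
# Hypothetical data frame used only for this illustration
demo <- data.frame(
  id     = c(1, 2, 3),
  gender = c("F", "M", "F"),
  state  = c("PH", "NY", "CA")
)

# Keep rows where gender is F and state is not CA,
# and return only the id and state columns
res <- subset(demo, gender == "F" & state != "CA", select = c(id, state))
print(res)
```

Like filter(), subset() treats a condition that evaluates to NA as FALSE, so such rows are left out of the result.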
Conclusion
In this article, I have explained how to filter a data frame based on multiple conditions in R using the df[] notation with and without the which() function, the filter() function from the dplyr package, and the subset() function from base R. The logical AND operator (&) returns TRUE only if both conditions are TRUE. The logical OR operator (|) returns TRUE if at least one of the conditions is TRUE.