How do I select rows based on column value in R? In this article, I will explain how to select rows based on the values in specific column by using the R base function subset()
, square bracket notation, filter()
from dplyr
package and finally using data.table
.
1. Quick Examples
Following are quick examples of how to select DataFrame rows based on column value. For more examples of select refer to select rows in R.
# Quick Examples
# Example 1: Select Rows by column value
df[df$gender == 'M',]
# Example 2: Select Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]
# Example 3: Select Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]
# Example 4: Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]
# Example 5: Using subset
subset(df, state %in% c("CA", "AZ", "PH"))
# Example 6: Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))
# Example 7: Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]
Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in CSV you can easily import CSV file to R DataFrame. Also, refer to Import Excel File into R.
# Create DataFrame
df <- data.frame(
id = c(10,11,12,13,14,15,16,17),
name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
gender = c('M','M','F','F','M','M','M','F'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
'1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
Yields below output.
2. Select Rows based on Column Value
Let’s use the R base square bracket notation df[] to select rows based on the values of a single column. The following example selects all rows where the values in a specific column meet a certain condition.
# Select Rows by column value
df[df$gender == 'M',]
The above code has returned a new DataFrame containing only the rows where the gender
is M
. In this case, it will include rows r1
, r2
, r5
, r6
, and r7
. This example yields the below output.
3. Select Rows by List of Column Values
By using the same notation along with an operator %in% to select the DataFrame rows based on a list of column values. The following example returns all rows when state values are present in vector values c('CA','AZ','PH')
.
# Select Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]
Yields below output.
# Output
id name gender dob state
r1 16 Don M 1986-03-24 AZ
r2 10 sai M 1990-10-02 CA
r3 17 Lin F 1990-08-26 PH
4. Select Rows based on Multiple Columns
If you want to check the conditions of multiple columns and select the rows based on the result, use the below approach. In this example, I am using multiple conditions, each one with a separate column.
# Select Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]
Yields below output.
# Output
id name gender dob state
r1 16 Don M 1986-03-24 AZ
5. Using is.element() to Select Rows by Column Values
The is.element() function is an R base function that is used to check if the value of a column from the first argument is present in the second argument if it is present it returns the row.
# Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]
Yields below output.
# Output
id name gender dob state
r1 16 Don M 1986-03-24 AZ
r2 10 sai M 1990-10-02 CA
r3 17 Lin F 1990-08-26 PH
6. Using subset()
subset() is also a R base function that also mostly used to select rows from the DataFrame. This function takes the DataFrame object as input and the condition to select rows.
# Using subset
subset(df, state %in% c("CA", "AZ", "PH"))
7. Using dplyr::filter()
dplyr
is a package that provides a grammar of data manipulation, and provides a most used set of verbs that helps data science analysts to solve the most common data manipulation.
To use this, you have to install it first using install.packages('dplyr')
and load it using library(dplyr)
. Here, I will be using method filter() from this package to select rows based on column values.
# Using dplyr::filter
library(dplyr)
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))
8. Using data.table
data.table
is a package that is used to work with tabular data in R. It provides efficient data.table
object which is a much improved and better performance version of the default data.frame
.
To use this, you have to install it first using install.packages('data.table')
and load it using library(data.table)
. Here, I will be using methods from this package to select rows based on column values.
# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]
Conclusion
In this article, you have learned how to select rows based on single/multiple column values in R by using r base function subset(), square bracket notation, function() from dplyr package and finally using data.table.
Related Articles
- Select Rows by Index in R with Examples
- R Select Rows by Condition with Examples
- How to select columns by Name in R
- How to select columns by Index in R
- R subset() function
- R filter() function from dplyr package
- R select() function from dplyr package
- R mutate() function from dplyr package