You are currently viewing Select Rows based on Column Value in R

How do I select rows based on column value in R? In this article, I will explain how to select rows based on the values in specific column by using the R base function subset(), square bracket notation, filter() from dplyr package and finally using data.table.

1. Quick Examples

Following are quick examples of how to select DataFrame rows based on column value. For more examples of select refer to select rows in R.


# Quick Examples

# Example 1: Select Rows by column value
df[df$gender == 'M',]

# Example 2: Select Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]

# Example 3: Select Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]

# Example 4: Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

# Example 5: Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

# Example 6: Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Example 7: Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in CSV you can easily import CSV file to R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

Yields below output.

r select rows column

2. Select Rows based on Column Value

Let’s use the R base square bracket notation df[] to select rows based on the values of a single column. The following example selects all rows where the values in a specific column meet a certain condition.


# Select Rows by column value
df[df$gender == 'M',]

The above code has returned a new DataFrame containing only the rows where the gender is M. In this case, it will include rows r1, r2, r5, r6, and r7. This example yields the below output.

r select rows column

3. Select Rows by List of Column Values

By using the same notation along with an operator %in% to select the DataFrame rows based on a list of column values. The following example returns all rows when state values are present in vector values c('CA','AZ','PH').


# Select Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]

Yields below output.


# Output
   id name gender        dob state
r1 16  Don      M 1986-03-24    AZ
r2 10  sai      M 1990-10-02    CA
r3 17  Lin      F 1990-08-26    PH

4. Select Rows based on Multiple Columns

If you want to check the conditions of multiple columns and select the rows based on the result, use the below approach. In this example, I am using multiple conditions, each one with a separate column.


# Select Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]

Yields below output.


# Output
   id name gender        dob state
r1 16  Don      M 1986-03-24    AZ

5. Using is.element() to Select Rows by Column Values

The is.element() function is an R base function that is used to check if the value of a column from the first argument is present in the second argument if it is present it returns the row.


# Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

Yields below output.


# Output
   id name gender        dob state
r1 16  Don      M 1986-03-24    AZ
r2 10  sai      M 1990-10-02    CA
r3 17  Lin      F 1990-08-26    PH

6. Using subset()

subset() is also a R base function that also mostly used to select rows from the DataFrame. This function takes the DataFrame object as input and the condition to select rows.


# Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

7. Using dplyr::filter()

dplyr is a package that provides a grammar of data manipulation, and provides a most used set of verbs that helps data science analysts to solve the most common data manipulation.

To use this, you have to install it first using install.packages('dplyr') and load it using library(dplyr). Here, I will be using method filter() from this package to select rows based on column values.


# Using dplyr::filter
library(dplyr)
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

8. Using data.table

data.table is a package that is used to work with tabular data in R. It provides efficient data.table object which is a much improved and better performance version of the default data.frame.

To use this, you have to install it first using install.packages('data.table') and load it using library(data.table). Here, I will be using methods from this package to select rows based on column values.


# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Conclusion

In this article, you have learned how to select rows based on single/multiple column values in R by using r base function subset(), square bracket notation, function() from dplyr package and finally using data.table.

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium