You are currently viewing How to Select Rows in R with Examples

By using bracket notation on R DataFrame (data.name) we can select rows by column value, by index, by name, by condition etc. You can also use the R base function subset() to get the same results. Besides these, R also provides another function dplyr::filter() to get the rows from the DataFrame. If you have data.table then use the function from it to achieve better performance.

Some of the methods explained in this article are also used to select columns from an R data frame.

1. Quick Examples of Selecting Rows

Following are quick examples of how to select rows from DataFrame (data.frame) in R


# Quick Examples of selecting rows

# Example 1: Select Rows by index
df[3,]

# Example 2: Select Rows by list of index values
df[c(3,4,6),]

# Example 3: Select Rows by index range
df[3:6,]

# Example 4: Select first N rows
head(df,3)

# Example 5: Select last N rows
tail(df,3)

# Example 6: Select Rows by name
df['r3',]

# Example 7: Select Rows by list of names
df[c('r3','r6'),]

# Example 8: Select Rows by column value
df[df$gender == 'M',]

# Example 9: Select Rows by checking values on multiple columns
df[df$gender == 'M' & df$id > 15,]

# Example 10: Select Rows by list of column values
df[df$state %in% c('CA','AZ','PH'),]

# Example 11: Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

# Example 12: Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

# Example 13: Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Example 14: Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in CSV you can easily import CSV files to R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)
df

Yields below output.

r select rows

2. Using R base to Select Rows

By using a bracket notation you can select rows from DataFrame in R. In this selection, I will cover how to select rows by index, select rows by Name, and check column values. All these return a DataFrame after selecting the specific rows hence, you can use these to Create an R DataFrame from the existing DataFrame

2.1 By Index

Every row or observation in a DataFrame is assigned an index, you can use this index to get rows. Following are some commonly used methods to select rows by index in R.


# Select Rows by Index
df[3,]

# Select Rows by List of Index Values
df[c(3,4,6),]

Yields below output.

r select rows

Below are some more ways to select rows using an index of the R data frame. The first way is to get the specified portion of rows using the index range, the second way is to get the first N rows and the final way is to get the last N rows.


# Select Rows by Index Range
df[3:6,]

# Output:
#   id    name gender        dob state
# 3 12 deepika      F 1987-06-14  <NA>
# 4 13 sahithi      F 1985-08-16  <NA>
# 5 14   kumar      M 1990-10-02    DC
# 6 15   scott      M 1981-03-24    DW


# Select first N rows
head(df,3)

# Output:
#   id    name gender        dob state
# 1 10     sai      M 1990-10-02    CA
# 2 11     ram      M 1981-03-24    NY
# 3 12 deepika      F 1987-06-14  <NA>


# Select last N rows
tail(df,3)

# Output:
#   id    name gender        dob state
# 1 10     sai      M 1990-10-02    CA
# 2 11     ram      M 1981-03-24    NY
# 3 12 deepika      F 1987-06-14  <NA>

2.2 By Row Name

If you have row names on the DataFrame and want to select rows by row name in R, use the below approach. By default row names are the incremental sequence numbers assigned at the time of the creation of the R DataFrame. R also provides a way to assign custom row names while creating the DataFrame or setting row names on the existing one by using the rownames() function. To set the column names use colnames() function.

Create the R data frame by customizing the row names using rownames() function.


# Create R data frame 
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)
# Set custom row names
custom_row_names <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8')
rownames(df) <- custom_row_names
print(df)

Yields below output.


# Output:
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1990-10-02    DC
r6 15   scott      M 1981-03-24    DW
r7 16     Don      M 1987-06-14    AZ
r8 17     Lin      F 1985-08-16    PH

To get the single/multiple rows by name you can use the vector with the specified values, you want to return.


# Select Rows by Name
df['r3',]

# Output:
#    id    name gender        dob state
# r3 12 deepika      F 1987-06-14  <NA>

# Select Rows by list of names
df[c('r3','r6'),]

# Output:
#    id    name gender        dob state
# r3 12 deepika      F 1987-06-14  <NA>
# r6 15   scott      M 1981-03-24    DW

2.3 By Checking Column Values

Let’s see some examples of how to select rows by conditions in R, for example, conditions include equal, not equal. And also some examples to get rows based on multiple conditions. To get rows based on column value use %in% operator.


# Select Rows by equal condition
df[df$gender == 'M',]

# Output:
#    id  name gender        dob state
# r1 10   sai      M 1990-10-02    CA
# r2 11   ram      M 1981-03-24    NY
# r5 14 kumar      M 1990-10-02    DC
# r6 15 scott      M 1981-03-24    DW
# r7 16   Don      M 1987-06-14    AZ

# Select Rows by not equal condition
df[df$gender != 'M',]

# Output:
#    id    name gender        dob state
# r3 12 deepika      F 1987-06-14  <NA>
# r4 13 sahithi      F 1985-08-16  <NA>
# r8 17     Lin      F 1985-08-16    PH

# Select Rows by Multiple Conditions
df[df$gender == 'M' & df$id > 15,]

# Output:
#    id name gender        dob state
# r7 16  Don      M 1987-06-14    AZ

# Select rows based on list
df[df$id %in% c(13,14,15),]

# Output:
#    id    name gender        dob state
# r4 13 sahithi      F 1985-08-16  <NA>
# r5 14   kumar      M 1990-10-02    DC
# r6 15   scott      M 1981-03-24    DW

3. Select Rows with head() and tail()

Similarly, You can use the head() function to get the first N rows and use the tail() to get the last N rows. These two methods take a dataframe object as the first argument and an integer value as the second argument that specifies how many rows to return. For example,


# Select first N rows
head(df,3)

# Output:
#    id    name gender        dob state
# r1 10     sai      M 1990-10-02    CA
# r2 11     ram      M 1981-03-24    NY
# r3 12 deepika      F 1987-06-14  <NA>


# Select last N rows
tail(df,3)

# Output:
#    id  name gender        dob state
# r6 15 scott      M 1981-03-24    DW
# r7 16   Don      M 1987-06-14    AZ
# r8 17   Lin      F 1985-08-16    PH

4. Using subset() to Select Rows from DataFrame

The subset() function is an R base function commonly used to select specific rows from a data frame. It accepts a data frame as an argument, along with a condition that defines which rows you want to select.


# Using subset to select rows
subset(df, state %in% c("CA", "AZ", "PH"))

# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r7 16  Don      M 1987-06-14    AZ
# r8 17  Lin      F 1985-08-16    PH

5. Using filter() from dplyr Package

The <a href="https://sparkbyexamples.com/r-programming/r-dplyr-tutorial-learn-with-examples/">dplyr</a> package provides a structured approach to data manipulation. It includes a collection of popular functions to help data analysts with common data transformation tasks.

To work with dplyr, start by installing it with install.packages('dplyr') and then load it using library(dplyr). In this example, I’ll use the dplyr::filter() function from this package to select rows based on specific column values.


# Load dplyr package
library(dplyr)

# Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r7 16  Don      M 1987-06-14    AZ
# r8 17  Lin      F 1985-08-16    PH

6. Using setDT() from data.table Package

The data.table package in R is designed for handling tabular data. It offers highly efficient the data.table object, which is an optimized and high-performance alternative to the default data.frame.

To use the data.table, you need to install the package with install.packages('data.table') and load it using library(data.table). In this example, I will use data.table methods to select specific rows based on the values in certain columns.


# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Yields the same as the above output.

7. Conclusion

In this article, you have learned by using bracket notation [] on R DataFrame you can select rows by name, by index, by column value, by condition, etc. You can also use the R base function subset() to get the same results. Besides these, R also provides another package dplyr to return the selected rows from the DataFrame. If you have a data.table then use the function from it to achieve better performance.

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium