How to Select Rows in R with Examples

By using bracket notation on R DataFrame (data.name) we can select rows by column value, by index, by name, by condition etc. You can also use the R base function subset() to get the same results. Besides these, R also provides another function dplyr::filter() to get the rows from the DataFrame. If you have data.table then use the function from it to achieve better performance.

Some of the methods explained in this article are also used to select columns from an R data frame.

1. Quick Examples of Selecting Rows

Following are quick examples of how to select rows from DataFrame (data.frame) in R


# Quick Examples of selecting rows

# Example 1: Select Rows by index
df[3,]

# Example 2: Select Rows by list of index values
df[c(3,4,6),]

# Example 3: Select Rows by index range
df[3:6,]

# Example 4: Select first N rows
head(df,3)

# Example 5: Select last N rows
tail(df,3)

# Example 6: Select Rows by name
df['r3',]

# Example 7: Select Rows by list of names
df[c('r3','r6'),]

# Example 8: Select Rows by column value
df[df$gender == 'M',]

# Example 9: Select Rows by checking values on multiple columns
df[df$gender == 'M' & df$id > 15,]

# Example 10: Select Rows by list of column values
df[df$state %in% c('CA','AZ','PH'),]

# Example 11: Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

# Example 12: Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

# Example 13: Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Example 14: Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in CSV you can easily import CSV files to R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)
df

Yields below output.

r select rows

2. Using R base to Select Rows

By using a bracket notation you can select rows from DataFrame in R. In this selection, I will cover how to select rows by index, select rows by Name, and check column values. All these return a DataFrame after selecting the specific rows hence, you can use these to Create an R DataFrame from the existing DataFrame

2.1 By Index

Every row or observation in a DataFrame is assigned an index, you can use this index to get rows. Following are some commonly used methods to select rows by index in R.


# Select Rows by Index
df[3,]

# Select Rows by List of Index Values
df[c(3,4,6),]

Yields below output.

r select rows

Below are some more ways to select rows using an index of R data frame. First way is get the specified portion of rows using index range, second way is get the first N rows and final way isget the last N rows.


# Select Rows by Index Range
df[3:6,]

# Output:
#   id    name gender        dob state
# 3 12 deepika      F 1987-06-14  <NA>
# 4 13 sahithi      F 1985-08-16  <NA>
# 5 14   kumar      M 1990-10-02    DC
# 6 15   scott      M 1981-03-24    DW


# Select first N rows
head(df,3)

# Output:
#   id    name gender        dob state
# 1 10     sai      M 1990-10-02    CA
# 2 11     ram      M 1981-03-24    NY
# 3 12 deepika      F 1987-06-14  <NA>


# Select last N rows
tail(df,3)

# Output:
#   id    name gender        dob state
# 1 10     sai      M 1990-10-02    CA
# 2 11     ram      M 1981-03-24    NY
# 3 12 deepika      F 1987-06-14  <NA>

2.2 By Row Name

If you have row names on the DataFrame and want to select rows by row name in R, use the below approach. By default row names are the incremental sequence numbers assigned at the time of the creation of the R DataFrame. R also provides a way to assign custom row names while creating the DataFrame or setting row names on the existing by using the rownames() function. To set the column names use colnames() function.

Create the R data frame with customize the row names using rownames() function.


# Create R data frame 
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)
# Set custom row names
custom_row_names <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8')
rownames(df) <- custom_row_names
print(df)

Yields below output.


# Output:
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1990-10-02    DC
r6 15   scott      M 1981-03-24    DW
r7 16     Don      M 1987-06-14    AZ
r8 17     Lin      F 1985-08-16    PH

To get the single/multiple rows by name you can use the vector with the specified values, you want to return.


# Select Rows by Name
df['r3',]

# Output:
#    id    name gender        dob state
# r3 12 deepika      F 1987-06-14  <NA>

# Select Rows by list of names
df[c('r3','r6'),]

# Output:
#    id    name gender        dob state
# r3 12 deepika      F 1987-06-14  <NA>
# r6 15   scott      M 1981-03-24    DW

2.3 By Checking Column Values

Let’s see some examples of how to select rows by conditions in R, for example, conditions include equal, not equal. And also some examples to get rows based on multiple conditions. To get rows based on column value use %in% operator.


# Select Rows by equal condition
df[df$gender == 'M',]

# Output:
#    id  name gender        dob state
# r1 10   sai      M 1990-10-02    CA
# r2 11   ram      M 1981-03-24    NY
# r5 14 kumar      M 1990-10-02    DC
# r6 15 scott      M 1981-03-24    DW
# r7 16   Don      M 1987-06-14    AZ

# Select Rows by not equal condition
df[df$gender != 'M',]

# Output:
#    id    name gender        dob state
# r3 12 deepika      F 1987-06-14  <NA>
# r4 13 sahithi      F 1985-08-16  <NA>
# r8 17     Lin      F 1985-08-16    PH

# Select Rows by Multiple Conditions
df[df$gender == 'M' & df$id > 15,]

# Output:
#    id name gender        dob state
# r7 16  Don      M 1987-06-14    AZ

# Select rows based on list
df[df$id %in% c(13,14,15),]

# Output:
#    id    name gender        dob state
# r4 13 sahithi      F 1985-08-16  <NA>
# r5 14   kumar      M 1990-10-02    DC
# r6 15   scott      M 1981-03-24    DW

3. Select Rows with head() and tail()

Similarly, You cab use head() function to get the first N rows and use tail() to get the last N rows. These two methods take dataframe object as the first argument and an integer value as the second argument that specifies how many rows to return. For example,


# Select first N rows
head(df,3)

# Output:
#    id    name gender        dob state
# r1 10     sai      M 1990-10-02    CA
# r2 11     ram      M 1981-03-24    NY
# r3 12 deepika      F 1987-06-14  <NA>


# Select last N rows
tail(df,3)

# Output:
#    id  name gender        dob state
# r6 15 scott      M 1981-03-24    DW
# r7 16   Don      M 1987-06-14    AZ
# r8 17   Lin      F 1985-08-16    PH

4. Using subset() to Select Rows from DataFrame

subset() is also an R base function that is also mostly used to select rows from the DataFrame. This function takes the DataFrame object as input and the condition to select rows.


# Using subset to select rows
subset(df, state %in% c("CA", "AZ", "PH"))

# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r7 16  Don      M 1987-06-14    AZ
# r8 17  Lin      F 1985-08-16    PH

5. Using filter() from dplyr Package

dplyr is a package that provides a grammar of data manipulation and provides a most used set of verbs that helps data science analysts to solve the most common data manipulation.

In order to use this, you have to install it first using install.packages('dplyr') and load it using library(dplyr). Here, I will be using method dplyr::filter() from this package to get rows based on column values.


# Load dplyr package
library(dplyr)

# Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Output:
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r7 16  Don      M 1987-06-14    AZ
# r8 17  Lin      F 1985-08-16    PH

6. Using setDT() from data.table Package

data.table is a package that is used to work with tabular data in R. It provides the efficient data.table object which is a much improved and better performance version of the default data.frame.

In order to use this, you have to install it first using install.packages('data.table') and load it using library(data.table). Here, I will be using methods from this package to select rows based on column values


# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Yields same as the above output.

7. Conclusion

In this article, you have learned by using bracket notation [] on R DataFrame you can select rows by name, by index, by column value, by condition etc. You can also use the R base function subset() to get the same results. Besides these, R also provides another package dplyr to return the selected rows from the DataFrame. If you have a data.table then use the function from it to achieve better performance.

References

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ @ LinkedIn

Leave a Reply