How do you subset a data frame by column value and by column name in R? By using the R base square bracket notation df[] or the subset() function, you can easily subset an R data frame (data.frame) by column value or by column name.
1. Quick Examples of Subset DataFrame by Column Value & Name
The following quick examples show how to subset the rows of a data frame by column value and how to subset its columns by column name in R.
# Quick Examples
# Subset Rows by column value
df[df$gender == 'M',]
subset(df, gender == 'M')
# Subset Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]
subset(df, state %in% c('CA','AZ','PH'))
# Subset Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]
subset(df, gender == 'M' & id > 15)
# Subset rows by column value and keep only the columns id and name
df[df$gender == 'M', c('id','name')]
subset(df, gender == 'M', select = c('id','name'))
Let’s create an R data frame, run these examples, and explore the output. If your data is already in a CSV or Excel file, you can import it into an R data frame instead.
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
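Yields below output.
# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH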
2. Subset Data Frame by Column Value
2.1. Subset Data Frame by Column Value
Let’s use the R base square bracket notation df[] and the subset() function to subset data frame rows based on a column value. The following example returns all rows where the column gender is equal to 'M'. Note that subset() takes the input data frame as its first argument and the condition to apply as its second.
# Subset the data frame by column value
# Using df[]
df[df$gender == 'M',]
# Using subset()
subset(df, gender == 'M')
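Yields below output.
# Output
   id  name gender        dob state
r1 10   sai      M 1990-10-02    CA
r2 11   ram      M 1981-03-24    NY
r5 14 kumar      M 1995-03-02    DC
r6 15 scott      M 1991-06-21    DW
r7 16   Don      M 1986-03-24    AZ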
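The two approaches agree on gender because that column has no missing values. As a minimal sketch of where they can differ, the example below filters on state, which contains NA in this article's data frame. The variable names by_bracket, by_subset, and by_which are only illustrative.

```r
# Same data frame as created earlier in the article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# df$state == 'CA' is NA for rows r3 and r4, where state is missing
by_bracket <- df[df$state == 'CA', ]         # r1 plus two rows filled with NA
by_subset  <- subset(df, state == 'CA')      # r1 only, NA conditions are dropped
by_which   <- df[which(df$state == 'CA'), ]  # which() ignores NA, so r1 only

nrow(by_bracket)
nrow(by_subset)
nrow(by_which)
```

If the column you filter on may contain NA, wrapping the condition in which() is one way to make df[] behave like subset().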
2.2. Subset Data Frame by a List of Column Values
With the same two approaches, you can use the %in% operator to subset the data frame by a list of values. The following example returns all rows where the state value is present in the vector c('CA','AZ','PH').
# Subset a data frame by list of column values
# using df[]
df[df$state %in% c('CA','AZ','PH'),]
# Using subset()
subset(df, state %in% c('CA','AZ','PH'))
Yields below output.
# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r7 16  Don      M 1986-03-24    AZ
r8 17  Lin      F 1990-08-26    PH
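As a small sketch building on the %in% example, the list of values can be kept in a variable and the test can be negated with ! to get the remaining rows. The names wanted, in_set, and not_in_set are illustrative and do not come from the original examples.

```r
# Same data frame as created earlier in the article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Keep the values to match in a vector
wanted <- c('CA','AZ','PH')

# Rows whose state is one of the wanted values
in_set <- df[df$state %in% wanted, ]

# Rows whose state is not one of the wanted values.
# NA %in% wanted is FALSE, so rows with a missing state land here too.
not_in_set <- df[!(df$state %in% wanted), ]

rownames(in_set)
rownames(not_in_set)
```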
2.3. Subset Data Frame based on Multiple Columns
To subset rows based on conditions on multiple columns, combine the conditions with the & operator. This example uses two conditions, each on a separate column, and returns the rows where gender is equal to 'M' and id is greater than 15.
# Using df[]
df[df$gender == 'M' & df$id > 15,]
# Using subset()
subset(df, gender == 'M' & id >15)
Yields below output.
# Output
   id name gender        dob state
r7 16  Don      M 1986-03-24    AZ
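The same pattern works with the | operator when a row should match either condition. The sketch below is only an illustration with conditions chosen for this article's data. The names either and either_subset are hypothetical.

```r
# Same data frame as created earlier in the article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Rows where gender is 'F' OR id is greater than 15
# Using df[]
either <- df[df$gender == 'F' | df$id > 15, ]

# Using subset()
either_subset <- subset(df, gender == 'F' | id > 15)

rownames(either)
rownames(either_subset)
```

Use the vectorized operators & and | here. The && and || operators are meant for single TRUE/FALSE values and do not work element by element.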
3. Subset Data Frame by Column Name
3.1. Subset Data Frame by Column Name
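Let’s use the same df[] notation and subset() function to subset the data frame by column name in R. With df[], pass the column name after the comma. With subset(), pass it to the select argument. Note that df[] with a single column name returns a vector rather than a data frame unless you add drop = FALSE.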
# Using df[]
df[df$gender == 'M', 'id']
# Using subset()
subset(df,gender == 'M',select = 'id')
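The two calls above do not return the same kind of object when only one column is selected. The following sketch, with illustrative variable names, shows the difference and how the drop argument of df[] keeps the result as a data frame.

```r
# Same data frame as created earlier in the article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# A single column selected with df[] is simplified to a plain vector
ids_vec <- df[df$gender == 'M', 'id']

# drop = FALSE keeps the one-column result as a data frame
ids_df <- df[df$gender == 'M', 'id', drop = FALSE]

# subset() returns a data frame by default
ids_sub <- subset(df, gender == 'M', select = 'id')

is.data.frame(ids_vec)
is.data.frame(ids_df)
is.data.frame(ids_sub)
```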
3.2. Subset Data Frame by List of Column Names
Similarly, let’s see how to subset the data frame by a list of column names in R. Create a character vector of the column names you are interested in using c(), then pass it as the column index to df[] or assign it to the select argument of subset(). The following examples return a data frame with the columns id and name for the rows where gender is 'M'.
# Using df[]
df[df$gender == 'M', c('id','name')]
# Using subset()
subset(df,gender == 'M',select = c('id','name'))
Yields below output.
# Output
   id  name
r1 10   sai
r2 11   ram
r5 14 kumar
r6 15 scott
r7 16   Don
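The select argument of subset() also accepts unquoted column names, a range of columns written with :, and a negated set of columns to exclude. The sketch below illustrates these forms on the article's data frame. The result names are hypothetical.

```r
# Same data frame as created earlier in the article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Unquoted column names
by_names <- subset(df, gender == 'M', select = c(id, name))

# A range of adjacent columns, from id through gender
by_range <- subset(df, gender == 'M', select = id:gender)

# Every column except dob and state
by_exclusion <- subset(df, gender == 'M', select = -c(dob, state))

names(by_names)
names(by_range)
names(by_exclusion)
```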
Frequently Asked Questions on Subset a Data Frame
You can use the subset() function or the R base df[] notation to filter rows based on a specific column value.
To subset a data frame based on multiple conditions, you can use logical operators (e.g., & for AND, | for OR).
You can use the %in% operator to subset a data frame to the rows whose column value matches any value in a given vector.
The filter() function in the dplyr package can also be used for subsetting data frames based on column values.
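For comparison, here is a minimal sketch of the dplyr approach mentioned above. It assumes the dplyr package is installed, so the calls are guarded with requireNamespace(). The names males and males_over_15 are illustrative.

```r
# Same data frame as created earlier in the article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Assumption: dplyr is installed; skip the example otherwise
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Rows where gender is 'M'
  males <- dplyr::filter(df, gender == 'M')

  # Conditions separated by commas are combined with AND
  males_over_15 <- dplyr::filter(df, gender == 'M', id > 15)

  print(males_over_15)
}
```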
4. Conclusion
In this article, you have learned how to subset a data frame by column value and by column name in R. You can do this by using the R base subset() function or the square bracket notation df[].