subset() is a generic R function that returns the rows and columns (in R terms, observations and variables) of a data frame that meet the conditions you specify. It also works on vectors and matrices. In this article, I will explain the syntax and usage of the subset() function and walk through examples of subsetting rows and columns.
Alternatively, you can also select rows in R using df[] notation.
1. R subset() Function Syntax
The following is the syntax of the subset() function.
# Syntax of subset()
subset(x, subset, select, drop = FALSE, ...)
Arguments
x – Object to be subsetted. Can be a vector, data frame, or matrix.
subset – Logical expression indicating the elements or rows to keep.
select – Expression indicating the columns to select from a data frame or matrix.
drop – Passed on to the indexing method for matrices and data frames.
... – Other arguments.
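Since the rest of the article works only with data frames, here is a minimal sketch of subset() on a vector and on a matrix. The objects v and m are made up for illustration and do not appear elsewhere in the article.

```r
# A numeric vector with a missing value (illustrative data)
v <- c(5, 12, NA, 20)

# Keep elements greater than 10; elements where the test is NA are dropped
big <- subset(v, v > 10)
print(big)

# A small matrix with named columns (illustrative data)
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c('a', 'b')))

# Keep rows where column 'a' exceeds 1, and only column 'b'
mb <- subset(m, m[, 'a'] > 1, select = 'b')
print(mb)
```

Because drop defaults to FALSE, the matrix result stays a one-column matrix rather than collapsing to a vector.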
To understand the R subset() function with examples, let’s first create a data frame in R.
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
Yields below output.
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH
2. Rows subset() Example
The subset() function is used to get a subset of rows from a data frame based on a row name, a vector of row names, or conditions on column values.
2.1 subset() by Row Name
Using the subset() function, let’s see how to get a specific row by name. Use the subset argument to specify the expression that selects the rows.
# subset by row name
subset(df, subset=rownames(df) == 'r1')
Yields below output.
# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
2.2 subset() by a list of values
If you have several row names to match, create a vector with those names and use the %in% operator in the condition passed to the subset() function.
# subset rows by a vector of row names
subset(df, rownames(df) %in% c('r1','r2','r3'))
Yields below output.
# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
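For comparison, the df[] notation mentioned at the start of the article can fetch rows by name as well. The sketch below uses a trimmed-down copy of the example data frame (only id and name) so that it runs on its own; the variable names are illustrative.

```r
# Trimmed-down copy of the article's data frame (assumption: first three rows only)
df <- data.frame(
  id   = c(10, 11, 12),
  name = c('sai', 'ram', 'deepika'),
  row.names = c('r1', 'r2', 'r3')
)

# Bracket indexing by a single row name
one_row <- df['r1', ]

# Bracket indexing by a vector of row names
two_rows <- df[c('r1', 'r3'), ]

# The same rows selected with subset() and %in%
same_via_subset <- subset(df, rownames(df) %in% c('r1', 'r3'))

print(one_row)
print(two_rows)
print(same_via_subset)
```

One difference worth keeping in mind: bracket indexing returns rows in the order the names are requested, while subset() with %in% keeps the rows in their original order.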
2.3 subset() by Condition or Column Value
Using subset(), you can also select rows based on column values by specifying conditions. The following examples show how to subset a data frame based on a single condition, multiple conditions, and a list of column values.
# subset by condition
subset(df, gender=='M')
# subset by a list of column values
subset(df, state %in% c('CA','DC'))
# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')
# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))
I will leave the above subset examples for you to run and explore the output.
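One detail to watch for when running these conditions is how missing values are handled: subset() drops rows where the condition evaluates to NA, whereas plain bracket indexing keeps them as all-NA rows. The sketch below rebuilds only the id, gender, and state columns of the example data so that it is self-contained; the result variable names are illustrative.

```r
# Same ids, genders, and states as the article's data frame (name and dob omitted)
df <- data.frame(
  id     = c(10, 11, 12, 13, 14, 15, 16, 17),
  gender = c('M', 'M', NA, 'F', 'M', 'M', 'M', 'F'),
  state  = c('CA', 'NY', NA, NA, 'DC', 'DW', 'AZ', 'PH'),
  row.names = paste0('r', 1:8)
)

# Single condition: r3 (gender is NA) is silently dropped
males <- subset(df, gender == 'M')
print(rownames(males))    # "r1" "r2" "r5" "r6" "r7"

# OR condition: r3 and r4 evaluate to NA and are dropped as well
either <- subset(df, gender == 'M' | state == 'PH')
print(rownames(either))   # "r1" "r2" "r5" "r6" "r7" "r8"

# Bracket indexing with the same test keeps an extra all-NA row for r3
bracket <- df[df$gender == 'M', ]
print(nrow(bracket))      # 6
```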
3. Columns subset() Example
The subset() function is also used to get columns (variables) from a data frame. To subset columns, use the select argument with either a column name or a vector of column names. To create a vector of values, use the c() function.
3.1 subset() Columns by Name
The following example returns the columns id, name, and gender for the rows where gender is equal to 'M'.
#subset columns
subset(df,gender=='M',select=c('id','name','gender'))
Yields below output.
# Output
   id  name gender
r1 10   sai      M
r2 11   ram      M
r5 14 kumar      M
r6 15 scott      M
r7 16   Don      M
To subset a single column, use subset(df,gender=='M',select='id').
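When only one column is selected, the drop argument from the syntax section decides what comes back. The following sketch, using a self-contained copy of the id, name, and gender columns, contrasts the default (a one-column data frame) with drop = TRUE (a plain vector); the result variable names are illustrative.

```r
# Copy of the id, name, and gender columns from the article's data frame
df <- data.frame(
  id     = c(10, 11, 12, 13, 14, 15, 16, 17),
  name   = c('sai', 'ram', 'deepika', 'sahithi', 'kumar', 'scott', 'Don', 'Lin'),
  gender = c('M', 'M', NA, 'F', 'M', 'M', 'M', 'F'),
  row.names = paste0('r', 1:8)
)

# Default drop = FALSE: the result is still a data frame with one column
ids_df <- subset(df, gender == 'M', select = 'id')
print(class(ids_df))   # "data.frame"

# drop = TRUE: the single column is returned as a vector
ids_vec <- subset(df, gender == 'M', select = 'id', drop = TRUE)
print(ids_vec)         # 10 11 14 15 16
```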
3.2 subset() Columns by Index
If you want to get a subset of columns based on their index, pass a vector of column indexes to the select argument. The following example subsets the rows where gender is 'M' and the columns at indexes 1, 2, and 3.
#subset columns by index
subset(df,gender=='M',select=c(1,2,3))
Yields the same output as above.
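Besides quoted names and numeric indexes, the select argument also accepts unquoted column names, ranges of names, and negated names. The sketch below shows these forms on a shortened copy of the example data (first four rows only, an assumption made to keep the block small); the result variable names are illustrative.

```r
# Shortened copy of the article's data frame (first four rows)
df <- data.frame(
  id     = c(10, 11, 12, 13),
  name   = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', NA, 'F'),
  dob    = as.Date(c('1990-10-02', '1981-03-24', '1987-06-14', '1985-08-16')),
  state  = c('CA', 'NY', NA, NA),
  row.names = c('r1', 'r2', 'r3', 'r4')
)

# Unquoted column names
by_name <- subset(df, gender == 'M', select = c(id, name, gender))

# A range of adjacent columns, from id through gender
by_range <- subset(df, gender == 'M', select = id:gender)

# Every column except dob (no row condition given, so all rows are kept)
no_dob <- subset(df, select = -dob)

print(names(by_range))   # "id" "name" "gender"
print(names(no_dob))     # "id" "name" "gender" "state"
```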
4. Complete Example
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
# subset of rows
subset(df, rownames(df) == 'r1')
# subset of rows
subset(df, rownames(df) %in% c('r1','r2','r3'))
# subset by condition
subset(df, gender=='M')
# subset by condition
subset(df, state %in% c('CA','DC'))
# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')
# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))
#subset columns
subset(df,gender=='M',select='id')
#subset columns
subset(df,gender=='M',select=c('id','name','gender'))
#subset columns
subset(df,gender=='M',select=c(1,2,3))
5. Conclusion
In this article, you have learned how to use the subset() function to get specified observations (rows) and variables (columns) from a data frame in R, along with its syntax and usage examples.