Using the base R df[] notation or the select() function from the dplyr package, you can select a single column or multiple columns by name from an R data frame. In this article, I will walk through different examples, including selecting columns from a list of names, selecting all columns between two column names, and more.
1. Quick Examples of Select Columns by Name
The following are quick examples of selecting data frame columns by name in R.
# Quick Examples
# R base - Select columns by name
df[,"name"]
# R base - Select columns from list
df[,c("name","gender")]
# Load dplyr
library('dplyr')
# dplyr - Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))
# dplyr - Select columns except name & gender
df %>% select(-c('name','gender'))
# dplyr - Select columns between name and state
df %>% select('name':'state')
# dplyr - Select columns that start with a string
df %>% select(starts_with('gen'))
# dplyr - Select columns that do not start with a string
df %>% select(-starts_with('gen'))
# dplyr - Select columns that end with a string
df %>% select(ends_with('e'))
# dplyr - Select columns that contain a string
df %>% select(contains('a'))
Let’s create an R data frame, run some examples, and analyze the results. If you have data in CSV format, it’s straightforward to import the CSV file into an R data frame. You may also want to refer to the guidelines on importing an Excel file into R.
# Create DataFrame
df <- data.frame(
id = c(10,11),
name = c('sai','ram'),
gender = c('M','M'),
dob = as.Date(c('1990-10-02','1981-3-24')),
state = c('CA','NY'),
row.names=c('r1','r2')
)
df
This yields the output below.
# Output
#    id name gender        dob state
# r1 10  sai      M 1990-10-02    CA
# r2 11  ram      M 1981-03-24    NY
2. Select Columns by Name using the R base
Let’s use the base R bracket notation df[] to select columns by name from a data frame. The df[] notation takes the syntax df[rows,columns], so to select columns, use the columns position on the right, after the comma.
All you need to pass is the column name as a string. The following example returns the name column from the data frame; because only one column is selected, the result is a vector rather than a data frame.
# R base - Select columns by name
df[,"name"]
#Output
#[1] "sai" "ram"
Most of the time you will want to select multiple columns. To do so, create a vector with all the column names you want and pass it to the columns position of df[]. The following example returns the name and gender columns from the data frame.
# R base - Select columns from list
df[,c("name","gender")]
# Output
#    name gender
# r1  sai      M
# r2  ram      M
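One detail of the bracket notation is worth spelling out: as the first output above shows, selecting a single column returns a plain vector, while selecting several columns returns a data frame. The following is a small sketch, assuming the same sample data, of how the standard drop = FALSE argument keeps a single-column selection as a data frame. The names name_vec and name_df are illustrative only.

```r
# Sample data frame, as created earlier in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-3-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Selecting one column by name drops the result to a vector
name_vec <- df[, "name"]
is.data.frame(name_vec)

# drop = FALSE keeps the result as a one-column data frame
name_df <- df[, "name", drop = FALSE]
is.data.frame(name_df)
```

Keeping the data frame shape can be convenient when later code expects column names and row names to be present.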
3. Select Columns by Name using dplyr Package
Base R code typically refers to a column with $ along with the data frame object (df$id) or with the [] notation. This syntax is not always easy to read, and R code can become confusing. The dplyr select() function selects columns (variables) from a data frame using a verb-style syntax. It takes the data frame as the first argument, followed by one or more column names or a vector of column names.
Let’s select columns by name using the dplyr package. The first example below selects the columns whose names are supplied to select() as comma-separated arguments. The second example selects the columns named in a vector passed to select().
# Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))
# Output
#    name gender
# r1  sai      M
# r2  ram      M
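When the column names are stored in a variable rather than typed inline, the all_of() helper from tidyselect (re-exported by dplyr) can be used to pass them to select(). Here is a minimal sketch, assuming dplyr 1.0 or later is installed; the variable names cols and selected are illustrative.

```r
library(dplyr)

# Sample data frame, as created earlier in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-3-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Column names held in a character vector
cols <- c('name', 'gender')

# all_of() selects exactly these columns and errors if one is missing
selected <- df %>% select(all_of(cols))
names(selected)
```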
When we use the dplyr package, we mostly use the infix operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is converted into f(x, y). For more examples of this package, refer to the R dplyr package tutorial with examples.
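To make the pipe rewrite concrete, the sketch below compares the piped form with the direct function call on the sample data. It assumes dplyr is installed; the names piped and direct are illustrative.

```r
library(dplyr)

# Sample data frame, as created earlier in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-3-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Piped form: df becomes the first argument of select()
piped <- df %>% select('name', 'gender')

# Direct form: the same call written without the pipe
direct <- select(df, 'name', 'gender')

# Both forms should produce the same data frame
identical(piped, direct)
```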
3.1. Select Columns Except List of Columns
By using select() from dplyr, you can drop columns from the data frame by name. To drop columns, prefix the column names with a minus sign (-). Keep in mind that this returns a new data frame without the specified columns; the original data frame is unchanged.
# Select columns except name & gender
df %>% select(-c('name','gender'))
# Output
#    id        dob state
# r1 10 1990-10-02    CA
# r2 11 1981-03-24    NY
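For comparison, a similar exclusion can be sketched in base R by filtering names(df) with %in%. This is one possible approach rather than the only one; the names drop_cols and kept are illustrative.

```r
# Sample data frame, as created earlier in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-3-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Names of the columns to exclude
drop_cols <- c('name', 'gender')

# Keep every column whose name is not in drop_cols;
# drop = FALSE guards against a single remaining column becoming a vector
kept <- df[, !(names(df) %in% drop_cols), drop = FALSE]
names(kept)
```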
3.2. Select All Between 2 Column Names
To select all columns between two specific columns, use the range operator (:). The column name before the operator is the starting point, and the column name after it is the endpoint; both endpoints are included in the result. For instance, the following example selects all columns from name through state.
# Select columns between name and state
df %>% select('name':'state')
# Output
#    name gender        dob state
# r1  sai      M 1990-10-02    CA
# r2  ram      M 1981-03-24    NY
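A comparable range selection can be sketched in base R by looking up the positions of the two column names with match() and indexing by the resulting range. The names first, last, and between are illustrative, and this assumes both names exist in the data frame.

```r
# Sample data frame, as created earlier in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-3-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Positions of the start and end columns
first <- match('name', names(df))
last <- match('state', names(df))

# Select every column from the start position through the end position
between <- df[, first:last]
names(between)
```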
3.3. Get All that Starts with
To get all columns whose names start with a given string, use starts_with(). The example below returns all columns that begin with the string gen.
# Select columns that start with a string
df %>% select(starts_with('gen'))
# Output
#    gender
# r1      M
# r2      M
3.4. Get All that Ends with
You can use ends_with() together with select() to retrieve all columns whose names end with a specified string. For example, the following selects the columns that end with the string e.
# Select columns that end with a string
df %>% select(ends_with('e'))
# Output
#    name state
# r1  sai    CA
# r2  ram    NY
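The quick examples at the top of this article also list contains(), which matches a string anywhere in the column name. Here is a minimal sketch on the same sample data, assuming dplyr is installed; the name with_a is illustrative.

```r
library(dplyr)

# Sample data frame, as created earlier in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-3-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Select columns whose names contain the letter a
with_a <- df %>% select(contains('a'))
names(with_a)
```

With this data, only name and state contain the letter a, so those are the two columns returned.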
4. Conclusion
In this article, you have learned how to select columns by name in the R programming language. To select columns in R, you can use either the base R df[] notation or the select() function from the dplyr package.