How to Select Columns by Name in R

You can select a single column or multiple columns by name from an R data frame using either the base R df[] notation or the select() function from the dplyr package. In this article, I will explore different examples, including selecting columns from a list of names, selecting all columns between two column names, and more.

1. Quick Examples of Select Columns by Name

The following are quick examples of selecting data frame columns by name in R.


# Quick Examples
# R base - Select columns by name
df[,"name"]

# R base - Select columns from list
df[,c("name","gender")]

# Load dplyr 
library('dplyr')

# dplyr - Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))

# dplyr - Select columns except name & gender
df %>% select(-c('name','gender'))

# dplyr - Select columns between name and state
df %>% select('name':'state')

# dplyr - Select columns that start with a string
df %>% select(starts_with('gen'))

# dplyr - Select columns that do not start with a string
df %>% select(-starts_with('gen'))

# dplyr - Select columns that end with a string
df %>% select(ends_with('e'))

# dplyr - Select columns that contain a string
df %>% select(contains('a'))

Let’s create an R data frame, run some examples, and analyze the results. If you have data in CSV format, it is straightforward to import the CSV file into an R data frame. You may also want to refer to the guidelines on importing an Excel file into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11),
  name = c('sai','ram'),
  gender = c('M','M'),
  dob = as.Date(c('1990-10-02','1981-3-24')),
  state = c('CA','NY'),
  row.names=c('r1','r2')
)
df

This yields the output below.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r2 11  ram      M 1981-03-24    NY

2. Select Columns by Name using the R base

Let’s use the base R bracket notation df[] to select columns by name from a data frame. The notation takes the form df[rows, columns], so to select columns, specify them in the position after the comma and leave the rows position empty to keep all rows.

To select a column by name, pass the column name as a string. The following example returns the values of the name column. Note that selecting a single column this way returns a vector rather than a data frame.


# R base - Select columns by name
df[,"name"]

# Output
#[1] "sai" "ram"
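Because a single column selected with df[,"name"] is simplified to a vector, it can help to see the alternatives side by side. The following is a minimal sketch that rebuilds a reduced version of the sample data frame so it runs on its own; the variable names (v, d1, d2, v2, v3) are only illustrative.

```r
# Recreate a reduced version of the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  row.names = c('r1', 'r2')
)

# A single column selected after the comma is simplified to a vector
v <- df[, "name"]
is.data.frame(v)    # FALSE

# drop = FALSE keeps the result as a one-column data frame
d1 <- df[, "name", drop = FALSE]
is.data.frame(d1)   # TRUE

# Omitting the comma (list-style indexing) also returns a data frame
d2 <- df["name"]

# [[ ]] and $ both extract the column as a vector
v2 <- df[["name"]]
v3 <- df$name
```

Use drop = FALSE or df["name"] when later code expects a data frame, and [[ ]] or $ when you want the raw values.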

Most of the time you will want to select multiple columns. To do so, create a vector with the names of all the columns you want and pass it in the columns position of df[]. The following example returns the name and gender columns from the data frame.


# R base - Select columns from list
df[,c("name","gender")]

# Output
#   name gender
#r1  sai      M
#r2  ram      M
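In practice the list of column names often lives in a variable, and base R stops with an "undefined columns selected" error if any name is missing. The sketch below shows one way to guard against that; the wanted vector and its non-existent salary column are made up for illustration.

```r
# Recreate a reduced version of the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  row.names = c('r1', 'r2')
)

# Hypothetical list of requested columns; 'salary' is not in df
wanted <- c("name", "gender", "salary")

# Keep only the names that actually exist, and note the ones that do not
present <- intersect(wanted, names(df))
missing_cols <- setdiff(wanted, names(df))

# Select the columns that are present; drop = FALSE keeps a data frame
subset_df <- df[, present, drop = FALSE]
```

intersect() preserves the order of wanted, so the selected columns come back in the order you asked for them.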

3. Select Columns by Name using dplyr Package

Base R code typically refers to a column with the $ operator on the data frame object (df$id) or with the [] notation. This syntax can be hard to read, and longer expressions quickly become confusing. The select() function from dplyr selects columns (variables) from a data frame using a readable, verb-based syntax. It takes the data frame as its first argument, followed by one or more column names or a vector of column names.

Let’s select columns by name using the dplyr package. The first example below passes the column names to select() as separate, comma-separated arguments. The second passes the same names as a single vector. Both return the same result.


# Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))

# Output
#   name gender
#r1  sai      M
#r2  ram      M
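When the column names are stored in a character vector, the tidyselect helpers all_of() and any_of() (re-exported by dplyr 1.0.0 and later) make the intent explicit. This is a small sketch under the assumption of a recent dplyr version; cols and the non-existent salary column are illustrative names only.

```r
library(dplyr)

# Recreate a reduced version of the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  row.names = c('r1', 'r2')
)

# Column names held in a variable
cols <- c("name", "gender")

# all_of() is strict: it raises an error if any name is missing
strict <- df %>% select(all_of(cols))

# any_of() is lenient: names that do not exist are silently skipped
lenient <- df %>% select(any_of(c(cols, "salary")))
```

Choose all_of() when a missing column should be treated as a bug, and any_of() when the columns are optional.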

When using the dplyr package, we typically use the infix pipe operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is equivalent to f(x, y). For more examples of this package, refer to the R dplyr package tutorial with examples.
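To make the pipe equivalence concrete, the sketch below runs the same selection with and without %>% and compares the results. The last statement uses the native |> pipe, which assumes R 4.1 or later; remove that line on older versions.

```r
library(dplyr)

# Recreate a reduced version of the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  row.names = c('r1', 'r2')
)

# Piped form: df becomes the first argument of select()
piped <- df %>% select(name, gender)

# Direct call with the data frame passed explicitly
direct <- select(df, name, gender)

# Both forms produce the same data frame
same <- identical(piped, direct)

# Native pipe (assumes R >= 4.1)
native <- df |> select(name, gender)
```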

3.1. Select Columns Except List of Columns

By using select() from dplyr, you can drop columns from the data frame by specifying their names. To drop columns, prefix the column names with a minus sign (-). Keep in mind that this operation returns a new data frame without the specified columns and leaves the original data frame unchanged.


# Select columns except name & gender
df %>% select(-c('name','gender'))

# Output
#   id        dob state
#r1 10 1990-10-02    CA
#r2 11 1981-03-24    NY
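Base R does not accept a minus sign in front of a character vector of names, so excluding columns by name needs a slightly different approach there. The following sketch shows two common base R idioms; drop_cols is just an illustrative variable name.

```r
# Recreate the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Names of the columns to exclude
drop_cols <- c("name", "gender")

# Option 1: logical index that is TRUE for every column not in drop_cols
kept <- df[, !(names(df) %in% drop_cols), drop = FALSE]

# Option 2: compute the remaining names with setdiff()
kept2 <- df[, setdiff(names(df), drop_cols), drop = FALSE]
```

Both options return the id, dob, and state columns, matching the dplyr result above.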

3.2. Select All Between 2 Column Names

To select all columns between two specific columns, use the range operator (:). The column name before the operator is the starting point and the column name after it is the endpoint. Both endpoints are included in the result. The following example selects all columns from name through state.


# Select columns between name and state
df %>% select('name':'state')

# Output
#   name gender        dob state
#r1  sai      M 1990-10-02    CA
#r2  ram      M 1981-03-24    NY
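The range can also be written with unquoted column names, which is the more common dplyr style, and base R can reproduce it by looking up the column positions. This is a minimal sketch; from_idx and to_idx are illustrative variable names.

```r
library(dplyr)

# Recreate the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# dplyr: unquoted names work with the range operator
by_range <- df %>% select(name:state)

# Base R: find the positions of the two columns, then index by position
from_idx <- match("name", names(df))
to_idx <- match("state", names(df))
base_range <- df[, from_idx:to_idx]
```

The range follows the column order in the data frame, so reordering columns changes what falls between the two names.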

3.3. Get All that Starts with

To get all columns whose names start with a given character string, use starts_with() inside select(). The example below returns all columns whose names begin with the string gen.


# Select columns that start with a string
df %>% select(starts_with('gen'))

# Output
#   gender
#r1      M
#r2      M

3.4. Get All that Ends with

You can combine ends_with() and select() to retrieve all columns whose names end with a specified character sequence. For example, the following code selects the columns whose names end with the string e.


# Select columns that end with a string
df %>% select(ends_with('e'))

# Output
#   name state
#r1  sai    CA
#r2  ram    NY
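The quick examples at the top also list contains() and a negated starts_with(), which work the same way as the helpers above. The sketch below runs them on the sample data frame and adds matches(), another dplyr selection helper that takes a regular expression; the pattern used here is only an illustration.

```r
library(dplyr)

# Recreate the sample data frame
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Columns whose names contain the letter a: name and state
with_a <- df %>% select(contains('a'))

# Every column except those whose names start with gen
not_gen <- df %>% select(-starts_with('gen'))

# Columns whose names match a regular expression: id and dob
by_regex <- df %>% select(matches('^(id|dob)$'))
```

These helpers ignore case by default; pass ignore.case = FALSE if the match should be case-sensitive.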

4. Conclusion

In this article, you have learned how to select columns by name in the R programming language. To select columns in R, you can use either the base R df[] notation or the select() function from the dplyr package.
