How to Select Columns by Name in R?

Using the base R df[] notation or the select() function from the dplyr package, you can select a single column or multiple columns by name from an R data frame. In this article, I will walk through several examples, including selecting columns from a vector of names, selecting all columns between two column names, and more.

1. Quick Examples of Selecting Columns by Name

The following are quick examples of how to select data frame columns by name in R.


# Quick Examples
# R base - Select columns by name
df[,"name"]

# R base - Select multiple columns from a vector of names
df[,c("name","gender")]

# Load dplyr 
library('dplyr')

# dplyr - Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))

# dplyr - Select columns except name & gender
df %>% select(-c('name','gender'))

# dplyr - Select columns between name and state
df %>% select('name':'state')

# dplyr - Select columns that start with a string
df %>% select(starts_with('gen'))

# dplyr - Select columns that do not start with a string
df %>% select(-starts_with('gen'))

# dplyr - Select columns that end with a string
df %>% select(ends_with('e'))

# dplyr - Select columns that contain a string
df %>% select(contains('a'))

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can import it into an R data frame instead. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11),
  name = c('sai','ram'),
  gender = c('M','M'),
  dob = as.Date(c('1990-10-02','1981-3-24')),
  state = c('CA','NY'),
  row.names=c('r1','r2')
)
df

This yields the output below.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r2 11  ram      M 1981-03-24    NY

2. Select Columns by Name using R base

Let’s use the base R bracket notation df[] to select columns by name from a data frame. The notation follows the syntax df[rows, columns], so to select columns, leave the rows position empty and pass the columns after the comma.

To select a column by name, pass the column name as a string. The following example returns the values of the name column. Note that selecting a single column this way returns a vector rather than a data frame.


# R base - Select columns by name
df[,"name"]

#Output
#[1] "sai" "ram"
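
The return type is worth checking for yourself. The sketch below assumes a cut-down version of the data frame above and compares a few base R ways of pulling out a single column. drop = FALSE is a standard argument of the bracket operator that keeps the result as a data frame; stringsAsFactors = FALSE is set only so that the sketch behaves the same on R versions before 4.0.

```r
# Cut-down example data frame (mirrors part of the one used in this article)
df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  row.names = c("r1", "r2"),
  stringsAsFactors = FALSE
)

# Matrix-style indexing with a single column drops to a plain vector
name_vec <- df[, "name"]

# drop = FALSE keeps the result as a one-column data frame
name_df <- df[, "name", drop = FALSE]

# List-style single brackets also return a one-column data frame
name_df2 <- df["name"]

# Double brackets return the column itself (a vector)
name_vec2 <- df[["name"]]

class(name_vec)   # "character"
class(name_df)    # "data.frame"
```

If later code expects a data frame (for example, to call names() or ncol() on the result), the drop = FALSE form is usually the safer choice.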

Most of the time you will want to select multiple columns. To do so, create a character vector with the names of the columns you want and pass it in the columns position of df[]. The following example returns the name and gender columns from the data frame.


# R base - Select multiple columns from a vector of names
df[,c("name","gender")]

# Output
#   name gender
#r1  sai      M
#r2  ram      M
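
When the column names come from somewhere else (user input, a configuration file), some of them may not exist in the data frame, and df[, cols] then stops with an "undefined columns selected" error. Here is a minimal sketch of one way to guard against that. The wanted vector and the salary column are hypothetical and exist only for illustration.

```r
df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  gender = c("M", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of requested columns; "salary" does not exist in df
wanted <- c("name", "gender", "salary")

# Split the request into names that are present and names that are missing
present <- intersect(wanted, names(df))
missing_cols <- setdiff(wanted, names(df))

# Select only the columns that actually exist
subset_df <- df[, present, drop = FALSE]

names(subset_df)   # "name" "gender"
missing_cols       # "salary"
```

Whether to skip missing names silently or stop with an error depends on your use case; this sketch only separates the two groups so that you can decide.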

3. Select Columns by Name using dplyr Package

Base R typically refers to columns with $ (for example, df$id) or with the [] notation, which can become hard to read in longer expressions. The select() function from dplyr selects columns (variables) from a data frame using a more readable, verb-based syntax. It takes the data frame as its first argument, followed by one or more column names or a vector of column names.

Let’s select columns by name using the dplyr package. The first example below selects the columns whose names are passed to select() as comma-separated arguments. The second example selects the columns named in a vector.


# Select the name and gender columns
df %>% select('name','gender')
df %>% select(c('name','gender'))

# Output
#   name gender
#r1  sai      M
#r2  ram      M

When using the dplyr package, we typically use the infix pipe operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is equivalent to f(x, y). For more examples of this package, refer to R dplyr package tutorial with examples.
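
To make the pipe rewriting concrete, the sketch below calls select() both ways on a small data frame and checks that the results match. It assumes dplyr is installed. It also uses unquoted column names, which select() accepts in addition to the quoted strings used elsewhere in this article.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  gender = c("M", "M"),
  stringsAsFactors = FALSE
)

# Piped form: df becomes the first argument of select()
piped <- df %>% select(name, gender)

# Direct form: the same call written without the pipe
direct <- select(df, name, gender)

identical(piped, direct)   # TRUE
```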

3.1. Select Columns Except List of Columns

By using select() from dplyr, you can also drop columns from a data frame by name. To drop columns, prefix them with -. Note that this returns a new data frame without the specified columns; the original data frame is not modified.


# Select columns except name & gender
df %>% select(-c('name','gender'))

# Output
#   id        dob state
#r1 10 1990-10-02    CA
#r2 11 1981-03-24    NY
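
If the columns to drop are stored in a variable, one option is to wrap the variable in all_of(), a tidyselect helper that dplyr re-exports and that raises an error when a listed name is missing. The sketch below shows that form next to a base R equivalent built with setdiff(). The drop_cols variable is a name I made up for this example.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  gender = c("M", "M"),
  dob = as.Date(c("1990-10-02", "1981-03-24")),
  state = c("CA", "NY"),
  stringsAsFactors = FALSE
)

# Hypothetical variable holding the names of the columns to drop
drop_cols <- c("name", "gender")

# dplyr: negate an all_of() selection
kept_dplyr <- df %>% select(-all_of(drop_cols))

# Base R: keep every name that is not in drop_cols
kept_base <- df[, setdiff(names(df), drop_cols), drop = FALSE]

names(kept_dplyr)   # "id" "dob" "state"
names(kept_base)    # "id" "dob" "state"
```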

3.2. Select All Between 2 Column Names

To select all columns between two columns, use the range operator (:). The left-hand side of the operator is the starting column name and the right-hand side is the ending column name; both endpoints are included. The following example selects all columns from name through state.


# Select columns between name and state
df %>% select('name':'state')

# Output
#   name gender        dob state
#r1  sai      M 1990-10-02    CA
#r2  ram      M 1981-03-24    NY
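
Base R has no name-based range operator for data frames, but you can get the same result by converting the two names to positions first. This is a sketch of one way to do it with match(), shown next to the unquoted dplyr form. The from and to variables are illustrative names.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  gender = c("M", "M"),
  dob = as.Date(c("1990-10-02", "1981-03-24")),
  state = c("CA", "NY"),
  stringsAsFactors = FALSE
)

# Base R: look up the positions of the two endpoint columns
from <- match("name", names(df))
to <- match("state", names(df))
range_base <- df[, from:to, drop = FALSE]

# dplyr: the same range written with unquoted names
range_dplyr <- df %>% select(name:state)

names(range_base)    # "name" "gender" "dob" "state"
names(range_dplyr)   # "name" "gender" "dob" "state"
```

Note that match() returns NA for a name that does not exist, so in real code you may want to check from and to before indexing.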

3.3. Select Columns That Start With a String

To select all columns whose names start with a given string, use starts_with() inside select(). The following example selects all columns that start with the string gen.


# Select columns that start with a string
df %>% select(starts_with('gen'))

# Output
#   gender
#r1      M
#r2      M
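
The quick examples at the top of this article also list the negated form, -starts_with(), which keeps every column that does not start with the string. The sketch below runs both forms and adds a base R equivalent that uses startsWith() on the column names.

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  gender = c("M", "M"),
  dob = as.Date(c("1990-10-02", "1981-03-24")),
  state = c("CA", "NY"),
  stringsAsFactors = FALSE
)

# Columns whose names start with "gen"
with_gen <- df %>% select(starts_with("gen"))

# Columns whose names do NOT start with "gen"
without_gen <- df %>% select(-starts_with("gen"))

# Base R equivalent: logical vector over the column names
with_gen_base <- df[, startsWith(names(df), "gen"), drop = FALSE]

names(with_gen)      # "gender"
names(without_gen)   # "id" "name" "dob" "state"
```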

3.4. Select Columns That End With a String

Use ends_with() inside select() to get all columns whose names end with a given string. The following example selects all columns that end with the string e.


# Select columns that end with a string
df %>% select(ends_with('e'))

# Output
#   name state
#r1  sai    CA
#r2  ram    NY
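
The quick examples also use contains(), which matches a string anywhere in the column name. One thing to keep in mind is that these helpers ignore case by default (their ignore.case argument defaults to TRUE). The sketch below shows contains() alongside a case-sensitive ends_with() call and a base R equivalent that uses endsWith().

```r
library(dplyr)

df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  gender = c("M", "M"),
  dob = as.Date(c("1990-10-02", "1981-03-24")),
  state = c("CA", "NY"),
  stringsAsFactors = FALSE
)

# Columns whose names contain the letter "a"
has_a <- df %>% select(contains("a"))

# Matching is case-insensitive by default, so "E" still matches name and state
ends_upper <- df %>% select(ends_with("E"))

# With ignore.case = FALSE, no lowercase column name ends with "E"
ends_strict <- df %>% select(ends_with("E", ignore.case = FALSE))

# Base R equivalent of ends_with("e"); endsWith() is case-sensitive
ends_base <- df[, endsWith(names(df), "e"), drop = FALSE]

names(has_a)        # "name" "state"
names(ends_upper)   # "name" "state"
ncol(ends_strict)   # 0
```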

4. Conclusion

In this article, you have learned how to select columns by name in the R programming language. To select columns in R, you can use either the base R df[] notation or the select() function from the dplyr package.


NNK
