Select Columns by Index Position in R

Using the R base df[] notation or the select() function from the dplyr package, you can select a single column or multiple columns by index position (column number) from an R data frame. In this article, I will walk through several examples, including selecting columns from a vector of indexes and selecting all columns between two indexes.

1. Quick Examples of Select Columns by Index Position

The following are quick examples of how to select columns by index position (column number) in R. For more column examples, refer to select columns from the R data frame.


# Quick Examples

# R base - Select a single column by index position
df[,2]

# R base - Select columns by index position
df[,c(2,3)]

# R base - Select columns by range
df[,2:5]

# Load dplyr 
library('dplyr')

# dplyr - Select columns by index position
df %>% select(2,3,5)

# dplyr - Select columns by list of index or position
df %>% select(c(2,3))

# dplyr - Select columns by index range
df %>% select(2:4)

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import the CSV file into an R data frame. Also, refer to Import Excel File into R if you have an Excel file to use.


# Create DataFrame
df <- data.frame(
  id = c(10,11),
  name = c('sai','ram'),
  gender = c('M','M'),
  dob = as.Date(c('1990-10-02','1981-3-24')),
  state = c('CA','NY'),
  row.names=c('r1','r2')
)
df

This yields the output below.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r2 11  ram      M 1981-03-24    NY

2. Select Columns by Index Position using R Base

Using the R base bracket notation df[], you can select columns by index position (column number) from an R data frame. The notation takes the form df[rows,columns], so to select columns by index, pass the index in the columns position, to the right of the comma, and leave the rows position empty to keep every row. The following example returns the values from the 2nd column of the data frame.

Note that in R, the index starts from 1.


# R base - select column by index position
df[,2]

# Output
#[1] "sai" "ram"
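
Notice that the output above is a plain vector, not a data frame: with a single column index, base R drops the data frame structure by default. The following is a minimal sketch, assuming the same df as above (rebuilt here so the snippet runs on its own), of two common ways to keep a one-column data frame. The variable names as_vector, as_df, and as_df2 are only illustrative.

```r
# Rebuild the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Default behavior: a single column comes back as a vector
as_vector <- df[, 2]
is.data.frame(as_vector)   # FALSE

# drop = FALSE keeps the result as a one-column data frame
as_df <- df[, 2, drop = FALSE]
is.data.frame(as_df)       # TRUE

# Indexing without the comma also returns a data frame
as_df2 <- df[2]
is.data.frame(as_df2)      # TRUE
```

Which form to use depends on what the next step expects: a vector is convenient for functions such as mean() or unique(), while a data frame keeps the column name and row names attached.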

3. Select Columns by Multiple or List of Index using R Base

The following example returns the data frame columns at the given index positions. Supply the index values as a vector created with c().


# R base - Select columns by index position
df[,c(2,3)]

# Output
#   name gender
#r1  sai      M
#r2  ram      M
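
The index vector also controls the order of the result, and negative indexes exclude columns instead of selecting them. A small sketch of both behaviors, assuming the same sample df (rebuilt here so the snippet is self-contained); reordered and without_cols are illustrative names only:

```r
# Rebuild the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Columns are returned in the order the indexes are listed
reordered <- df[, c(3, 2)]
names(reordered)      # "gender" "name"

# Negative indexes drop the listed columns and keep the rest
without_cols <- df[, -c(1, 4)]
names(without_cols)   # "name" "gender" "state"
```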

4. Get Columns by Index Range Using R Base

If you want to select all columns between two indexes, use the range operator (:). The left-hand side of the operator is the starting column index and the right-hand side is the ending column index; both ends are included. The following example selects columns 2 through 5.


# R base - Select columns by range
df[,2:5]

# Output
#   name gender        dob state
#r1  sai      M 1990-10-02    CA
#r2  ram      M 1981-03-24    NY
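
Hard-coding the end of a range breaks if the number of columns changes. As a hedged sketch, assuming the same sample df (rebuilt here so the snippet runs alone), ncol() can supply the last index, and tryCatch() shows that an index past the last column raises an error rather than returning fewer columns. The names last, from_second, and msg are illustrative.

```r
# Rebuild the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Use ncol() instead of a hard-coded end index
last <- ncol(df)
from_second <- df[, 2:last]
names(from_second)   # "name" "gender" "dob" "state"

# An index beyond the last column is an error; capture its message
msg <- tryCatch(
  {
    df[, 2:(last + 1)]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```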

5. Select Columns by Index Position using dplyr Package

Base R usually refers to columns with $ together with the data frame object (for example, df$id) or with the [] notation. This syntax is not always easy to read, and longer expressions can become confusing. The dplyr select() function selects columns (variables) from a data frame using a readable, verb-style function name. Its first argument is the data frame, and the remaining arguments are the columns to keep, given as column names or index positions.

Let’s select columns by index position using the dplyr package. The following example selects the columns whose indexes are supplied to the select() function as comma-separated arguments.


# Load dplyr 
library('dplyr')

# Select column by index position
df %>% select(2,3,5)

# Output
#   name gender state
#r1  sai      M    CA
#r2  ram      M    NY

When we use the dplyr package, we mostly use the infix operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is converted into f(x, y). For more examples of this package, refer to R dplyr package tutorial with examples.
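
To make the pipe rule concrete, here is a short sketch, assuming dplyr is installed and using the same sample df (rebuilt here so it runs alone). It compares the piped call with the equivalent direct call; piped and direct are illustrative names.

```r
library(dplyr)

# Rebuild the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Piped form: df becomes the first argument of select()
piped <- df %>% select(2, 3, 5)

# Direct form: the data frame is passed explicitly
direct <- select(df, 2, 3, 5)

# Both forms produce the same data frame
identical(piped, direct)   # TRUE
```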

6. Get Columns by List of Indexes

The following example returns a data frame containing the columns at the given index positions. Here, c() is used to create the vector of indexes.


# dplyr - Select columns by list of index or position
df %>% select(c(2,3))

# Output
#   name gender
#r1  sai      M
#r2  ram      M
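
When the indexes live in a variable rather than being typed inline, wrapping the variable in all_of() makes it explicit that it holds the positions to select and is not itself a column name. A minimal sketch, assuming dplyr is installed; idx and picked are hypothetical names chosen for illustration:

```r
library(dplyr)

# Rebuild the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Index positions stored in a variable
idx <- c(2, 3)

# all_of() tells select() to use the values in idx as column positions
picked <- df %>% select(all_of(idx))
names(picked)   # "name" "gender"
```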

7. Get Columns by Index Range

As with R base, use the range operator (:) inside select() to select all columns between two indexes. The left-hand side of the operator is the starting column index and the right-hand side is the ending column index; both ends are included. The following example selects columns 2 through 4.


# Select columns by index range
df %>% select(2:4)

# Output
#   name gender        dob
#r1  sai      M 1990-10-02
#r2  ram      M 1981-03-24
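
If the range should run to the end of the data frame, the last_col() helper (available after loading dplyr) avoids hard-coding the final index. A brief sketch under the same assumptions as the earlier examples, with df rebuilt so the snippet is self-contained and to_end used as an illustrative name:

```r
library(dplyr)

# Rebuild the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Select from the 2nd column through the last column
to_end <- df %>% select(2:last_col())
names(to_end)   # "name" "gender" "dob" "state"
```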

8. Conclusion

In this article, you have learned how to select columns by index in R. By using the R base df[] notation or select() function from dplyr package you can select a single column or select multiple columns by index position (column number) from the R Data Frame.

Naveen Nelamali