How to Remove a Column in R?

To remove one or more columns from an R data frame (data.frame), use square bracket notation [], the base subset() function, or select() from the third-party dplyr package. This article walks through each of these approaches.

1. Prepare the Data

Let’s create an R data frame to run these examples against and explore the output. If your data is already in a CSV file, you can import it into an R data frame instead. Also, refer to Import Excel File into R.


# Create dataframe
df=data.frame(id=c(11,22),
              pages=c(32,45),
              name=c("spark","python"),
              chapters=c(76,86),
              price=c(144,553))

# Display the dataframe
print(df)

# Output
#  id pages   name chapters price
#1 11    32  spark       76   144
#2 22    45 python       86   553

2. Remove Column using R Base Functions

By using the R base function subset() or square bracket notation, you can remove one or more columns from the R data frame by index or by name.

2.1 Remove Column by Index

First, let’s use base R bracket notation df[] to remove a column by index. The syntax df[, columns] selects columns; to remove columns instead, prefix the index with the - (minus) operator.

The following example removes the second column by Index from the R DataFrame.


# Remove Columns by Index
df2 <- df[,-2]
df2

# Output
#  id   name chapters price
#1 11  spark       76   144
#2 22 python       86   553
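One thing to watch for with bracket notation: when the removal leaves only one column, base R simplifies the result to a plain vector by default. The following is a minimal sketch, rebuilding the sample data frame from section 1, of how the drop = FALSE argument keeps the result a data frame. The variable names v and d are only illustrative.

```r
# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Removing four of the five columns leaves a single column,
# which base R simplifies to a plain vector by default
v <- df[, -c(2, 3, 4, 5)]
is.data.frame(v)   # FALSE

# drop = FALSE keeps the data frame structure
d <- df[, -c(2, 3, 4, 5), drop = FALSE]
is.data.frame(d)   # TRUE
names(d)           # "id"
```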

2.2 Remove Columns by Range

Bracket notation also accepts a range of indexes, and negating the range removes those columns. The following example removes the columns at indexes 2 through 4, that is, pages, name, and chapters.


# Remove Columns by Range
df2 <- df[,-2:-4]
df2

# Output
#  id price
#1 11   144
#2 22   553
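The range -2:-4 works because it expands to the negative indexes -2, -3, and -4. As a small sketch of this, the snippet below shows an equivalent spelling, -(2:4), and what happens when the second minus sign is forgotten. The variable names res and df2 are only illustrative.

```r
# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# -2:-4 and -(2:4) both expand to the indexes -2, -3, -4
identical(-2:-4, -(2:4))   # TRUE

df2 <- df[, -(2:4)]
names(df2)                 # "id" "price"

# -2:4 runs from -2 up to 4, mixing negative and positive
# indexes, which bracket notation rejects with an error
res <- tryCatch(df[, -2:4], error = function(e) e)
inherits(res, "error")     # TRUE
```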

2.3 Remove Multiple Columns

Use a vector to specify the column indexes you want to remove from the R data frame. The following example removes the columns at indexes 2 and 3.


# Remove Multiple columns
df2 <- df[,-c(2,3)]
df2

# Output
#  id chapters price
#1 11       76   144
#2 22       86   553

2.4 Remove Columns From List

You can also remove columns by name. Here, the names() function returns all column names, the %in% operator checks which of them appear in the given vector of names, and ! negates the result so that only the columns not listed are kept.


# Remove Columns in List
df2 <- df[,!names(df) %in% c("id", "name", "chapters")]
df2

# Output
#  pages price
#1    32   144
#2    45   553
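The minus sign used for indexes does not work on column names: -c("id") is an error because character values cannot be negated. As an alternative sketch to the %in% form above, setdiff() can compute the names to keep. The variable names drop_cols and keep_cols are hypothetical and chosen only for illustration.

```r
# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Names of the columns to remove (hypothetical variable name)
drop_cols <- c("id", "name", "chapters")

# setdiff() returns the column names that are not in drop_cols
keep_cols <- setdiff(names(df), drop_cols)

# drop = FALSE keeps a data frame even if one column remains
df2 <- df[, keep_cols, drop = FALSE]
names(df2)   # "pages" "price"
```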

2.5 By using subset() Function

By using the R base function subset() you can remove columns by name from the data frame. This function takes the data frame as its first argument, and its select argument takes the unquoted names of the columns you want to remove, negated with -.


# Remove using subset
df2 <- subset(df, select = -c(id, name, chapters))

This yields the same output as above.
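For a single column, the c() wrapper is not needed. A minimal sketch, again rebuilding the sample data frame from section 1:

```r
# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Remove one column by its unquoted name
df2 <- subset(df, select = -pages)
names(df2)   # "id" "name" "chapters" "price"
```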

3. Remove Columns by using dplyr Functions

In this section, I will use functions from the dplyr package to remove columns from an R data frame. dplyr is an R package that provides a grammar of data manipulation: a consistent set of verbs that help you solve the most common data manipulation tasks. To use it, first install it with install.packages('dplyr') and then load it with library(dplyr).

3.1 Remove Column by Matching

The dplyr select() function selects columns, and negating a selection with - removes those columns instead. All verbs in the dplyr package take a data frame as the first argument. With dplyr we mostly use the infix operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side.

For example, x %>% f(y) is converted into f(x, y), so the result from the left-hand side is “piped” into the right-hand side. This pipe lets you write a sequence of operations that reads left to right.


# Load the dplyr package
library("dplyr")

# Remove columns using select()
df2 <- df %>% select(-c(id, name, chapters))

# Output
#  pages price
#1    32   144
#2    45   553
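As a quick check of the pipe rewriting described in this section, the sketch below runs the same select() call with and without %>% and compares the results. It assumes dplyr is installed; the variable names piped and direct are only illustrative.

```r
library(dplyr)

# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Piped form: df becomes the first argument of select()
piped <- df %>% select(-c(id, name, chapters))

# Direct form: the same call written without the pipe
direct <- select(df, -c(id, name, chapters))

identical(piped, direct)   # TRUE
names(piped)               # "pages" "price"
```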

3.2 Remove Variables By Name Range

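The same function can also remove variables by name range. Here id:chapters refers to every column from id through chapters in column order, and the leading - drops all of them.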


# Remove columns by Range
df2 <-df %>% select(-(id:chapters))
df2

# Output
#  price
#1   144
#2   553
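The range does not have to start at the first column. A short sketch, assuming dplyr is installed, that removes only the middle columns; the parentheses make the negation apply to the whole range:

```r
library(dplyr)

# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Remove every column from pages through chapters
df2 <- df %>% select(-(pages:chapters))
names(df2)   # "id" "price"
```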

3.3 Remove Variables using contains

Use -contains() to remove columns whose names contain a given text. The following example removes the column chapters because its name contains the text apt. This function also accepts a character vector of values to match.


# Remove columns contains character
df2 <-df %>% select(-contains('apt'))
df2

# Output
#  id pages   name price
#1 11    32  spark   144
#2 22    45 python   553
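To illustrate passing several values at once, the sketch below gives contains() a character vector; a column is removed if its name contains any of the values. It assumes dplyr is installed, and the second pattern, ric, is chosen here only to match price.

```r
library(dplyr)

# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# "apt" matches chapters and "ric" matches price
df2 <- df %>% select(-contains(c("apt", "ric")))
names(df2)   # "id" "pages" "name"
```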

3.4 Remove Column starts with

Use -starts_with() to remove columns whose names start with a given text. The following example removes the column chapters because its name starts with the character c.


# Remove columns starts with
df2 <-df %>% select(-starts_with('c'))
df2

# Output
#  id pages   name price
#1 11    32  spark   144
#2 22    45 python   553
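Matching in starts_with() ignores case by default. The sketch below, assuming dplyr is installed, shows the default behaviour and the effect of the ignore.case argument. The variable name df3 is only illustrative.

```r
library(dplyr)

# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Case is ignored by default, so "C" still removes chapters
df2 <- df %>% select(-starts_with("C"))
names(df2)   # "id" "pages" "name" "price"

# With ignore.case = FALSE no column starts with "C",
# so nothing is removed
df3 <- df %>% select(-starts_with("C", ignore.case = FALSE))
names(df3)   # "id" "pages" "name" "chapters" "price"
```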

3.5 Remove Column ends with

Similarly, use -ends_with() to remove variables whose names end with a given text. The following example removes the name and price columns because they end with the letter e.


# Remove columns ends with
df2 <-df %>% select(-ends_with('e'))
df2

# Output
#  id pages chapters
#1 11    32       76
#2 22    45       86

3.6 Remove Columns if it exists

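Finally, use the one_of() function to remove columns only if they exist in the data frame. If a named column is not found, one_of() issues a warning instead of failing, and the remaining columns are still removed. Note that in current dplyr releases one_of() is superseded by any_of() and all_of().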


df2 <- df %>% 
    select(-one_of("name", "marks"))
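A sketch of the same removal with any_of(), which ignores missing columns silently rather than warning. This assumes dplyr 1.0.0 or later, where any_of() is available; marks is the same non-existent column used in the example above.

```r
library(dplyr)

# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# name exists and is removed; marks does not exist and is skipped
df2 <- df %>% select(-any_of(c("name", "marks")))
names(df2)   # "id" "pages" "chapters" "price"
```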

4. Complete Example of Removing Columns in R

The following is a complete example of how to remove a single column/variable or several columns/variables from an R data frame (data.frame).


# Create dataframe
df=data.frame(id=c(11,22,33,44,55),
              pages=c(32,45,33,22,56),
              name=c("spark","python","R","java","jsp"),
              chapters=c(76,86,11,15,7),
              price=c(144,553,321,567,890))

# Display the dataframe
print(df)

# Remove Columns by Index
df2 <- df[,-2]

# Remove Columns by Range
df2 <- df[,-2:-4]

# Remove Multiple columns
df2 <- df[,-c(2,3)]

# Remove  Columns in List
df2 <- df[,!names(df) %in% c("id", "name", "chapters")]

# Remove using subset
df2 <- subset(df, select = -c(id, name, chapters))

# Load the dplyr package
library("dplyr")

# Remove columns using select()
df2 <- df %>% select(-c(id, name, chapters))

# Remove columns by Range
df2 <- df %>% select(-(id:chapters))

# Remove columns contains character
df2 <- df %>% select(-contains('apt'))

# Remove columns starts with
df2 <- df %>% select(-starts_with('c'))

# Remove columns ends with
df2 <- df %>% select(-ends_with('e'))

# Remove columns using within()
df2 <- within(df, rm(id, name, chapters))
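The complete example ends with within(), which the earlier sections do not cover. As a brief sketch of what that line does: within() evaluates an expression with the data frame’s columns available as variables, and calling rm() on some of them drops those columns from the returned copy.

```r
# Recreate the sample data frame from section 1
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# rm() deletes the named columns inside within();
# the original df is left unchanged
df2 <- within(df, rm(id, name, chapters))
names(df2)   # "pages" "price"
ncol(df)     # 5
```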

Conclusion

In this article, you have learned different ways to remove a single column/variable or several columns/variables from an R data frame. The examples cover removing columns by index, by range, and by name using base R, and by name, name range, or name pattern using the select() function from the dplyr package.

Naveen (NNK)