R Delete Multiple Columns from DataFrame

To delete multiple columns from a data frame in R, you can use the df[] notation, the subset() function, or the select() function from the dplyr package. In this article, I will explain how to remove multiple columns by index, by name, and by a list of names.

Key points

  • Use square brackets with a negative index to remove columns by position, or with a negated logical test to remove them by name. This is a basic and flexible method.
  • You can specify indices with a negative operator (e.g., -2 to remove the second column) to remove one or more columns from a data frame.
  • Combine the names() function with %in% and the negation operator (!) to remove columns by their names.
  • To remove a range of columns, you can use a sequence of indices and apply the negative operator.
  • The subset() function allows you to select or remove specific columns by name. To remove, use select = -c(column_names).
  • The select() function from the dplyr package can remove columns by using a minus sign (-) followed by the column names.
  • To remove a dynamic list of columns, you can use a negation with %in% to exclude those in a predefined list.

Quick Examples

Following are quick examples of how to delete multiple columns from a data frame.


# Remove Columns by Range
df[,-2:-4]

# Remove multiple Columns from List
df[,!names(df) %in% c("id", "name", "chapters")]

# Remove using subset
subset(df, select = -c(id, name, chapters))

# Remove columns using select() from dplyr
library(dplyr)
df %>% select(-c(id, name, chapters))

Let’s create the R DataFrame from Vectors.


# Create data frame
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Display the data frame
print(df)

This yields the output below.

# Output
#   id pages   name chapters price
# 1 11    32  spark       76   144
# 2 22    45 python       86   553

R df[] to Delete Multiple Columns

To remove multiple columns in R, you can use square bracket notation df[]. The typical syntax to select specific columns is df[, columns]. To remove columns, use the negative operator (-) before the column numbers.

You can also apply the negative operator to a range of column indices to remove a contiguous block of columns in one step.


# Remove Columns by Range
df2 <- df[,-2:-4]
df2

# Output
#   id price
# 1 11   144
# 2 22   553

This example removes the columns at index 2 through 4, deleting the pages, name, and chapters columns.
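One caveat with the df[, -index] form is worth a minimal sketch (it is not part of the original example): if only one column is left after the removal, base R simplifies the result to a plain vector unless drop = FALSE is passed. The sketch rebuilds the same sample data frame so it runs on its own; the names one_col and kept are illustrative only.

```r
# Rebuild the sample data frame so this sketch is self-contained
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Same removal as above, written with an explicit sequence
df2 <- df[, -(2:4)]
names(df2)

# Removing columns 2 to 5 leaves a single column; by default the
# result is simplified to a vector rather than a data frame
one_col <- df[, -(2:5)]
is.data.frame(one_col)

# drop = FALSE keeps the data frame structure
kept <- df[, -(2:5), drop = FALSE]
is.data.frame(kept)
```

Passing drop = FALSE is a reasonable default when the number of remaining columns is not known in advance.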

R Delete Multiple Columns by Name

The above example explains how to delete multiple columns by index. Now let’s see how to remove multiple columns by name using the same df[] notation.


# Remove columns in list
df2 <- df[,!names(df) %in% c("id", "name", "chapters")]
df2

# Output
#   pages price
# 1    32   144
# 2    45   553
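When the names to remove are held in a variable, the same pattern applies. The following is a small sketch under that assumption; cols_to_drop is a hypothetical name, and setdiff() is shown as an equivalent base R alternative rather than as the article's method. Names in the list that do not exist in the data frame are simply ignored by both forms.

```r
# Rebuild the sample data frame so this sketch is self-contained
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Hypothetical vector of column names to remove; it could be built at run time
cols_to_drop <- c("id", "name", "chapters")

# Logical negation with %in%, as in the example above
df2 <- df[, !(names(df) %in% cols_to_drop), drop = FALSE]

# Alternative: keep only the names that are not in the drop list
df3 <- df[, setdiff(names(df), cols_to_drop), drop = FALSE]

names(df2)
names(df3)
```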

Using subset()

Alternatively, you can use the subset() function from base R to delete multiple columns by name. Pass the data frame as the first argument and, in the select argument, the unquoted column names wrapped in -c() to mark them for removal.


# Remove using subset
df2 <- subset(df, select = -c(id, name, chapters))

Similar to the above example, this deletes the columns named “id”, “name”, and “chapters” from the data frame and leaves the columns “pages” and “price”.
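As a short illustration of how select is evaluated, the sketch below repeats the removal and also negates a range written with column names. The range form (pages:chapters) is an addition to the article's example; it works because subset() treats the column names as their positions. The R documentation describes subset() as a convenience function mainly for interactive use, so df[] may be the safer choice inside functions.

```r
# Rebuild the sample data frame so this sketch is self-contained
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Remove three columns by their unquoted names
df2 <- subset(df, select = -c(id, name, chapters))
names(df2)

# Remove a contiguous range of columns, from pages through chapters
df3 <- subset(df, select = -(pages:chapters))
names(df3)
```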

select() to Delete Multiple Columns

The select() function from the dplyr package can be used to delete multiple columns from a data frame in R. The select() function takes a minus sign (-) before the column name to specify that the column should be removed. You can specify as many column names as you want in this way to delete them.


# Load the dplyr package
library("dplyr")

# Remove columns using select()
df2 <- df %>% select(-c(id, name, chapters))

This also yields the same output as above.
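If the column names are stored in a character vector, select() can take them through the all_of() or any_of() helpers. This is a sketch that assumes dplyr 1.0.0 or later is installed; cols_to_drop and the name "not_a_column" are hypothetical and used only for illustration.

```r
# Assumes dplyr 1.0.0 or later is installed
library(dplyr)

# Rebuild the sample data frame so this sketch is self-contained
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"),
                 chapters = c(76, 86),
                 price = c(144, 553))

# Hypothetical vector of column names to remove
cols_to_drop <- c("id", "name", "chapters")

# all_of() raises an error if any listed column is missing
df2 <- df %>% select(-all_of(cols_to_drop))

# any_of() silently skips names that are not in the data frame
df3 <- df %>% select(-any_of(c(cols_to_drop, "not_a_column")))

names(df2)
names(df3)
```

Use all_of() when a missing column should be treated as a mistake, and any_of() when the list may contain names that are not always present.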

Conclusion

In this article, you have learned how to delete multiple columns by index, by name, and by a list of names using the df[] notation, subset(), and select() from the dplyr package.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.