To create a new column in an R data frame based on a single condition or multiple conditions, you can use the mutate() function from the dplyr package. Data manipulation is a crucial aspect of data analysis, and creating new columns in a data frame based on specific conditions is a common task in R.
In this article, I will explain various methods to add new columns to a DataFrame based on single or multiple conditions, using character conditions, implementing custom functions, and adding columns based on row indices.
Key points:
- Create a logical condition using comparison operators (e.g., ==, !=, <, >, <=, >=) to evaluate against the existing column.
- Assign the result of the logical condition to a new column with the $ operator, e.g., data_frame$new_column <- data_frame$existing_column == value.
- To implement more complex conditions, you can use ifelse() or nested if-else structures.
- You can use functions from the dplyr package, such as mutate() and case_when(), to keep the code concise and readable.
- Use custom functions with apply() or sapply() to perform operations on each element of the existing column.
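The first three key points can be sketched in base R without any packages. This is a minimal illustration, assuming a small made-up data frame named books; the column names is_long, size, and band are hypothetical and chosen only for this example.

```r
# Hypothetical sample data, used only for this sketch
books <- data.frame(title = c("a", "b", "c"),
                    pages = c(32, 45, 120))

# Logical column: TRUE where the condition holds
books$is_long <- books$pages > 40

# Two-way label with ifelse()
books$size <- ifelse(books$pages > 40, "High Pages", "Low Pages")

# Three-way label with a nested ifelse(); the outer test is checked first
books$band <- ifelse(books$pages > 100, "Very High",
                     ifelse(books$pages > 40, "High", "Low"))

print(books)
```

Nested ifelse() calls become hard to read beyond two or three levels, which is where case_when() from dplyr tends to be the more readable choice.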
1. What is a Data Frame in R
A data frame in R is a fundamental data structure for storing and manipulating structured data in rows and columns, similar to an RDBMS table or a spreadsheet. It is two-dimensional: one dimension refers to the rows and the other to the columns. Each column is a vector, and all columns in a data frame must have the same length.
Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can import it into an R data frame. Also, refer to Import Excel File into R.
# Create dataframe
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"))
df
This yields the output below.
# Output:
#   id pages   name
# 1 11    32  spark
# 2 22    45 python
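To check the structure described above, you can inspect the data frame with a few base R functions. The sketch below rebuilds the same df so that it runs on its own; the variable name mismatch is hypothetical and only demonstrates what happens when column lengths do not fit together.

```r
# Rebuild the example data frame so this block is self-contained
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"))

dim(df)             # number of rows and number of columns
sapply(df, length)  # every column vector has the same length
str(df)             # compact overview of column names and types

# Columns whose lengths cannot be recycled to a common length are rejected
mismatch <- tryCatch(
  data.frame(a = 1:2, b = 1:3),
  error = function(e) "error"
)
mismatch
```

Here data.frame() stops with an error because a column of length 2 cannot be combined with a column of length 3, so mismatch ends up holding the string returned by the error handler.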
2. R dplyr Create New Column Based on Condition
To add a new column to an existing data frame based on a specified condition, you can use the mutate() function from the dplyr package. Before using dplyr functions, install the package with install.packages("dplyr") and load it with library(dplyr).
# Create new column using dplyr package
# install.packages("dplyr")
library(dplyr)
# Create the dataframe
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"))
# Create a new column based on condition
df <- df %>% mutate(new_column = ifelse(pages > 40, "High Pages", "Low Pages"))
# Print the updated dataframe
print(df)
In this example, a new column named new_column is added based on a condition: if the value in the pages column is greater than 40, the row is labeled High Pages; otherwise, it is labeled Low Pages. You can customize the condition and the labels to suit your requirements. This example yields the output below.
# Output:
#   id pages   name new_column
# 1 11    32  spark  Low Pages
# 2 22    45 python High Pages
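If the column being tested can contain missing values, dplyr's if_else() is worth knowing as a stricter alternative to base ifelse(). The sketch below is not part of the original example; the books data frame and the label "Unknown" are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical data with one missing page count
books <- data.frame(name = c("spark", "python", "r"),
                    pages = c(32, 45, NA))

books <- books %>%
  mutate(
    # ifelse() returns NA wherever the condition itself is NA
    label_base = ifelse(pages > 40, "High Pages", "Low Pages"),
    # if_else() checks that both branches have compatible types and
    # lets you label missing values explicitly via `missing`
    label_strict = if_else(pages > 40, "High Pages", "Low Pages",
                           missing = "Unknown")
  )

print(books)
```

Which of the two to use is a design choice: ifelse() needs no package, while if_else() fails early when the two branches have incompatible types.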
3. Create a New Column Based on a String Condition
To create a new column in a data frame based on character conditions, you can use the same method as described above. Use the pipe operator (%>%) to pass the data frame to the mutate() function, and compare the character column against a string inside ifelse(). The result is the updated data frame with the new column.
# Create new column based on string condition
df <- df %>% mutate(new_column = ifelse(name == "spark", "Spark Language", "Other Language"))
df
# Output:
#   id pages   name     new_column
# 1 11    32  spark Spark Language
# 2 22    45 python Other Language
This example creates a new column named new_column based on a conditional statement using the mutate() function from the dplyr package. The new column holds either Spark Language or Other Language, depending on the value in the name column. Finally, the resulting data frame is printed.
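String conditions are not limited to exact equality. As a rough sketch, the block below shows three common variants inside mutate(): a case-insensitive comparison, membership in a set of values, and a pattern match. The langs data frame and its column names are hypothetical; tolower(), %in%, and grepl() are base R functions.

```r
library(dplyr)

# Hypothetical data with mixed capitalization
langs <- data.frame(name = c("Spark", "python", "pyspark", "scala"))

langs <- langs %>%
  mutate(
    # Case-insensitive exact match
    is_spark_exact = tolower(name) == "spark",
    # Membership in a set of allowed values
    in_set = name %in% c("python", "scala"),
    # Pattern match anywhere in the string, ignoring case
    family = ifelse(grepl("spark", name, ignore.case = TRUE),
                    "Spark family", "Other")
  )

print(langs)
```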
4. Create a Column Based on Multiple Conditions
Similarly, you can use the mutate() function from dplyr to create a new column based on multiple conditions. Use the pipe operator (%>%) to pass the data frame to mutate(), and express the conditions with the case_when() function. Each condition is written as condition ~ value, and the final TRUE ~ "Other" line acts as a fallback for rows that match none of the earlier conditions. The result is the updated data frame with the new column.
# Create a new column based on multiple conditions
df <- df %>% mutate(new_column = case_when(
  pages > 40 & name == "python" ~ "High Python Pages",
  pages <= 40 & name == "spark" ~ "Low Spark Pages",
  TRUE ~ "Other"
))
df
# Output:
#   id pages   name        new_column
# 1 11    32  spark   Low Spark Pages
# 2 22    45 python High Python Pages
In the code above, new_column is created from two compound conditions, each producing a different label. The new column categorizes the data based on the values in the pages and name columns, and the resulting data frame is then displayed.
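One detail that is easy to miss is that case_when() evaluates its conditions in order and uses the first one that matches for each row. The sketch below, using a hypothetical books data frame, contrasts a correct ordering with one where a broader condition shadows a narrower one.

```r
library(dplyr)

# Hypothetical data for illustration
books <- data.frame(pages = c(10, 45, 150))

books <- books %>%
  mutate(
    # Most specific condition first
    size = case_when(
      pages > 100 ~ "Very High",
      pages > 40 ~ "High",
      TRUE ~ "Low"
    ),
    # Broader condition first: the second branch can never be reached
    size_wrong_order = case_when(
      pages > 40 ~ "High",
      pages > 100 ~ "Very High",
      TRUE ~ "Low"
    )
  )

print(books)
```

In the second column, the 150-page row is labeled High rather than Very High, because pages > 40 already matched.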
5. Create a New Column Based on Row Index
Alternatively, you can create a new column based on the row index using the dplyr mutate() function. Let’s use mutate() together with ifelse() and row_number() to label each row of the data frame as even or odd.
# Create a new column based on row index
df <- df %>%
  mutate(new_column = ifelse(row_number() %% 2 == 0, "Even Row", "Odd Row"))
df
# Output:
#   id pages   name new_column
# 1 11    32  spark    Odd Row
# 2 22    45 python   Even Row
Here, a new column is created indicating whether each row is an even or odd row.
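The same idea extends to other position-based labels. As a sketch under assumed data (the four-row data frame and the column names row_id, position, and row_parity are hypothetical), row_number() can be combined with n(), which gives the number of rows, and the even/odd label can also be produced in base R with seq_len(nrow(df)).

```r
library(dplyr)

# Hypothetical four-row data frame
df <- data.frame(id = c(11, 22, 33, 44),
                 pages = c(32, 45, 50, 12))

df <- df %>%
  mutate(
    # Store the row index itself
    row_id = row_number(),
    # Label the first and last rows; n() is the total number of rows
    position = case_when(
      row_number() == 1 ~ "First",
      row_number() == n() ~ "Last",
      TRUE ~ "Middle"
    )
  )

# Base R equivalent of the even/odd labeling
df$row_parity <- ifelse(seq_len(nrow(df)) %% 2 == 0, "Even Row", "Odd Row")

print(df)
```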
6. Create a New Column with a Custom Function
So far, we have created new columns using functions from the dplyr package alone. Now let’s combine a custom function with dplyr to create a new column based on certain conditions. First, define a custom function that takes a single value from the specified column as its argument and returns the category for that value.
Then use the pipe operator (%>%) to pass the data frame to the mutate() function. Inside mutate(), sapply() applies the custom function to each value of the specified column, and the results become the new column.
# Create a New Column with a custom function
# Define a custom function
get_category <- function(pages) {
  if (pages > 40) {
    return("High Pages")
  } else if (pages <= 40) {
    return("Low Pages")
  } else {
    return("Other")
  }
}
# Use the custom function to create the new column
df <- df %>%
  mutate(new_column = sapply(pages, get_category))
df
# Output:
#   id pages   name new_column
# 1 11    32  spark  Low Pages
# 2 22    45 python High Pages
In this example, the get_category function is used to determine the category for each row based on the pages column.
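Two refinements may be useful here. An if statement whose condition is NA stops with an error in R, so a function like get_category would fail on a missing value before reaching its final else branch. The variant below is a sketch, not the original function: it checks for NA first, uses vapply() so the return type is declared up front, and shows an equivalent vectorized case_when() for comparison. The books data frame and the column names by_vapply and by_case_when are assumptions.

```r
library(dplyr)

# Variant of the custom function that handles missing values first
get_category <- function(pages) {
  if (is.na(pages)) {
    return("Other")
  } else if (pages > 40) {
    return("High Pages")
  } else {
    return("Low Pages")
  }
}

# Hypothetical data with one missing page count
books <- data.frame(pages = c(32, 45, NA))

books <- books %>%
  mutate(
    # vapply() requires each result to be a single character value
    by_vapply = vapply(pages, get_category, character(1)),
    # Vectorized equivalent without a custom function
    by_case_when = case_when(
      is.na(pages) ~ "Other",
      pages > 40 ~ "High Pages",
      TRUE ~ "Low Pages"
    )
  )

print(books)
```

For simple threshold logic the vectorized form is usually shorter; a custom function tends to pay off when the per-value logic is too involved to express as a few conditions.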
Conclusion
In this article, I have explained how to create a new column in an R data frame based on single or multiple conditions by using the mutate() function from the dplyr package together with ifelse() and case_when(). I have also explained, with examples, how to create a new column from the row index and from a custom function applied with sapply().
Happy Learning!!