How do you add a column based on another existing column in a Pandas DataFrame? You can add a new column to the DataFrame based on the values of another column using df[] notation or the df.assign(), df.apply(), and np.where() functions. Note that df.assign() returns a new DataFrame with the column added, whereas assigning through df[] adds the column to the existing DataFrame in place.
In this article, I will explain how to add/append a column to the DataFrame based on the values of another column using multiple functions with well-defined examples.
Related: You can also select DataFrame columns based on a condition.
1. Quick Examples of Adding a Column Based on Another Column
Following are examples of adding a column based on another column.
# Below are the quick examples
# Example 1: Add Column using arithmetic operation
# Based on existing column
df["Final_Fee"] = df["Fee"] - df["Discount"]
# Example 2: Add New Column using assign()
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Discount / x.Fee * 100)
# Example 3: Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')
# Example 4: Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)
# Example 5: Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)
Now, let’s create a Pandas DataFrame with sample data and execute the above examples.
# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Running this code creates and prints a DataFrame with five rows and the Courses, Fee, and Discount columns.
2. Pandas Add Column Based on Existing Column
To add a new column based on an existing column in a Pandas DataFrame, use the df[] notation. We can derive a new column by performing arithmetic operations on existing columns and assigning the result to a new column name. Because the arithmetic is vectorized over whole columns, this is usually the fastest and simplest way to create a new column from another column of the DataFrame.
# Add column using arithmetic operation
# Based on existing columns
df["Final_Fee"] = df["Fee"] - df["Discount"]
print("The DataFrame after adding a new Column:\n", df)
This adds a Final_Fee column that holds Fee minus Discount for each row.
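The same vectorized style also works when one operand is a scalar, because pandas applies the scalar to every row. The following is a minimal sketch that assumes the sample data from this article; the Half_Fee column name is made up purely for illustration.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

# Column-with-column arithmetic: values are subtracted row by row
df["Final_Fee"] = df["Fee"] - df["Discount"]

# Column-with-scalar arithmetic: the scalar is applied to every row
# (Half_Fee is a hypothetical column used only for illustration)
df["Half_Fee"] = df["Fee"] // 2

print(df)
```

Both statements modify df in place, so the new columns are available to any code that runs afterwards.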
3. Adding a Column using assign()
In practice, we often need to add a column to a DataFrame by calculating it from existing columns. The below example derives the Discount_Percent column from Fee and Discount. Here, I will use a lambda to derive the new column from the existing ones.
# Add New Column from Existing Column
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Discount / x.Fee * 100)
print("The DataFrame after adding a new Column:\n", df1)
Yields below output. Note that assign() returns a new DataFrame and leaves df unchanged. Similarly, you can derive multiple columns and add them to a DataFrame in a single assign() call.
# Output:
The DataFrame after adding a new Column:
   Courses    Fee  Discount  Discount_Percent
0    Spark  22000      1000          4.545455
1  PySpark  25000      2300          9.200000
2   Hadoop  23000      1000          4.347826
3   Python  24000      1200          5.000000
4   Pandas  26000      2500          9.615385
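Deriving several columns in one assign() call could look like the sketch below. It assumes the same sample data and a reasonably recent pandas version, where a later lambda can refer to a column created earlier in the same call; Fee_Share is a hypothetical column name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
})

# Each keyword argument becomes a new column. The third lambda uses
# Final_Fee, which is created earlier in the same assign() call.
df2 = df.assign(
    Final_Fee=lambda x: x.Fee - x.Discount,
    Discount_Percent=lambda x: x.Discount / x.Fee * 100,
    Fee_Share=lambda x: x.Final_Fee / x.Fee,  # hypothetical column
)

print(df2)
print(df.columns.tolist())  # the original DataFrame keeps its three columns
```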
4. Pandas Add Column Using NumPy where()
Alternatively, we can add a new column to a pandas DataFrame based on the values of an existing column using the numpy.where() function. np.where(condition, x, y) returns x where the condition is True and y otherwise. In this example, I will add a Discount_rating column that is Good when the ‘Discount’ value is greater than 2000 and Bad otherwise.
# Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')
print("The DataFrame after adding a new Column:\n", df)
Yields below output.
# Output:
The DataFrame after adding a new Column:
   Courses    Fee  Discount Discount_rating
0    Spark  22000      1000             Bad
1  PySpark  25000      2300            Good
2   Hadoop  23000      1000             Bad
3   Python  24000      1200             Bad
4   Pandas  26000      2500            Good
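When more than two outcomes are needed, one option is to nest np.where() calls. The sketch below assumes the same sample data; the Average label, the second threshold of 1000, and the Discount_level column name are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
})

# The outer np.where() handles the first condition; rows that fail it
# fall through to the inner np.where(), which checks the second one.
# The 1000 threshold and the 'Average' label are illustrative only.
df['Discount_level'] = np.where(
    df['Discount'] > 2000, 'Good',
    np.where(df['Discount'] > 1000, 'Average', 'Bad')
)

print(df)
```

Conditions are checked from the outside in, so the strictest threshold should come first.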
5. Add Column Using apply()
Using the apply() function, we can add a column to a DataFrame by calculating values from existing columns. With axis=1, apply() passes each row to the function. The below example derives the Final_fee column from Fee and Discount. Here, I will use a lambda to derive the new column.
# Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)
print("The DataFrame after adding a new Column:\n", df)
Yields below output.
# Output:
The DataFrame after adding a new Column:
   Courses    Fee  Discount  Final_fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500
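apply() is most useful when the logic does not fit in a single expression. The sketch below passes a named function instead of a lambda. It assumes the same sample data; the extra 500 reduction, the 2000 threshold, and the fee_after_bonus and Bonus_fee names are hypothetical and exist only to show branching logic.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
})

def fee_after_bonus(row):
    """Return Fee minus Discount, less a hypothetical extra 500
    for rows whose Discount is at least 2000."""
    fee = row['Fee'] - row['Discount']
    if row['Discount'] >= 2000:
        fee -= 500
    return fee

# axis=1 passes each row to the function as a Series
df['Bonus_fee'] = df.apply(fee_after_bonus, axis=1)

print(df)
```

Because apply() with axis=1 calls the Python function once per row, it is generally slower than vectorized arithmetic or np.where() on large DataFrames.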
6. Add Column to DataFrame using loc[]
Moreover, we can add a column to a DataFrame based on the values of existing columns using the pandas.DataFrame.loc[] attribute. loc[] is used to select rows and columns of a pandas DataFrame by names/labels. Here, I will select multiple columns of the DataFrame using the loc[] attribute and then call the sum() function with axis=1, which adds the selected values across each row. Assigning the result with df[] notation adds the new column to the DataFrame.
# Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)
print("The DataFrame after adding a new Column:\n", df)
Yields below output.
# Output:
The DataFrame after adding a new Column:
   Courses    Fee  Discount  Without_discount
0    Spark  22000      1000             23000
1  PySpark  25000      2300             27300
2   Hadoop  23000      1000             24000
3   Python  24000      1200             25200
4   Pandas  26000      2500             28500
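loc[] can also set values in a new column for only the rows that match a condition. The following sketch assumes the same sample data and reproduces the Good/Bad rating from the np.where() section as one possible alternative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
})

# Start with a default value for every row
df['Discount_rating'] = 'Bad'

# Overwrite the value only on rows where the condition is True:
# the first loc[] argument selects rows, the second names the column
df.loc[df['Discount'] > 2000, 'Discount_rating'] = 'Good'

print(df)
```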
Frequently Asked Questions on Adding a Column Based on Another Column
You can add a new column in Pandas based on the existing column by using df[] notation. You can derive a new column by applying simple arithmetic operations on existing columns and assigning the result to a new column. For example, df['new_column'] = df['existing_column'] * 2.
You can add a new column conditionally based on values from another column using the df.apply() method with a custom function or using the numpy.where() function.
You can concatenate or combine values from two existing string columns using the + operator, for example: df['new_column'] = df['column1'] + df['column2']. Convert non-string columns with astype(str) first.
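As a short sketch of combining a text column with a numeric one, the example below uses the article's sample columns; the Course_Fee column name and the underscore separator are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000]
})

# Fee is numeric, so convert it to strings before concatenating;
# adding a string column to an integer column directly raises an error
df['Course_Fee'] = df['Courses'] + "_" + df['Fee'].astype(str)

print(df)
```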
You can add a new column by applying a custom function to an existing column using the .apply() method. On a single column (a Series), it calls the function on each value; on a DataFrame with axis=1, it calls the function on each row.
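For instance, calling apply() on a single column could look like the sketch below, which uses the built-in len function on the article's Courses column; the Name_length column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"]
})

# Series.apply() calls the function once per value in the column
df['Name_length'] = df['Courses'].apply(len)

print(df)
```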
You can perform various date and time operations when creating a new column based on date or time columns. For example, after converting a column with pd.to_datetime(), the .dt accessor exposes components such as the year and month.
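A minimal sketch of date-based columns follows. The article's sample data has no date column, so the Start_Date values, the 30-day course length, and all new column names here are assumptions made only for illustration.

```python
import pandas as pd

# Start_Date is a hypothetical column; it is not part of the sample data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Start_Date': ["2023-01-15", "2023-03-01", "2023-11-20"]
})

# Convert the text column to datetime so date operations are available
df['Start_Date'] = pd.to_datetime(df['Start_Date'])

# Derive new columns through the .dt accessor
df['Start_Year'] = df['Start_Date'].dt.year
df['Start_Month'] = df['Start_Date'].dt.month_name()

# Date arithmetic: add an assumed course length of 30 days
df['End_Date'] = df['Start_Date'] + pd.Timedelta(days=30)

print(df)
```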
7. Conclusion
In this article, I have explained how to add a column to a Pandas DataFrame based on the values of another column using basic arithmetic operations and the assign(), np.where(), apply(), and loc[] approaches.
Happy learning!!