How do you add a column based on another existing column in a pandas DataFrame? You can add a new column based on the values of another column using df.assign(), df.apply(), or np.where(). Of these, df.assign() returns a new DataFrame with the column added, while assigning to df[...] adds the column to the existing DataFrame.
In this article, I will explain how to add a column to a DataFrame based on the values of another column, using several functions with examples.
1. Quick Examples of Adding a Column Based on Another Column
The following are quick examples of adding a column based on another column.
# Below are the quick examples
# Example 1: Add Column using arithmetic operation
# based on existing column
df["Final_Fee"] = df["Fee"] - df["Discount"]
# Example 2: Add New Column using assign()
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)
# Example 3: Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')
# Example 4: Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)
# Example 5: Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)
Now, let’s create a pandas DataFrame with sample data and run the above examples.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
   Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500
2. Pandas Add Column Based on Existing Column
To add a new column based on an existing column in a pandas DataFrame, use the df[] notation. We can derive a new column by performing arithmetic on existing columns and assigning the result to a new column name. Because the arithmetic is vectorized over whole columns, this is typically the simplest and fastest way to create a new column from another column of the DataFrame.
# Add column using arithmetic operation
# based on existing columns
df["Final_Fee"] = df["Fee"] - df["Discount"]
print(df)
Yields below output.
# Output:
   Courses    Fee  Discount  Final_Fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500
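The same df[] notation works with any vectorized expression, including division, rounding, and scalars. The following is a minimal sketch using the article's sample data; the column names Discount_Share and Fee_Less_500 and the flat value 500 are made up for illustration.

```python
import pandas as pd

# Same sample data as in the article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Hypothetical column: discount as a percentage of the fee, rounded to 2 decimals
df["Discount_Share"] = (df["Discount"] / df["Fee"] * 100).round(2)

# A scalar is broadcast to every row (500 is an arbitrary illustrative value)
df["Fee_Less_500"] = df["Fee"] - 500

print(df)
```

Each expression operates on whole columns at once, so no explicit loop over rows is needed.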
3. Adding a Column using assign()
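In practice, you often need to add a column calculated from existing columns. The example below derives the Discount_Percent column from Fee and Discount, using a lambda inside assign(). Note that assign() returns a new DataFrame and leaves df unchanged. Also note that the expression Fee * Discount / 100 is only for illustration; since Discount holds an amount, the discount as a percentage of the fee would be Discount / Fee * 100.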
# Add New Column from Existing Column
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)
print(df1)
Yields below output. Similarly, you can derive multiple columns and add them to a DataFrame in a single assign() call by passing several keyword arguments.
# Output:
   Courses    Fee  Discount  Discount_Percent
0    Spark  22000      1000          220000.0
1  PySpark  25000      2300          575000.0
2   Hadoop  23000      1000          230000.0
3   Python  24000      1200          288000.0
4   Pandas  26000      2500          650000.0
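Deriving several columns in one assign() call, as mentioned above, could look roughly like this. It is a sketch on the same sample data; the column name Final_Fee_Share is hypothetical. In current pandas versions, a later keyword argument can refer to a column created earlier in the same call.

```python
import pandas as pd

# Same sample data as in the article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

df2 = df.assign(
    # First new column: fee after subtracting the discount
    Final_Fee=lambda x: x["Fee"] - x["Discount"],
    # Second new column (hypothetical name) uses the column created just above
    Final_Fee_Share=lambda x: x["Final_Fee"] / x["Fee"],
)

print(df2)
print(df.columns.tolist())  # the original DataFrame keeps only its three columns
```

Because assign() returns a copy, it chains well with other DataFrame methods and does not modify df.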
4. Pandas Add Column Using NumPy where()
Alternatively, we can add a new column based on the values of an existing column using the numpy.where() function, which picks one of two values for each row depending on a condition. In this example, the new Discount_rating column is 'Good' where Discount is greater than 2000 and 'Bad' otherwise.
# Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')
print(df)
Yields below output.
# Output:
   Courses    Fee  Discount Discount_rating
0    Spark  22000      1000             Bad
1  PySpark  25000      2300            Good
2   Hadoop  23000      1000             Bad
3   Python  24000      1200             Bad
4   Pandas  26000      2500            Good
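np.where() chooses between two values. When more than two categories are needed, one option is numpy.select(), which is a different function from the one used above. The sketch below assumes made-up thresholds (2000 and 1200) and a hypothetical column name Discount_tier.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Conditions are checked in order; the first one that is True wins
conditions = [df["Discount"] >= 2000, df["Discount"] >= 1200]
choices = ["Good", "Average"]

# Rows matching no condition get the default value
df["Discount_tier"] = np.select(conditions, choices, default="Bad")

print(df)
```

With these thresholds, the Python row (discount 1200) is rated Average, while the rows with a discount of 1000 fall through to the default.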
5. Add Column Using apply()
Using the apply() function, we can add a column to a DataFrame by calculating values from existing columns. The example below derives the Final_fee column from Fee and Discount. Here, I use a lambda with axis=1, so the function is called once per row.
# Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)
print(df)
Yields below output.
# Output:
   Courses    Fee  Discount  Final_fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500
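apply() is most useful when the per-row logic does not fit into a single arithmetic expression. The sketch below uses a named function instead of a lambda; the rule that discounts are capped at 2000 and the column name Capped_final_fee are invented for illustration. A vectorized equivalent is included for comparison, since row-wise apply() calls a Python function for every row.

```python
import pandas as pd

# Same sample data as in the article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

def capped_final_fee(row):
    # Hypothetical rule: a discount above 2000 is capped at 2000
    discount = min(row["Discount"], 2000)
    return row["Fee"] - discount

# axis=1 passes each row to the function as a Series
df["Capped_final_fee"] = df.apply(capped_final_fee, axis=1)

# Vectorized equivalent of the same rule, without a per-row function call
vectorized = df["Fee"] - df["Discount"].clip(upper=2000)

print(df)
print((df["Capped_final_fee"] == vectorized).all())
```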
6. Add Column to DataFrame using loc[]
Moreover, we can add a column to a DataFrame based on the values of existing columns using the pandas.DataFrame.loc[] indexer. loc[] selects rows and columns of a DataFrame by label. Here, I select the Fee and Discount columns with loc[], call sum(axis=1) to add them up row by row, and assign the result to a new column.
# Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)
print(df)
Yields below output.
# Output:
   Courses    Fee  Discount  Without_discount
0    Spark  22000      1000             23000
1  PySpark  25000      2300             27300
2   Hadoop  23000      1000             24000
3   Python  24000      1200             25200
4   Pandas  26000      2500             28500
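loc[] can also take a boolean mask on the left-hand side, which writes values only to the rows that match. The following sketch first repeats the row-wise sum from above and then applies the discount only where it exceeds 2000; the threshold and the column names Fee_plus_discount and Selective_fee are illustrative assumptions.

```python
import pandas as pd

# Same sample data as in the article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Row-wise sum of the selected columns, equivalent to df["Fee"] + df["Discount"]
df["Fee_plus_discount"] = df.loc[:, ["Fee", "Discount"]].sum(axis=1)

# Start from the full fee, then overwrite only rows where Discount > 2000
df["Selective_fee"] = df["Fee"]
mask = df["Discount"] > 2000
df.loc[mask, "Selective_fee"] = df["Fee"] - df["Discount"]

print(df)
```

The right-hand side is aligned to the selected rows by index, so rows outside the mask keep their original fee.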
7. Conclusion
In this article, I have explained how to add a column to a pandas DataFrame based on the values of another column, using basic arithmetic operations as well as assign(), np.where(), apply(), and loc[].
Happy learning!!