Pandas Add Column based on Another Column

You can add a new column to a pandas DataFrame based on the values of an existing column using arithmetic on columns, df.assign(), df.apply(), or np.where(). Assigning with the df[] notation adds the column to the existing DataFrame, while df.assign() returns a new DataFrame that includes the added column.

In this article, I will explain how to add a column to a DataFrame based on the values of another column, using each of these approaches with examples.

1. Quick Examples of Add Column Based on Another Column

Following are examples of adding a column based on another column.


# Below are the quick examples

# Example 1: Add Column using arithmetic operation
# Based on existing column 
df["Final_Fee"] = df["Fee"] - df["Discount"]

# Example 2: Add New Column using assign()
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)

# Example 3: Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')

# Example 4: Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)

# Example 5: Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)

Now, let’s create a Pandas DataFrame with sample data and execute the above examples.


# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}

df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500

2. Pandas Add Column Based on Existing Column

To add a new column based on an existing column in a Pandas DataFrame, use the df[] notation. You can derive a new column by applying arithmetic operations to existing columns and assigning the result to a new column name. Because the operation is vectorized, this is usually the fastest and simplest way to create a column from other columns of a DataFrame.


# Add column using arithmetic operation
# Based on existing columns 
df["Final_Fee"] = df["Fee"] - df["Discount"]
print(df)

Yields below output.


# Output:
   Courses    Fee  Discount  Final_Fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500

3. Adding a Column using assign()

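In practice, you often need to add a column calculated from existing columns. The example below uses assign() with a lambda to derive the Discount_Percent column from Fee and Discount, computed as Fee * Discount / 100. Note that, despite the column name, this expression is not the discount as a percentage of the fee; that would be Discount / Fee * 100. assign() returns a new DataFrame and leaves the original unchanged, so the result is stored in df1.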


# Add New Column from Existing Column
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)
print(df1)

Yields below output. Similarly, you can derive multiple columns and add them to a DataFrame in a single assign() call by passing several keyword arguments.


# Output:
   Courses    Fee  Discount  Discount_Percent
0    Spark  22000      1000          220000.0
1  PySpark  25000      2300          575000.0
2   Hadoop  23000      1000          230000.0
3   Python  24000      1200          288000.0
4   Pandas  26000      2500          650000.0
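As a minimal sketch of deriving several columns in one statement, the example below passes two keyword arguments to assign(). The column name Half_Final_Fee is hypothetical and used only for illustration. It relies on assign() evaluating keyword arguments in order, so a later lambda can read a column created earlier in the same call.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Derive two columns in a single assign() call.
# Final_Fee is created first, so the second lambda can read it.
df2 = df.assign(
    Final_Fee=lambda x: x.Fee - x.Discount,
    Half_Final_Fee=lambda x: x.Final_Fee / 2,  # hypothetical column, for illustration
)
print(df2)

# The original DataFrame is left unchanged by assign().
print(df.columns.tolist())
```

Because assign() returns a new DataFrame, df still has only the three original columns after this runs.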

4. Pandas Add Column Using NumPy where()

Alternatively, you can add a new column to a pandas DataFrame based on the values of an existing column using the numpy.where() function, which picks one of two values for each row depending on a condition. In this example, the new Discount_rating column is ‘Good’ where the ‘Discount’ column is greater than 2000 and ‘Bad’ otherwise.


# Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')
print(df)

Yields below output.


# Output:
   Courses    Fee  Discount Discount_rating
0    Spark  22000      1000             Bad
1  PySpark  25000      2300            Good
2   Hadoop  23000      1000             Bad
3   Python  24000      1200             Bad
4   Pandas  26000      2500            Good
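np.where() chooses between two values. If you need more than two categories, one option is numpy.select(), which is a different function from the one used above. The sketch below is only an illustration: the 'Average' label and the 1000 threshold are assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Conditions are checked in order; the first one that matches wins.
conditions = [df['Discount'] > 2000, df['Discount'] > 1000]
choices = ['Good', 'Average']  # 'Average' is a hypothetical extra label

# Rows that match no condition get the default value.
df['Discount_rating'] = np.select(conditions, choices, default='Bad')
print(df)
```

With the sample data, only the Python row (Discount 1200) falls into the middle category.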

5. Add Column Using apply()

With the apply() function you can add a column to a DataFrame by calculating a value for each row from existing columns. Passing axis=1 calls the lambda once per row. The example below derives the Final_fee column from Fee and Discount.


# Start from the original DataFrame
df = pd.DataFrame(technologies)

# Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)
print(df)

Yields below output.


# Output:
   Courses    Fee  Discount  Final_fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500
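For a simple subtraction like this one, the row-wise apply() call and the column arithmetic from section 2 produce the same values. The short check below is a sketch to confirm that on the sample data; the variable names row_wise and vectorized are mine.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Row-wise: the lambda runs once for every row.
row_wise = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)

# Vectorized: one operation on whole columns.
vectorized = df['Fee'] - df['Discount']

# Compare the two results element by element.
print((row_wise == vectorized).all())
```

apply() is most useful when the per-row logic cannot easily be written as column arithmetic; otherwise the vectorized form is usually shorter and faster.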

6. Add Column to DataFrame using loc[]

Moreover, you can add a column to a DataFrame based on the values of existing columns using the pandas.DataFrame.loc[] indexer, which selects rows and columns of a pandas DataFrame by label. Here, loc[] selects the Fee and Discount columns for all rows, and sum(axis=1) adds them row by row. Assigning the result to df['Without_discount'] adds the new column to the DataFrame.


# Start from the original DataFrame
df = pd.DataFrame(technologies)

# Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)
print(df)

Yields below output.


# Output:
   Courses    Fee  Discount  Without_discount
0    Spark  22000      1000             23000
1  PySpark  25000      2300             27300
2   Hadoop  23000      1000             24000
3   Python  24000      1200             25200
4   Pandas  26000      2500             28500
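loc[] can also be used on the left-hand side of an assignment, with a boolean mask, to fill a new column only for the rows that meet a condition. The sketch below reuses the Discount_rating rule from the np.where() section as an assumption, and also checks that summing with loc[] matches adding the two columns directly.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Give every row a default label, then overwrite the rows
# where the condition on Discount holds.
df['Discount_rating'] = 'Bad'
df.loc[df['Discount'] > 2000, 'Discount_rating'] = 'Good'

# Adding the two columns directly gives the same totals as
# selecting them with loc[] and calling sum(axis=1).
same_totals = (df.loc[:, ['Fee', 'Discount']].sum(axis=1)
               == df['Fee'] + df['Discount']).all()

print(df)
print(same_totals)
```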

7. Conclusion

In this article, I have explained how to add a column to a Pandas DataFrame based on the values of another column, using basic arithmetic operations as well as assign(), np.where(), apply(), and loc[].

Happy learning!!

Vijetha
