• Post category: Pandas
• Post last modified: March 27, 2024
Pandas Add Column based on Another Column

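How do you add a column based on another existing column in a Pandas DataFrame? You can add a new column based on the values of another column using plain column assignment (df['new_col'] = ...), DataFrame.assign(), DataFrame.apply(), np.where(), or loc[]. Note that assign() returns a new DataFrame and leaves the original unchanged, whereas df['new_col'] = ... adds the column to the existing DataFrame in place.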


In this article, I will explain how to add a column to a DataFrame based on the values of another column using several approaches, each with an example.

Related: You can also select DataFrame columns based on a condition.

1. Quick Examples of Adding a Column Based on Another Column

Following are examples of adding a column based on another column.


# Below are the quick examples
# (df is the sample DataFrame created in the next step)
import pandas as pd
import numpy as np

# Example 1: Add column using arithmetic operation
# Based on existing columns
df["Final_Fee"] = df["Fee"] - df["Discount"]

# Example 2: Add new column using assign()
# assign() returns a new DataFrame and leaves df unchanged
df1 = df.assign(Discount_Percent=lambda x: x.Discount / x.Fee * 100)

# Example 3: Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')

# Example 4: Add column using apply()
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)

# Example 5: Add column to DataFrame using loc[]
df['Without_discount'] = df.loc[:,['Fee', 'Discount']].sum(axis=1)

Now, let’s create a Pandas DataFrame with sample data and execute the above examples.


# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}

df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

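# Output:
Create DataFrame:
    Courses    Fee  Discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
3   Python  24000      1200
4   Pandas  26000      2500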

2. Pandas Add Column Based on Existing Column

To add a new column based on an existing column in a Pandas DataFrame, use the df[] notation. You can derive a new column by applying arithmetic operations to existing columns and assigning the result to a new column label. Because these operations are vectorized (they run on entire columns at once rather than row by row), this is usually the simplest and fastest way to create a new column from other columns.


# Add column using arithmetic operation
# Based on existing columns 
df["Final_Fee"] = df["Fee"] - df["Discount"]
print("The DataFrame after adding a new Column:\n", df)

Yields below output.

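# Output:
The DataFrame after adding a new Column:
    Courses    Fee  Discount  Final_Fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500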
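Because the arithmetic is vectorized, you can combine a column with another column or with a scalar in the same way. Below is a minimal, self-contained sketch that rebuilds the sample data; the Fee_in_thousands column is a hypothetical example added only for illustration.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Column-with-column arithmetic: values are combined row by row (aligned on the index)
df["Final_Fee"] = df["Fee"] - df["Discount"]

# Column-with-scalar arithmetic: the scalar is applied to every row
# (Fee_in_thousands is a hypothetical column for illustration)
df["Fee_in_thousands"] = df["Fee"] / 1000

print(df)
```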

3. Adding a Column using assign()

In practice, you often need to add a column whose values are calculated from existing columns. DataFrame.assign() does this and returns a new DataFrame, leaving the original one unchanged. The example below derives the Discount_Percent column from Fee and Discount. Here, I use a lambda that receives the DataFrame (x) and returns the values for the new column.


# Add New Column from Existing Column 
df = pd.DataFrame(technologies)
df1 = df.assign(Discount_Percent=lambda x: x.Discount / x.Fee * 100)
print("The DataFrame after adding a new Column:\n", df1)

Yields below output. Similarly, you can derive multiple columns and add them to a DataFrame in a single assign() call by passing one keyword argument per new column.


# Output:
The DataFrame after adding a new Column:
    Courses    Fee  Discount  Discount_Percent
0    Spark  22000      1000          4.545455
1  PySpark  25000      2300          9.200000
2   Hadoop  23000      1000          4.347826
3   Python  24000      1200          5.000000
4   Pandas  26000      2500          9.615385
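assign() can add several columns in one call. The sketch below is a minimal example assuming pandas 0.23 or later, where a lambda can refer to a column created by an earlier keyword argument in the same call; Final_Fee_Thousands is a hypothetical column name. It also confirms that assign() leaves the original df unchanged.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Each keyword argument of assign() becomes a new column.
# The last lambda uses Final_Fee, which the first keyword created.
df1 = df.assign(
    Final_Fee=lambda x: x.Fee - x.Discount,
    Discount_Percent=lambda x: x.Discount / x.Fee * 100,
    Final_Fee_Thousands=lambda x: x.Final_Fee / 1000,  # hypothetical column
)

print(df1)
print(list(df.columns))  # the original df still has only its three columns
```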

4. Pandas Add Column Using NumPy where()

Alternatively, you can add a new column based on a condition on an existing column using the numpy.where() function. np.where(condition, x, y) returns x for rows where the condition is True and y where it is False. In this example, I add a Discount_rating column that is 'Good' when Discount is greater than 2000 and 'Bad' otherwise.


# Add column using np.where()
df['Discount_rating'] = np.where(df['Discount'] > 2000, 'Good', 'Bad')
print("The DataFrame after adding a new Column:\n", df)

Yields below output.


# Output:
The DataFrame after adding a new Column:
    Courses    Fee  Discount Discount_rating
0    Spark  22000      1000             Bad
1  PySpark  25000      2300            Good
2   Hadoop  23000      1000             Bad
3   Python  24000      1200             Bad
4   Pandas  26000      2500            Good
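np.where() handles a single True/False condition. When you need more than two categories, numpy.select() takes a list of conditions and a matching list of choices, and uses the first condition that is True for each row. The sketch below is an illustration only: the 'Average' label and the 1000 threshold are assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Conditions are checked in order; the first True condition decides the label.
# Rows matching no condition get the default value.
conditions = [
    df['Discount'] > 2000,
    df['Discount'] > 1000,  # hypothetical middle band
]
choices = ['Good', 'Average']
df['Discount_rating'] = np.select(conditions, choices, default='Bad')

print(df)
```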

5. Add Column Using apply()

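You can also use the apply() function with axis=1 to calculate a new column row by row. With axis=1, the lambda receives each row as a Series, so x['Fee'] and x['Discount'] are the values of that row. The example below derives the Final_fee column from Fee and Discount. Because apply() calls the Python function once per row, it is typically much slower than the equivalent vectorized expression df['Fee'] - df['Discount'] on large DataFrames, so it is best reserved for logic that cannot be expressed with column operations.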


# Add column using apply()
# Start again from the original sample data so the output matches
df = pd.DataFrame(technologies)
df['Final_fee'] = df.apply(lambda x: x['Fee'] - x['Discount'], axis=1)
print("The DataFrame after adding a new Column:\n", df)

Yields below output.


# Output:
The DataFrame after adding a new Column:
   Courses    Fee  Discount  Final_fee
0    Spark  22000      1000      21000
1  PySpark  25000      2300      22700
2   Hadoop  23000      1000      22000
3   Python  24000      1200      22800
4   Pandas  26000      2500      23500
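To see that the apply() version can be replaced by vectorized arithmetic, the following minimal sketch computes Final_fee both ways on the sample data and checks that the results match. It does not measure speed; it only shows that the vectorized expression gives the same values here.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Row-wise apply(): the lambda is called once per row, and each row is a Series
df['Final_fee'] = df.apply(lambda row: row['Fee'] - row['Discount'], axis=1)

# Vectorized equivalent: a single operation on whole columns
vectorized = df['Fee'] - df['Discount']

# Compare element by element
print((df['Final_fee'] == vectorized).all())
```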

6. Add Column to DataFrame using loc[]

You can also add a column based on existing columns with the DataFrame.loc[] property, which selects rows and columns by label. Here, df.loc[:, ['Fee', 'Discount']] selects all rows of the Fee and Discount columns, sum(axis=1) adds the two values in each row, and assigning the result to df['Without_discount'] adds it as a new column.


# Add column to DataFrame using loc[]
# Start again from the original sample data so the output matches
df = pd.DataFrame(technologies)
df['Without_discount'] = df.loc[:, ['Fee', 'Discount']].sum(axis=1)
print("The DataFrame after adding a new Column:\n", df)

Yields below output.


# Output:
The DataFrame after adding a new Column:
   Courses    Fee  Discount  Without_discount
0    Spark  22000      1000             23000
1  PySpark  25000      2300             27300
2   Hadoop  23000      1000             24000
3   Python  24000      1200             25200
4   Pandas  26000      2500             28500
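loc[] can also build a column conditionally: assign a default value to the whole column, then overwrite only the rows that match a boolean mask. This minimal sketch reproduces the Discount_rating column from the np.where() section this way; it is one possible approach, shown on the same sample data.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# 1. Give every row a default value
df['Discount_rating'] = 'Bad'

# 2. Overwrite only the rows where the mask is True
df.loc[df['Discount'] > 2000, 'Discount_rating'] = 'Good'

print(df)
```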

Frequently Asked Questions on Adding a Column Based on Another Column

How do I add a new column based on an existing column in pandas?

You can add a new column based on an existing column using the df[] notation: apply an arithmetic operation to the existing column and assign the result to a new column label. For example, df['new_column'] = df['existing_column'] * 2.

How can I add a new column conditionally in pandas based on values from another column?

You can add a new column conditionally based on the values of another column using the numpy.where() function, or the df.apply() method with a custom function that returns a value for each row.
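As a sketch of the apply() approach, the custom function below receives one row at a time and returns a label. The rule (a discount above 8% of the fee counts as 'Good') and the function name rate_discount are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

def rate_discount(row):
    """Hypothetical rule: 'Good' if the discount exceeds 8% of the fee."""
    if row['Discount'] / row['Fee'] > 0.08:
        return 'Good'
    return 'Bad'

# axis=1 passes each row (as a Series) to rate_discount
df['Discount_rating'] = df.apply(rate_discount, axis=1)

print(df)
```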

How do I add a column that combines values from two existing columns in pandas?

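If both columns hold strings, you can concatenate them with +, for example df['new_column'] = df['column1'] + df['column2']. If either column is numeric, convert it first with astype(str); otherwise + performs numeric addition on two numeric columns, or raises an error when mixing strings and numbers.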
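String concatenation with + only works when both sides are strings. In the sample data, Courses is a string column but Fee is numeric, so this minimal sketch converts Fee with astype(str) before joining them; Course_Fee is a hypothetical column name.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

# Convert the numeric Fee column to strings, then join with a separator
df['Course_Fee'] = df['Courses'] + '-' + df['Fee'].astype(str)

print(df)
```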

What’s the best way to add a column that contains the result of a custom function applied to an existing column in pandas?

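Use Series.apply() to call a custom function on each value of a single column, for example df['new_column'] = df['existing_column'].apply(my_function). If the function needs values from several columns, use DataFrame.apply() with axis=1 so that it receives one row at a time.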
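A minimal sketch of Series.apply() with a custom function is shown below. The function fee_band and its 25000 threshold are hypothetical, used only to illustrate the pattern on the sample data.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in this article
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
}
df = pd.DataFrame(technologies)

def fee_band(fee):
    """Hypothetical banding rule applied to a single Fee value."""
    return 'High' if fee >= 25000 else 'Standard'

# Series.apply() calls fee_band once for each value in the Fee column
df['Fee_band'] = df['Fee'].apply(fee_band)

print(df)
```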

How can I add a column based on another column that involves date or time operations in pandas?

Convert the column to datetime with pd.to_datetime() first. You can then derive new columns with the .dt accessor (for example, .dt.year or .dt.day_name()) or with date arithmetic such as adding a pd.Timedelta.
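The sample DataFrame in this article has no date column, so the sketch below uses a small hypothetical DataFrame with a Start_Date column of date strings. The column names, dates, and the 30-day course length are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: column names and dates are illustrative assumptions
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Hadoop'],
    'Start_Date': ['2024-01-15', '2024-02-01', '2024-03-10'],
})

# Convert the strings to datetime64 so date operations are available
df['Start_Date'] = pd.to_datetime(df['Start_Date'])

# Derive new columns with the .dt accessor
df['Start_Year'] = df['Start_Date'].dt.year
df['Start_Weekday'] = df['Start_Date'].dt.day_name()

# Date arithmetic: add a fixed duration (hypothetical 30-day course length)
df['End_Date'] = df['Start_Date'] + pd.Timedelta(days=30)

print(df)
```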

7. Conclusion

In this article, I have explained how to add a column to a Pandas DataFrame based on the values of another column, using arithmetic with the df[] notation, assign(), np.where(), apply(), and loc[].

Happy learning!!
