You can create a conditional column in pandas DataFrame by using np.where()
, np.select()
, DataFrame.map()
, DataFrame.assign()
, DataFrame.apply()
, DataFrame.loc[]
. Additionally, you can also use mask()
method transform()
and lambda
functions to create single and multiple functions. In this article, I will explain several ways of how to create a conditional DataFrame column (new) with examples.
Adding a new column by conditionally checking values on existing columns is required when you would need to curate the DataFrame or derive a new column from the existing columns.
Key Points –
- Use boolean conditions to create a new column based on specific criteria within a Pandas DataFrame.
- Utilize the DataFrame syntax, such as
df['new_column']
, to assign values to the newly created column. - Use NumPy’s
np.where
function as a concise way to apply conditional logic and assign values to the new column. - Use chained assignments with
.loc
to set values for a new column based on specific conditions, ensuring modifications are made to the original DataFrame. - Leverage the
apply
function along with a lambda function to apply conditional logic to each row of the DataFrame. - Handle multiple conditions by combining them with logical operators like
&
(AND),|
(OR), and~
(NOT) to create complex conditional expressions for the new column.
1. Quick Examples of Pandas Create Conditional DataFrame Column
If you are in a hurry, below are some quick examples of creating conditional pandas DataFrame columns.
# Below are some quick examples.
# Create conditional DataFrame column by np.where() function.
df['Discount'] = np.where(df['Courses']=='Spark', 1000, 2000)
# Another way to create column conditionally.
df['Discount'] = [1000 if x == 'Spark' else 2000 for x in df['Courses']]
# Create conditional DataFrame column by map() and lambda.
df['Discount'] = df.Courses.map( lambda x: 1000 if x == 'Spark' else 2000)
# Create conditional DataFrame column by np.select() function.
conditions = [
(df['Courses'] == 'Spark') & (df['Duration'] == '30days'),
(df['Courses'] == 'Spark') & (df['Duration'] == '35days'),
(df['Duration'] == '50days')]
choices = [1000, 1050,200]
df['Discount'] = np.select(conditions,choices, default=0)
# Using Dictionary to map new values.
Discount_dictionary ={'Spark' : 1500, 'PySpark' : 800, 'Spark' : 1200}
df['Discount'] = df['Courses'].map(Discount_dictionary)
# Pandas create conditional DataFrame column by dictionary
df['Discount'] = [Discount_dictionary.get(v, None) for v in df['Courses']]
# Using DataFrame.assign() method.
def Courses_Discount(row):
if row["Courses"] == "Spark":
return 1000
else:
return 2000
df = df.assign(Discount=df.apply(Courses_Discount, axis=1))
# Conditions with multiple rand multiple columns.
def Courses_Discount(row):
if row["Courses"] == "Spark":
return 1000
elif row["Fee"] == 25000:
return 2000
else:
return 0
df = df.assign(Discount=df.apply(Courses_Discount, axis=1))
# Using .loc[] property for single condition.
df.loc[(df['Courses']=="Spark"), 'Discount'] = 1000
# Using loc[] method for Multiple conditions.
df.loc[(df['Courses']=="Spark")&(df['Fee']==23000)|(df['Fee']==25000), 'Discount'] = 1000
# Using DataFrame.apply() method with lambda function.
df['Discount'] = df['Courses'].apply(lambda x: '1000' if x=='Spark' else 1000)
# Pandas create conditional column using mask() method.
# Replace values where the condition is True
df['Discount'] = df['Discount'].mask(df['Courses']=='Spark', other=1000)
# Using where()
df['Discount'] = df['Discount'].where(df['Courses']=='Spark', other=1000)
# Using transform() with a lambda function.
df['Discount'] = df['Courses'].transform(lambda x: 1000 if x == 'Spark' else 2000)
Let’s create a pandas DataFrame with a few rows and columns and execute these examples and validate results. Our DataFrame contains column names Courses
, Fee
and Duration
.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Spark","Python","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan]
}
df = pd.DataFrame(technologies)
Yields below output.
# Output:
Courses Fee Duration
0 Spark 22000 30days
1 PySpark 25000 50days
2 Spark 23000 30days
3 Python 24000 None
4 PySpark 26000 NaN
2. Create Conditional DataFrame Column by np.where() function
To create a conditional DataFrame column in pandas use np.where()
function. Although pandas also have the similar where()
function, Numpy.where()
is very different. The difference being, where()
function of Numpy
provides greater flexibility and it treats the given condition differently from Pandas.
pandas where()
function only allows for updating the values that do not meet the given condition. However, the where
function of Numpy
allows for updating values that meet and do not meet the given condition.
import numpy as np
# Create conditional DataFrame column by np.where() function.
df['Discount'] = np.where(df['Courses']=='Spark', 1000, 2000)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 2000
2 Spark 23000 35days 1000
3 Python 24000 None 2000
4 Spark 26000 NaN 1000
Another way to create a column conditionally.
# Another way to create another column conditionally.
df['Discount'] = [1000 if x == 'Spark' else 2000 for x in df['Courses']]
print(df)
Yields the same output as above.
Similarly, you can also create by using Series.map()
and lambda
. The lambda
functions are defined using the keyword lambda
. They can have any number of arguments but only one expression. These are very helpful when we have to perform small tasks with less code.
# Create conditional DataFrame column by lambda.
df['Discount'] = df.Courses.map( lambda x: 1000 if x == 'Spark' else 2000)
print(df)
Yields the same output as above.
3. Create Conditional DataFrame Column by numpy.select() function
You can create a conditional DataFrame column by checking multiple columns using numpy.select()
function. The select()
function is more capable than the previous methods. We can use it to give a set of conditions and a set of values. Thus, we are able to assign a specific value for each condition.
When no condition matches, it assigns the default value to the new column.
# Create conditional DataFrame column by np.select function.
conditions = [
(df['Courses'] == 'Spark') & (df['Duration'] == '30days'),
(df['Courses'] == 'Spark') & (df['Duration'] == '35days'),
(df['Duration'] == '50days')]
choices = [1000, 1050,200]
df['Discount'] = np.select(conditions,choices, default=0)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 200
2 Spark 23000 35days 1050
3 Python 24000 None 0
4 Spark 26000 NaN 0
4. Using DataFrame.map() Function
We can use DataFrame.map()
function to achieve the same goal. It is a straight forward method where we use a dictionary to simply map
values to the newly added column based on the key. Map
values of Series
according to input correspondence. It is used for substituting each
value
in a Series with another value
.
# Using Dictionary to map new values.
Discount_dictionary ={'Spark' : 1500, 'PySpark' : 800, 'Spark' : 1200}
df['Discount'] = df['Courses'].map(Discount_dictionary)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1200
1 PySpark 25000 50days 800
2 Spark 23000 35days 1200
3 Python 24000 None NaN
4 Spark 26000 NaN 1200
5. Using Dict to Create Conditional DataFrame Column
Another method to create pandas conditional DataFrame column is by creating a Dict with key-value pair. dict.get
. The get()
method returns the value of the item with the specified key. However, if the key
is not found when you use dict[key]
it assigns NaN
.
# Pandas create conditional DataFrame column by dict.get
df['Discount'] = [Discount_dictionary.get(v, None) for v in df['Courses']]
print(df)
Yields the same output as above.
6. Using Series.apply() Function
We can use Series.apply()
function, when we have more than two values, in that case, we can use a dictionary to map new values onto the keys. It provides a lot of flexibility when we are having a larger number of categories for which we want to assign different values to the newly added column.
# Using DataFrame.apply() function.
def map_values(row, discount_dict):
return discount_dict[row]
discount_dict = {'Spark': 1000, 'PySpark': 2000, 'Python': 0}
df['Discount'] = df['Courses'].apply(map_values, args = (discount_dict,))
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 2000
2 Spark 23000 30days 1000
3 Python 24000 None 0
4 PySpark 26000 NaN 2000
7. Using DataFrame.assign() Method
The DataFrame.assign(
) function is used to assign new columns to a DataFrame
. Returns a new object with all original columns in addition to new ones. Note that all the above examples create a new column on the existing DataFrame, this example creates a new DataFrame with the new column.
# Using DataFrame.assign() method.
def Courses_Discount(row):
if row["Courses"] == "Spark":
return 1000
else:
return 2000
df2 = df.assign(Discount=df.apply(Courses_Discount, axis=1))
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 2000
2 Spark 23000 35days 1000
3 Python 24000 None 2000
4 Spark 26000 NaN 1000
8. Using Multiple Columns by Using DataFrame.assign() Method
If you need to check multiple columns to create a new column use DataFrame.assign()
function, you can see below example-
# Multiple discounts and multiple columns.
def Courses_Discount(row):
if row["Courses"] == "Spark":
return 1000
elif row["Fee"] == 25000:
return 2000
else:
return 0
df = df.assign(Discount=df.apply(Courses_Discount, axis=1))
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 2000
2 Spark 23000 35days 1000
3 Python 24000 None 0
4 Spark 26000 NaN 1000
9. Using DataFrame.loc[] Method
The loc[]
property is used to access a group of rows and columns by label(s) or a boolean
array. The loc[]
is primarily label-based, but may also be used with a boolean array. You can apply a loc[]
property for a single condition-
# Using .loc() property for single condition.
df.loc[(df['Courses']=="Spark"), 'Discount'] = 1000
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000.0
1 PySpark 25000 50days NaN
2 Spark 23000 35days 1000.0
3 Python 24000 None NaN
4 Spark 26000 NaN 1000.0
NOTE: Alternatively, to apply loc()
property for
multiple conditions and create a new column in pandas DataFrame. For example
# Using loc() method for Multiple conditions.
df.loc[(df['Courses']=="Spark")&(df['Fee']==23000)|(df['Fee']==25000), 'Discount'] = 1000
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000.0
1 PySpark 25000 50days 1000.0
2 Spark 23000 35days 1000.0
3 Python 24000 None NaN
4 Spark 26000 NaN 1000.0
10. Using DataFrame.apply() method with lambda Function
You can also create conditional DataFrame column by using DataFrame.apply()
method with lambda
function. The apply
function along an axis of the DataFrame
. The lambda
functions are defined using the keyword lambda
.
# Using DataFrame.apply method with lambda function.
df['Discount'] = df['Courses'].apply(lambda x: '1000' if x=='Spark' else 1000)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 1000
2 Spark 23000 30days 1000
3 Python 24000 None 1000
4 PySpark 26000 NaN 1000
11. Pandas Create Conditional Column Using Mask() Methods
Let’s see by using mask()
method. The mask()
method is used to replace values where the condition is True
.
# Pandas create conditional column using mask() method.
# Replace values where the condition is True
df['Discount'] = df['Discount'].mask(df['Courses']=='Spark', other=1000)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 1000
2 Spark 23000 30days 1000
3 Python 24000 None 1000
4 PySpark 26000 NaN 1000
NOTE: You can replace values where the condition is false
by Series.where()
method. The where()
method is used to check a DataFrame
for one or more conditions and return the result accordingly.
# Replace values where the condition is False.
df['Discount'] = df['Discount'].where(df['Courses']=='Spark', other=1000)
print(df)
Yields the same output as above.
12. Using transform() with lambda function
Finally, you can use the method transform()
with a lambda
function. The transform()
function returns a self-produced DataFrame with transformed values after applying the function specified in its parameter.
# Using transform with a lambda function.
df['Discount'] = df['Courses'].transform(lambda x: 1000 if x == 'Spark' else 2000)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount
0 Spark 22000 30days 1000
1 PySpark 25000 50days 2000
2 Spark 23000 35days 1000
3 Python 24000 None 2000
4 Spark 26000 NaN 1000
Frequently Asked Questions on Pandas Create Conditional Column
To create a conditional column in Pandas, you can use boolean conditions, apply functions, or np.where to define the conditions and assign values to the new column.
You can use logical operators like &
(AND), |
(OR), and ~
(NOT) to combine multiple conditions and create complex expressions for the conditional column.
Boolean indexing involves using boolean conditions to filter rows in a DataFrame. By applying boolean indexing, you can selectively assign values to a new column based on specific conditions.
Using chained assignments with .loc
allows you to modify the original DataFrame when creating a conditional column. This ensures that changes are made directly to the DataFrame rather than a copy.
Besides np.where
, you can use other methods such as boolean indexing, apply functions with lambda expressions, or chained assignments with .loc
to achieve the same goal of creating conditional columns. Choose the method that best fits your specific use case.
Conclusion
In this article, you have learned how Pandas create DataFrame conditional column by using np.where()
, np.select()
, DataFrame.apply()
, DataFrame.assign()
, DataFrame.map()
, loc[]
, mask()
method, transform()
and lambda
functions to create single and multiple functions.
Related Articles
- Pandas Rename Column with Examples
- How to Merge Series into Pandas DataFrame
- Create DataFrame From Multiple Series in Pandas
- Pandas Operator Chaining to Filter DataFrame Rows
- Drop Infinite Values From Pandas DataFrame
- Pandas Create DataFrame From Dict (Dictionary)
- Pandas Create DataFrame From List
- How to Get Size of Pandas DataFrame?
- How to use Pandas unstack() Function
- How to Union Pandas DataFrames using Concat?
- Pandas Create New DataFrame By Selecting Specific Columns
- Pandas Create Test and Train Samples from DataFrame