In pandas, you can add a new column to an existing DataFrame with the DataFrame.insert()
method, which updates the existing DataFrame in place. DataFrame.assign()
also adds a new column; however, it returns a new DataFrame and leaves the original unchanged.
In this article, I will cover examples of how to add multiple columns, add a constant value, and derive new columns from existing columns in a pandas DataFrame.
1. Quick Examples of Add Column to DataFrame
# Below are quick examples
# Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df2 = df.assign(TutorsAssigned=tutors)
# Add multiple columns to the DataFrame
MNCCompanies = ['TATA','HCL','Infosys','Google','Amazon']
df2 = df.assign(MNCComp=MNCCompanies, TutorsAssigned=tutors)
# Derive New Column from Existing Column
df = pd.DataFrame(technologies)
df2=df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)
# Add a constant or empty value to the DataFrame.
df = pd.DataFrame(technologies)
df2=df.assign(A=None,B=0,C="")
# Add New column to the existing DataFrame
df = pd.DataFrame(technologies)
df["MNCCompanies"] = MNCCompanies
# Add new column at the specific position
df = pd.DataFrame(technologies)
df.insert(0,'Tutors', tutors )
# Add new column by mapping to the existing column
df = pd.DataFrame(technologies)
tutors = {"Spark":"William", "PySpark":"Henry", "Hadoop":"Michael","Python":"John", "pandas":"Messi"}
df['Tutors'] = df['Courses'].map(tutors)
print(df)
Let’s create a Pandas DataFrame with sample data and execute the above examples.
# Create DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Courses Fee Discount
0 Spark 22000 1000
1 PySpark 25000 2300
2 Hadoop 23000 1000
3 Python 24000 1200
4 Pandas 26000 2500
2. Pandas Add Column to DataFrame
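DataFrame.assign() adds a column to a pandas DataFrame. It does not change the existing DataFrame; instead, it returns a new DataFrame that contains the original columns plus the new one.
Below is the syntax of the assign() method. Each keyword argument name becomes a column name, and its value becomes the column data.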
# Syntax of DataFrame.assign()
DataFrame.assign(**kwargs)
Now let’s add a column TutorsAssigned to the DataFrame. assign() cannot modify the existing DataFrame in place; instead, it returns a new DataFrame after adding the column. The example below adds a list of values as a new column to the DataFrame.
# Add new column to the DataFrame
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df2 = df.assign(TutorsAssigned=tutors)
print(df2)
Yields below output.
# Output:
Courses Fee Discount TutorsAssigned
0 Spark 22000 1000 William
1 PySpark 25000 2300 Henry
2 Hadoop 23000 1000 Michael
3 Python 24000 1200 John
4 Pandas 26000 2500 Messi
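To make the "returns a new DataFrame" behavior concrete, here is a minimal sketch that reuses the sample data from this article and compares the column lists before and after the call. The variable name original_columns is only for illustration.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']

# Remember the columns before calling assign()
original_columns = list(df.columns)

# assign() builds and returns a new DataFrame
df2 = df.assign(TutorsAssigned=tutors)

# df is untouched; only df2 has the new column
print(list(df.columns) == original_columns)
print('TutorsAssigned' in df2.columns)
```

If you want to keep working with a single variable, assign the result back to it, for example df = df.assign(TutorsAssigned=tutors).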
3. Add Multiple Columns to the DataFrame
You can also use the assign() method to add multiple columns to the pandas DataFrame by passing one keyword argument per column.
# Add multiple columns to the DataFrame
MNCCompanies = ['TATA','HCL','Infosys','Google','Amazon']
df2 = df.assign(MNCComp = MNCCompanies,TutorsAssigned=tutors )
4. Adding a Column From an Existing Column
In practice, we often need to add a column that is calculated from existing columns. The example below derives a Discount_Percent column by multiplying Fee by Discount and dividing by 100. Here, I will use a lambda to derive the new column from the existing ones.
# Derive New Column from Existing Column
df = pd.DataFrame(technologies)
df2 = df.assign(Discount_Percent=lambda x: x.Fee * x.Discount / 100)
print(df2)
Yields below output. Similarly, you can derive multiple columns and add them to a DataFrame in a single assign() call.
# Output:
Courses Fee Discount Discount_Percent
0 Spark 22000 1000 220000.0
1 PySpark 25000 2300 575000.0
2 Hadoop 23000 1000 230000.0
3 Python 24000 1200 288000.0
4 Pandas 26000 2500 650000.0
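Deriving several columns in one assign() call could look like the sketch below. The column names Discounted_Fee, Discount_Share, and Discounted_Fee_K are made up for this illustration and are not part of the original examples. With lambdas, a later keyword argument can refer to a column created earlier in the same call.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

df2 = df.assign(
    # Fee after subtracting the discount amount (illustrative column)
    Discounted_Fee=lambda x: x['Fee'] - x['Discount'],
    # Discount as a fraction of the fee (illustrative column)
    Discount_Share=lambda x: x['Discount'] / x['Fee'],
    # Refers to Discounted_Fee, which was created earlier in this same call
    Discounted_Fee_K=lambda x: x['Discounted_Fee'] / 1000,
)
print(df2)
```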
5. Add a Constant or Empty Column
The example below adds three new columns to the DataFrame: one column with all None values, a second column with the constant value 0, and a third column with empty strings. A single scalar value is repeated for every row.
# Add a constant or empty value to the DataFrame.
df = pd.DataFrame(technologies)
df2=df.assign(A=None,B=0,C="")
print(df2)
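As a quick check of the example above, this sketch confirms that each scalar was repeated for every row. It only uses the columns A, B, and C from the example.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
df2 = df.assign(A=None, B=0, C="")

# Every row of A is missing, every row of B is 0, every row of C is ""
all_missing = bool(df2['A'].isna().all())
all_zero = bool((df2['B'] == 0).all())
all_empty = bool((df2['C'] == "").all())
print(all_missing, all_zero, all_empty)
```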
6. Append Column to Existing Pandas DataFrame
The above examples create a new DataFrame with the added columns instead of changing the existing DataFrame. The example in this section appends a new column to the existing DataFrame in place by assigning to a new column label.
# Add New column to the existing DataFrame
df = pd.DataFrame(technologies)
df["MNCCompanies"] = MNCCompanies
print(df)
Yields below output.
# Output:
Courses Fee Discount MNCCompanies
0 Spark 22000 1000 TATA
1 PySpark 25000 2300 HCL
2 Hadoop 23000 1000 Infosys
3 Python 24000 1200 Google
4 Pandas 26000 2500 Amazon
You can also use this approach to add a new column derived from existing columns.
# Derive a new column from existing column
df['Discount_Percent'] = df['Fee'] * df['Discount'] / 100
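One thing to watch with direct assignment is the length of the values. The sketch below, which assumes the same sample data, shows that a list with the wrong number of items raises a ValueError, while a pandas Series is aligned on the index and leaves the unmatched rows as NaN.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

# A list must have exactly one value per row
try:
    df['Tutors'] = ['William', 'Henry']
except ValueError as err:
    print("Assignment failed:", err)

# A Series is matched by index label; rows 2, 3 and 4 get NaN
df['Tutors'] = pd.Series(['William', 'Henry'], index=[0, 1])
print(df)
```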
7. Add Column to Specific Position of DataFrame
The DataFrame.insert() method adds a column at any position of the existing DataFrame and modifies it in place. Most of the above examples add the new column at the end of the DataFrame, but this method gives you the flexibility to add it at the beginning, in the middle, or at any column index.
This example adds a Tutors column at the beginning of the DataFrame.
# Add new column at the specific position
df = pd.DataFrame(technologies)
df.insert(0,'Tutors', tutors )
print(df)
Yields below output.
# Output:
Tutors Courses Fee Discount
0 William Spark 22000 1000
1 Henry PySpark 25000 2300
2 Michael Hadoop 23000 1000
3 John Python 24000 1200
4 Messi Pandas 26000 2500
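To place the column in the middle instead, you can look up the position of an existing column first. This is a small sketch using the same sample data; it also shows that insert() raises a ValueError when the column name already exists (unless allow_duplicates=True is passed).

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df = pd.DataFrame(technologies)

# Insert Tutors immediately after the Courses column
position = df.columns.get_loc('Courses') + 1
df.insert(position, 'Tutors', tutors)
print(list(df.columns))

# Inserting the same column name again fails by default
try:
    df.insert(0, 'Tutors', tutors)
except ValueError as err:
    print("Insert failed:", err)
```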
8. Add Column From Dictionary Mapping
If you want to add a column whose value for each row depends on an existing column, you can do this with a dictionary. Here, each value in the Courses column is looked up as a key in the dictionary, and the matching dictionary value is added to the new Tutors column in df.
# Add new column by mapping to the existing column
df = pd.DataFrame(technologies)
tutors = {"Spark":"William", "PySpark":"Henry", "Hadoop":"Michael","Python":"John", "pandas":"Messi"}
df['Tutors'] = df['Courses'].map(tutors)
print(df)
Yields below output. Note that Pandas is not mapped and gets NaN, because the dictionary key pandas does not exactly match the value in the Courses column; the lookup is case sensitive.
# Output:
Courses Fee Discount Tutors
0 Spark 22000 1000 William
1 PySpark 25000 2300 Henry
2 Hadoop 23000 1000 Michael
3 Python 24000 1200 John
4 Pandas 26000 2500 NaN
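If the case mismatch is not intended, one possible workaround is to lowercase both the dictionary keys and the column values before mapping. The sketch below shows that idea, plus a fallback value for unmatched rows using fillna(). The names tutors_lower and Tutors_Default and the "Unassigned" label are only for illustration.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
tutors = {"Spark": "William", "PySpark": "Henry", "Hadoop": "Michael",
          "Python": "John", "pandas": "Messi"}

# Option 1: compare keys and column values in lowercase
tutors_lower = {course.lower(): tutor for course, tutor in tutors.items()}
df['Tutors'] = df['Courses'].str.lower().map(tutors_lower)

# Option 2: keep the exact match, but replace NaN with a default label
df['Tutors_Default'] = df['Courses'].map(tutors).fillna("Unassigned")
print(df)
```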
9. Add Column Using loc[]
The pandas loc[] property accesses rows and columns by label; however, you can also use it to add a new column to a pandas DataFrame. loc[] takes the row selection as its first argument and the column selection as its second, so I will pass the new column name as the second argument.
# Assign the column to the DataFrame
df = pd.DataFrame(technologies)
tutors = ['William', 'Henry', 'Michael', 'John', 'Messi']
df.loc[:, 'Tutors'] = tutors
print(df)
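This yields the same Tutors values as the insert() example, with the Tutors column added at the end of the DataFrame instead of the beginning.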
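Because the first argument of loc[] selects rows, you can also fill a new column for only some rows. The following sketch assumes a made-up Tier column and an arbitrary fee threshold of 23000; rows that do not match the condition are left as NaN.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

# Set the new Tier column only where Fee is above the threshold
df.loc[df['Fee'] > 23000, 'Tier'] = 'Premium'
print(df)
```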
Conclusion
In this article, I have explained how to add a column to an existing DataFrame by using DataFrame.assign(), DataFrame.insert(), direct assignment, map(), and loc[]. You have also learned that assign() returns a new DataFrame, while insert() modifies the DataFrame in place and can add a column at any position.