How do you concatenate two or more columns of a Pandas DataFrame? You can use several methods, including the + operator and a number of Pandas functions. This operation is common in data manipulation and analysis when you need to combine information from different columns into a single column.
In this article, I will cover the methods I use most often in real projects to concatenate two or more columns of string/text type. Depending on your needs, you may also have to add a separator, so the examples show how to use one as well.
Key Points –
- Pandas offers versatile methods like .str.cat() and DataFrame.agg() to efficiently concatenate two columns.
- The + operator can be used directly for concatenating string columns in Pandas, but it performs addition for numeric columns.
- Additionally, the DataFrame.apply() method can be used with custom functions to concatenate columns along specified axes.
- Selecting the appropriate concatenation method depends on factors such as data type, performance considerations, and specific concatenation requirements.
Quick Examples of Concatenating Two Columns
Following are quick examples of concatenating two columns in a DataFrame.
# Quick examples of concatenating two columns
# Example 1: Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
# Example 2: Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
# Example 3: Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
# Example 4: Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")
# Example 5: Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)
# Example 6: Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
To run the examples of concatenating two columns, let’s first create a Pandas DataFrame from a dictionary.
# Create DataFrame
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print("DataFrame:\n", df)
Yields below output.
Using + Operator to Concatenate Two Columns
In a Pandas DataFrame, the + operator concatenates two or more string/text columns, combining their values element-wise. However, it’s important to note that when applied to numeric columns, the + operator performs arithmetic addition rather than string concatenation.
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print("After concatenating the two columns:\n", df)
This code uses the + operator to concatenate the Courses and Duration columns and stores the result in a new column called Period. The astype(str) call ensures the Courses values are treated as strings. This example yields the below output.
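To illustrate the difference between addition and concatenation, here is a minimal sketch on a small sample frame. The frame and the variable names total and fee_label are made up for this illustration; the columns mirror the ones used in this article.

```python
import pandas as pd

# Small sample frame (illustrative data only)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Discount": [1000, 1500],
})

# Numeric + numeric: element-wise arithmetic addition
total = df["Fee"] + df["Discount"]

# Cast the numeric column to str first to get concatenation instead
fee_label = df["Courses"] + "-" + df["Fee"].astype(str)

print(total.tolist())      # [21000, 26500]
print(fee_label.tolist())  # ['Spark-20000', 'PySpark-25000']
```

Casting with astype(str) is the usual way to include a numeric column in a string concatenation.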
Using the apply() Function
You can combine two or more columns of a DataFrame into a single column using the DataFrame.apply() function, which applies a function along a specific axis. With axis=1, the function receives one row at a time, so passing a separator’s join() method joins the string values of each row.
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print("After concatenating the two columns:\n", df)
Yields the same output as above.
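Note that join() only accepts strings. The sketch below, on a made-up sample frame, shows what may happen when one of the selected columns is numeric and one way to work around it. The names error_message and Label are assumptions for this illustration.

```python
import pandas as pd

# Sample frame with one text column and one integer column (illustrative data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
})

# join() raises TypeError when a row contains a non-string value
error_message = None
try:
    df[["Courses", "Fee"]].apply("-".join, axis=1)
except TypeError as exc:
    error_message = str(exc)

# Converting the selected columns to str first makes join() work
df["Label"] = df[["Courses", "Fee"]].astype(str).apply("-".join, axis=1)

print(error_message)
print(df["Label"].tolist())
```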
Using agg() to Concat String Columns of DataFrame
To concatenate multiple string columns, you can also use the DataFrame.agg() method. As in the previous example, select the columns you want to concatenate as a list, then call agg() with the join() method and axis=1 to get the desired output.
# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print("After concatenating the two columns:\n", df)
Yields the same output as above.
Using Series.str.cat() Function to Concat Columns
The Series.str.cat() function efficiently concatenates two Series with a delimiter/separator. You can apply it to a DataFrame because selecting a DataFrame column returns a Series object.
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")
print("After concatenating the two columns:\n", df)
Yields the same output as above.
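Series.str.cat() also accepts a list of Series and an na_rep argument for missing values. The following is a small sketch of both on made-up data; the Level column and the variable names default and filled are assumptions for this illustration.

```python
import pandas as pd

# Sample frame with a missing Duration value (illustrative data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", None, "35days"],
    "Level": ["Basic", "Advanced", "Basic"],
})

# Without na_rep, a missing value makes the result missing for that row
default = df["Courses"].str.cat(df["Duration"], sep="-")

# na_rep substitutes a placeholder for missing values; passing a list of
# Series concatenates more than two columns in a single call
filled = df["Courses"].str.cat([df["Duration"], df["Level"]], sep="-", na_rep="NA")

print(default.tolist())
print(filled.tolist())
```

Whether a placeholder such as "NA" is appropriate depends on how the combined column is used later.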
Using apply() & Lambda
The apply() method, combined with a lambda function, offers a versatile approach to achieve similar concatenation results. By replacing df[["Courses", "Duration"]] with any column slice of your DataFrame, this method can be generalized to concatenate an arbitrary number of string columns.
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)
print("After concatenating the two columns:\n", df)
Yields the same output as above.
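One way to generalize this to any number of columns is to wrap it in a small helper. The function concat_columns below is a hypothetical name introduced for this sketch, not a Pandas API, and the sample data is illustrative.

```python
import pandas as pd

def concat_columns(df, columns, sep="-"):
    """Join the given columns row by row into one Series of strings.

    Hypothetical helper for illustration; not part of Pandas.
    """
    # astype(str) lets non-string columns such as integers take part in join()
    return df[columns].astype(str).apply(lambda row: sep.join(row), axis=1)

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# Three columns and a custom separator
df["Summary"] = concat_columns(df, ["Courses", "Duration", "Fee"], sep=" | ")
print(df["Summary"].tolist())
```

Because apply() with axis=1 calls the function once per row, this approach may be slower than the + operator or str.cat() on large DataFrames.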
Concat Two Columns Using map() Function
You can use the map() function to convert each value of a column to a string and then concatenate the result with another column using the + operator. Because map() accepts any function, it also lets you apply custom logic or conditions as needed.
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print("After concatenating the two columns:\n", df)
Yields the same output as above.
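As a sketch of the custom logic mentioned above, map() can format each value before concatenation. The sample data and the names fee_text and Label are assumptions for this illustration.

```python
import pandas as pd

# Sample frame (illustrative data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
})

# map() runs the function on each value; here it formats the fee with a
# thousands separator before the columns are concatenated
fee_text = df["Fee"].map(lambda fee: f"{int(fee):,}")
df["Label"] = df["Courses"] + "-" + fee_text

print(df["Label"].tolist())  # ['Spark-20,000', 'PySpark-25,000']
```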
Complete Example of Concatenating Two Columns in Pandas
Below is a complete example of how to concatenate two or more columns of a Pandas DataFrame.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print(df)
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)
# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")
print(df)
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)
print(df)
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)
FAQ on Concatenating Two DataFrame Columns
Concatenating two DataFrame columns means combining the data from two separate columns in a DataFrame to create a new column.
You can concatenate two columns in a Pandas DataFrame using the + operator. When you use the + operator between two columns, Pandas performs element-wise addition for numeric columns and concatenation for string/text columns.
To concatenate multiple columns in pandas, you can use methods like apply() with a lambda function, the str.cat() function, or the map() function. These methods allow you to concatenate multiple columns efficiently.
You can use methods like apply() with a lambda function or the map() function to concatenate columns while also incorporating conditions or custom logic as needed.
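For example, a conditional concatenation could look like the sketch below. The function describe is a hypothetical name, and the rule of appending the discount only when it is greater than zero is an assumption made for this illustration.

```python
import pandas as pd

# Sample frame (illustrative data)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", "40days", "35days"],
    "Discount": [1000, 0, 2500],
})

def describe(row):
    """Build the combined text for one row (hypothetical helper)."""
    base = f"{row['Courses']}-{row['Duration']}"
    # Append the discount only when it is greater than zero
    if row["Discount"] > 0:
        return f"{base}-{row['Discount']}off"
    return base

# axis=1 passes each row to describe()
df["Period"] = df.apply(describe, axis=1)
print(df["Period"].tolist())
```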
Conclusion
In summary, you can concatenate two columns in a Pandas DataFrame using the + operator, the apply() method, the str.cat() function, the map() function, or the agg() function.
Happy Learning !!