How do you concatenate two or more columns of a Pandas DataFrame? You can use several methods, including the + operator and Pandas functions such as Series.str.cat(), DataFrame.apply(), and DataFrame.agg(). This operation is common in data manipulation and analysis when you need to combine information from different columns into a single column.
In this article, I will cover the ways I use most often in real projects to concatenate two or more columns of string/text type. Depending on your needs, you may have to add a separator between the values, so the examples include a separator as well.
Key Points –
- Pandas offers versatile methods like .str.cat() and DataFrame.agg() to efficiently concatenate two columns.
- The + operator can be used directly for concatenating string columns in Pandas, but it performs addition for numeric columns.
- Additionally, the DataFrame.apply() method can be used with custom functions to concatenate columns along specified axes.
- Selecting the appropriate concatenation method depends on factors such as data type, performance considerations, and specific concatenation requirements.
Quick Examples of Concatenate Two Columns
If you are in a hurry, below are some quick examples of how to concatenate two columns in Pandas DataFrame.
# Quick examples of concatenate two columns
# Example 1: Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
# Example 2: Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
# Example 3: Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
# Example 4: Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")
# Example 5: Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
# Example 6: Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
Now, let’s create a DataFrame with some rows and columns and then execute the examples provided to validate the results.
# Create DataFrame
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print("DataFrame:\n", df)
Yields below output.
Concatenate Two Columns Using + Operator in Pandas
You can use the + operator to concatenate two or more string/text columns in a Pandas DataFrame. Note that when the + operator is applied to numeric columns, it performs arithmetic addition instead of string concatenation, so convert numeric columns with astype(str) before concatenating them as text.
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print("After concatenating the two columns:\n", df)
In the above example, the + operator concatenates the Courses and Duration columns with a - separator and stores the result in a new column called Period. For the first row, Period contains Spark-30days.
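To make the difference between addition and concatenation concrete, here is a minimal sketch on a two-row version of the DataFrame used in this article. The variable names total and label are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 1500],
})

# On two numeric columns, + adds the values element-wise
total = df["Fee"] + df["Discount"]

# Convert the numeric column to str first to concatenate it as text
label = df["Courses"] + "-" + df["Fee"].astype(str)

print(total.tolist())  # [21000, 26500]
print(label.tolist())  # ['Spark-20000', 'PySpark-25000']
```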
Using the apply() Function
You can consolidate two or more columns of a DataFrame into a single column using the DataFrame.apply() function, which applies a function along a specific axis. With axis=1 the function receives each row, so passing the join() method of a separator string joins that row's values into one string.
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print("After concatenating the two columns:\n", df)
Yields the same output as above.
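Note that join() only accepts strings, so this approach raises a TypeError if one of the selected columns is numeric. A small sketch of one way around this, assuming you want the numeric Fee column included as text (the Summary column name is just an example):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# str.join() only accepts strings, so cast the selected columns first
cols = ["Courses", "Fee", "Duration"]
df["Summary"] = df[cols].astype(str).apply("-".join, axis=1)

print(df["Summary"].tolist())
# ['Spark-20000-30days', 'PySpark-25000-40days']
```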
Using agg() to Concat String Columns of DataFrame
To concatenate multiple string columns, you can also use the DataFrame.agg() method. As in the previous example, select the columns you want to concatenate as a list, then call agg() with the join() method and axis=1 to get the desired output.
# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print("After concatenating the two columns:\n", df)
Yields the same output as above.
Using Series.str.cat() Function to Concat Columns
The Series.str.cat() function concatenates two Series with a delimiter/separator. Because selecting a DataFrame column returns a Series, you can apply it directly to DataFrame columns.
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print("After concatenating the two columns:\n", df)
Yields the same output as above.
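One thing to watch with str.cat() is missing data. The sketch below uses a small DataFrame with one missing Duration, which is an assumption for illustration and not part of the sample data above. It shows the na_rep parameter, which replaces missing values with a placeholder before joining.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", None, "35days"],
})

# Default: a missing value on either side makes the whole result missing
default = df["Courses"].str.cat(df["Duration"], sep="-")

# na_rep substitutes a placeholder for missing values before joining
filled = df["Courses"].str.cat(df["Duration"], sep="-", na_rep="unknown")

print(default.isna().tolist())  # [False, True, False]
print(filled.tolist())  # ['Spark-30days', 'PySpark-unknown', 'Hadoop-35days']
```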
Using DataFrame.apply() and Lambda Function to Concat
The apply() method with a lambda function can be used to achieve the same result. You can generalize this method to concatenate an arbitrary number of string columns by replacing df[["Courses", "Duration"]] with any column slice of your DataFrame.
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)
print("After concatenating the two columns:\n", df)
Yields the same output as above.
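The generalization to any number of columns can be sketched as a small helper. The function name concat_columns and the Details column are hypothetical, and the astype(str) cast is added here so that numeric columns can be included too.

```python
import pandas as pd

def concat_columns(frame, columns, sep="-"):
    """Join the given columns row by row into one string Series."""
    # Cast to str so numeric columns can be joined as text
    return frame[columns].astype(str).apply(lambda row: sep.join(row), axis=1)

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

df["Period"] = concat_columns(df, ["Courses", "Duration"])
df["Details"] = concat_columns(df, ["Courses", "Fee", "Duration"], sep=" | ")

print(df["Period"].tolist())   # ['Spark-30days', 'PySpark-40days']
print(df["Details"].tolist())  # ['Spark | 20000 | 30days', 'PySpark | 25000 | 40days']
```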
Concat Two Columns Using map() Function
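You can also use the map() function. Here, map(str) converts each value of the Courses column to a string, and the + operator then concatenates the result with the separator and the Duration column. Because map() accepts any function, it also lets you apply custom logic or conditions to each value before concatenating.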
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print("After concatenating the two columns:\n", df)
Yields the same output as above.
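As a small illustration of passing a different function to map(), the sketch below upper-cases the course name before concatenating. The upper-casing is an arbitrary choice for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "pandas"],
    "Duration": ["30days", "60days"],
})

# map() applies a function to every element of one Series;
# here it upper-cases the course name before the + concatenation
df["Period"] = df["Courses"].map(str.upper) + "-" + df["Duration"]

print(df["Period"].tolist())  # ['SPARK-30days', 'PANDAS-60days']
```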
Complete Example of Concatenate Two Columns in Pandas
Below is a complete example of how to concatenate two or more columns of a Pandas DataFrame.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print(df)
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)
# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
print(df)
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)
FAQ on Concatenate Two DataFrame Columns
Concatenating two DataFrame columns means combining the data from two separate columns in a DataFrame to create a new column.
You can concatenate two columns in a pandas DataFrame using various methods such as the + operator, the apply() method, the str.cat() function, or the map() function.
You can concatenate two columns in a Pandas DataFrame using the + operator. When you use the + operator between two columns, Pandas performs element-wise addition for numeric columns and concatenation for string/text columns.
To concatenate multiple columns in pandas, you can use methods like apply() with a lambda function, str.cat() function, or the map() function. These methods allow you to concatenate multiple columns efficiently.
You can use methods like apply() with a lambda function or the map() function to concatenate columns while also incorporating conditions or custom logic as needed.
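A minimal sketch of conditional concatenation with apply() follows. The rule used here (append the duration only when the fee is at least 25000) and the function name label are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
    "Duration": ["30days", "40days", "35days"],
})

# Hypothetical rule: append the duration only when the fee is at least 25000
def label(row):
    if row["Fee"] >= 25000:
        return f"{row['Courses']}-{row['Duration']}"
    return row["Courses"]

df["Period"] = df.apply(label, axis=1)

print(df["Period"].tolist())  # ['Spark', 'PySpark-40days', 'Hadoop-35days']
```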
Conclusion
In this article, I have explained how to concatenate two columns in a pandas DataFrame using the + operator, apply() method, str.cat() function, map() function, or agg() method.
Happy Learning !!