When working with data we often would be required to combine/merge two or multiple columns of text/string in pandas DataFrame, you can do this in several ways. In this article, I will cover mostly used ways in my real-time projects to combine/merge multiple string/text columns. While merging based on your need, you may be required to add a separator hence, I will explain examples with the separator.
1. Quick Examples of Combine Two Columns of Text
If you are in a hurry, below are some quick examples of how to combine two columns of text in pandas DataFrame.
# Below are quick example
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
Now, let’s run these examples by creating a DataFrame. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
, I will merge columns Courses
& Duration
with ‘-
‘ separator and creates a new column Period
.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print(df)
Yields below output.
Courses Fee Duration Discount
0 Spark 20000 30days 1000
1 PySpark 25000 40days 1500
2 Hadoop 26000 35days 2500
3 Python 22000 40days 2100
4 pandas 24000 60days 2000
2. Combine Two Columns Using +
Operator
By use + operator
simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply +
operator on numeric columns it actually does addition instead of concatenation.
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)
Yields below output.
Courses Fee Duration Discount Period
0 Spark 20000 30days 1000 Spark-30days
1 PySpark 25000 40days 1500 PySpark-40days
2 Hadoop 26000 35days 2500 Hadoop-35days
3 Python 22000 40days 2100 Python-40days
4 pandas 24000 60days 2000 pandas-60days
3. Using .apply() Method to Combine Two String Columns
You can also use the .apply()
function compressing two or multiple columns of the DataFrame to a single column. join()
function is used to join strings. DataFrame.apply()
function is used to apply another function on a specific axis.
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)
Yields same output as above.
4. Using DataFrame.agg() Method to Merge String Columns
To join multiple string columns, you can also use DataFrame.agg() method. Like above pass all the columns you wanted to merge as a list.
# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)
Yields same output as above.
5. Using Series.str.cat() Function to Combine Two Columns of Text
By using series.str.cat()
function you can combine two Series by a separator. You can apply this with DataFrame as below. Here df["courses"]
& df["Duration"]
returns series.
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)
Yields same output as above.
6. Using DataFrame.apply() and Lambda Function
apply() method with lambda can be used to achieve the same. You can use this method generalizes to an arbitrary number of string columns by replacing df[["Courses", "Duration"]]
with any column slice of your DataFrame.
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: " ".join(x), axis =1)
print(df)
Yields same output as above.
7. Combine Two Columns of Text Using map() Function
Finally, map() is also used to concatenate multiple columns. Using map() you get more freedom even to check conditions.
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + " " + df["Duration"]
print(df)
Yields same output as above.
8. Complete Example For Combine Two Columns of Text
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print(df)
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)
# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
print(df)
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)
Conclusion
In this article, you have learned how to combine two or multiple string columns in pandas DataFrame using + operator
, DataFrame.map()
, DataFrame.agg()
, and Series.str.cat()
, DataFrame.apply()
method.
Happy Learning !!
Related Articles
- How to Merge Series into Pandas DataFrame
- Operator Chaining to Filter Pandas DataFrame Rows
- How to Install Anaconda & Run Jupyter Notebook
- Pandas Count The Frequency of a Value in Column
- How to Get a Cell Value From Pandas DataFrame
- Add Column Name to Pandas Series?
- Check Values of Pandas Series is Unique
- Convert Pandas Series to DataFrame