To combine two columns of text in DataFrame, you can use the +
operator and series.str.cat()
function. When working with data we often would be required to combine/merge two or multiple columns of text/string in Pandas DataFrame.
In this article, I will explain how combining two columns of text/string using multiple ways of pandas such as the apply()
function, agg()
function, apply()
function with lambda
, and map()
function.
Key Points –
- Pandas allows combining two columns of text in a DataFrame using various methods.
- Use the
+
operator or thestr.cat()
method to concatenate text from two columns in a Pandas DataFrame. - The
.str.cat()
method provides more flexibility in concatenating columns and specifying separators. - Ensure to handle missing values appropriately using parameters like
na_rep
instr.cat()
to avoid errors during concatenation.
Quick Examples of Combine Two Columns of Text
If you are in a hurry, below are some quick examples of combining two columns of text in Pandas DataFrame.
# Quick examples of combine two columns of text
# Example 1: Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
# Example 2: Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
# Example 3: Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
# Example 4: Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")
# Example 5: Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
# Example 6: Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
Now, let’s run these examples by creating a DataFrame. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
, I will merge the columns Courses
& Duration
with ‘-
‘ separator and creates a new column Period
.
# Create DataFrame
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Yields below output.
Combine Two Columns Using +
Operator
Using the +
operator to combine/merge two or multiple text/string columns in Pandas DataFrame. Note that when you apply this operator on numeric columns it actually does addition instead of concatenation.
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print("After combining two columns of DataFrame:\n", df)
Yields below output.
Using Series.str.cat() Function
By using series.str.cat()
function you can combine two Series by a separator. You can apply this with DataFrame as below. Here df["courses"]
& df["Duration"]
returns series.
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print("After combining two columns of DataFrame:\n", df)
# Output:
# After combining two columns of DataFrame:
# Courses Fee Duration Discount period
#0 Spark 20000 30days 1000 Spark-30days
#1 PySpark 25000 40days 1500 PySpark-40days
#2 Hadoop 26000 35days 2500 Hadoop-35days
#3 Python 22000 40days 2100 Python-40days
#4 pandas 24000 60days 2000 pandas-60days
Using apply() Method
You can also use the Pandas apply() function to compress two or multiple columns of the DataFrame to a single column. join()
function is used to join strings. DataFrame.apply()
function is used to apply another function on a specific axis.
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print("After combining two columns of DataFrame:\n", df)
# Output:
# After combining two columns of DataFrame:
# Courses Fee Duration Discount Period
# 0 Spark 20000 30days 1000 Spark-30days
# 1 PySpark 25000 40days 1500 PySpark-40days
# 2 Hadoop 26000 35days 2500 Hadoop-35days
# 3 Python 22000 40days 2100 Python-40days
# 4 pandas 24000 60days 2000 pandas-60days
Using DataFrame.agg() Method
To join multiple string columns, you can also use DataFrame.agg()
method. Like above pass all the columns you want to merge as a list.
# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print("After combining two columns of DataFrame:\n", df)
# Output:
# After combining two columns of DataFrame:
# Courses Fee Duration Discount Period
# 0 Spark 20000 30days 1000 Spark-30days
# 1 PySpark 25000 40days 1500 PySpark-40days
# 2 Hadoop 26000 35days 2500 Hadoop-35days
# 3 Python 22000 40days 2100 Python-40days
# 4 pandas 24000 60days 2000 pandas-60days
Using DataFrame.apply() and Lambda Function
apply() method with lambda can be used to achieve the same. You can use this method to generalize to an arbitrary number of string columns by replacing df[[“Courses”, “Duration”]] with any column slice of your DataFrame.
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: " ".join(x), axis =1)
print("After combining two columns of DataFrame:\n", df)
# Output:
# After combining two columns of DataFrame:
# Courses Fee Duration Discount Period
# 0 Spark 20000 30days 1000 Spark-30days
# 1 PySpark 25000 40days 1500 PySpark-40days
# 2 Hadoop 26000 35days 2500 Hadoop-35days
# 3 Python 22000 40days 2100 Python-40days
# 4 pandas 24000 60days 2000 pandas-60days
Combine Two Columns of Text Using map() Function
Finally, map() is also used to concatenate multiple columns. Using map() you get more freedom even to check conditions.
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + " " + df["Duration"]
print("After combining two columns of DataFrame:\n", df)
# Output:
# After combining two columns of DataFrame:
# Courses Fee Duration Discount Period
# 0 Spark 20000 30days 1000 Spark-30days
# 1 PySpark 25000 40days 1500 PySpark-40days
# 2 Hadoop 26000 35days 2500 Hadoop-35days
# 3 Python 22000 40days 2100 Python-40days
# 4 pandas 24000 60days 2000 pandas-60days
Complete Example For Combine Two Columns of Text
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,1500,2500,2100,2000]
})
df = pd.DataFrame(technologies)
print(df)
# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)
# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)
# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)
# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
print(df)
# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)
Frequently Asked Questions on Combine Two Columns of Text
You can combine more than two columns of text in a DataFrame. You can use the same methods like the +
operator or the .str.cat()
method to combine multiple columns.
You can concatenate (join) columns with a separator in a DataFrame using the .str.cat()
method. This method allows you to specify a separator between the concatenated values.
To merge two DataFrames by combining columns, you can use the pd.concat()
function or the .merge()
method. For example, df3 = pd.concat([df1, df2], axis=1)
You can use the .str.cat()
method with a space separator to combine text columns with a space in between. For example, df["comb_column"] = df["column1"].str.cat(df["column2"], sep = "-")
You can customize the separator between the values of the combined columns. For example, you can use spaces, hyphens, underscores, or any other character as a separator. The .str.cat()
method allows specifying a separator using the sep
parameter.
Conclusion
In conclusion, you have learned various methods to combine two or multiple string columns in a Pandas DataFrame, including the +
operator, DataFrame.map()
, DataFrame.agg()
, Series.str.cat()
, and DataFrame.apply()
. These techniques provide flexibility and efficiency for handling and manipulating text data within your DataFrame.
Happy Learning !!
Related Articles
- How to Merge Series into Pandas DataFrame
- Operator Chaining to Filter Pandas DataFrame Rows
- How to Install Anaconda & Run Jupyter Notebook
- Pandas Count The Frequency of a Value in Column
- How to Get a Cell Value From Pandas DataFrame
- Add Column Name to Pandas Series?
- Check Values of Pandas Series is Unique
- Convert Pandas Series to DataFrame
- Pandas Merge DataFrames Explained Examples
- Pandas Merge DataFrames on Index