Pandas Combine Two Columns of Text in DataFrame

  • Post author:
  • Post category:Pandas
  • Post last modified:November 8, 2023

When working with data we often would be required to combine/merge two or multiple columns of text/string in Pandas DataFrame, you can do this in several ways. In this article, I will cover the most used ways in my real-time projects to combine/merge multiple string/text columns. While merging based on your need, you may be required to add a separator hence, I will explain examples with the separator.

Related: You can also combine/merge/join the Pandas two/multiple DataFrames.

1. Quick Examples of Combine Two Columns of Text

If you are in a hurry, below are some quick examples of combining two columns of text in Pandas DataFrame.


# Below are some quick examples

# Example 1: Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]

# Example 2: Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)

# Example 3: Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)

# Example 4: Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")

# Example 5: Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)

# Example 6: Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]

Now, let’s run these examples by creating a DataFrame. Our DataFrame contains column names Courses, Fee, Duration, and Discount, I will merge the columns Courses & Duration with ‘-‘ separator and creates a new column Period.


# Create DataFrame
import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
     'Fee' :[20000,25000,26000,22000,24000],
     'Duration':['30days','40days','35days','40days','60days'],
     'Discount':[1000,1500,2500,2100,2000]
               })
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

Pandas Combine Two Columns

2. Combine Two Columns Using + Operator

Using the + operator to combine/merge two or multiple text/string columns in Pandas DataFrame. Note that when you apply this operator on numeric columns it actually does addition instead of concatenation.


# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print("After combining two columns of DataFrame:\n", df)

Yields below output.

Pandas Combine Two Columns

3. Using the .apply() Method to Combine Two String Columns

You can also use the Pandas apply() function to compress two or multiple columns of the DataFrame to a single column. join() function is used to join strings. DataFrame.apply() function is used to apply another function on a specific axis.


# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print("After combining two columns of DataFrame:\n", df)

Yields the same output as above.

4. Using DataFrame.agg() Method to Merge String Columns

To join multiple string columns, you can also use DataFrame.agg() method. Like above pass all the columns you want to merge as a list.


# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print("After combining two columns of DataFrame:\n", df)

Yields the same output as above.

5. Using Series.str.cat() Function to Combine Two Columns of Text

By using series.str.cat() function you can combine two Series by a separator. You can apply this with DataFrame as below. Here df["courses"] & df["Duration"] returns series.


# Using Series.str.cat() function 
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print("After combining two columns of DataFrame:\n", df)

Yields the same output as above.

6. Using DataFrame.apply() and Lambda Function

apply() method with lambda can be used to achieve the same. You can use this method to generalize to an arbitrary number of string columns by replacing df[[“Courses”, “Duration”]] with any column slice of your DataFrame.


# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: " ".join(x), axis =1)
print("After combining two columns of DataFrame:\n", df)

Yields the same output as above.

7. Combine Two Columns of Text Using map() Function

Finally, map() is also used to concatenate multiple columns. Using map() you get more freedom even to check conditions.


# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + " " + df["Duration"]
print("After combining two columns of DataFrame:\n", df)

Yields the same output as above.

8. Complete Example For Combine Two Columns of Text


import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
     'Fee' :[20000,25000,26000,22000,24000],
     'Duration':['30days','40days','35days','40days','60days'],
     'Discount':[1000,1500,2500,2100,2000]
               })
df = pd.DataFrame(technologies)
print(df)

# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)

# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)

# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)

# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)

# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
print(df)

# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)

Frequently Asked Questions on Combine columns of DataFrame

How do I combine two columns in a DataFrame?

You can combine two columns in a DataFrame by using the + operator. For example, df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})<br/>df['comb_column'] = df['column1'] + df['column2']

How do I concatenate (join) columns with a separator in a DataFrame?

You can use the .str.cat() method to concatenate columns with a separator. For example, df["comb_column"] = df["column1"].str.cat(df["column2"], sep = "-")

How can I merge two DataFrames by combining columns?

To merge two DataFrames by combining columns, you can use the pd.concat() function or the .merge() method. For example, df3 = pd.concat([df1, df2], axis=1)

How do I combine text columns with a space in between in a DataFrame?

You can use the .str.cat() method with a space separator to combine text columns with a space in between. For example, df["comb_column"] = df["column1"].str.cat(df["column2"], sep = "-")

Conclusion

In this article, you have learned how to combine two or multiple string columns in Pandas DataFrame using + operator, DataFrame.map(), DataFrame.agg(), and Series.str.cat(), DataFrame.apply() method.

Happy Learning !!

References

Naveen

I am a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, I have honed my expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. My journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. I have started this SparkByExamples.com to share my experiences with the data as I come across. You can learn more about me at LinkedIn

Leave a Reply

You are currently viewing Pandas Combine Two Columns of Text in DataFrame