Pandas Concatenate Two Columns

When working with data we often would be required to concatenate two or multiple columns of text/string in pandas DataFrame, you can do this in several ways. In this article, I will cover the most used ways in my real-time projects to concatenate two or multiple columns of string/text type. While concat based on your need, you may be required to add a separator hence, I will explain examples with the separator as well.

1. Quick Examples of pandas Concatenate Two Columns of DataFrame

If you are in a hurry, below are some quick examples of how to concatenate two columns of text in pandas DataFrame.


# Below are some quick examples

# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]

# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)

# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)

# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")

# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)

# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]

Now, let’s run these examples by creating a DataFrame. Our DataFrame contains column names Courses, Fee, Duration, and Discount, I will merge columns Courses & Duration with ‘-‘ separator and creates a new column Period.


# Create DataFrame
import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
     'Fee' :[20000,25000,26000,22000,24000],
     'Duration':['30days','40days','35days','40days','60days'],
     'Discount':[1000,1500,2500,2100,2000]
               })
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      1500
2   Hadoop  26000   35days      2500
3   Python  22000   40days      2100
4   pandas  24000   60days      2000

2. Concatenate Two Columns Using + Operator in pandas

By use + operator simply you can concatenate two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.


# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount          Period
0    Spark  20000   30days      1000    Spark-30days
1  PySpark  25000   40days      1500  PySpark-40days
2   Hadoop  26000   35days      2500   Hadoop-35days
3   Python  22000   40days      2100   Python-40days
4   pandas  24000   60days      2000   pandas-60days

3. Using apply() Method to Concat Two String Columns

You can also use the DataFrame.apply() function compressing two or multiple columns of the DataFrame to a single column. join() function is used to join strings. DataFrame.apply() function is used to apply a function on a specific axis.


# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)

Yields same output as above.

4. Using agg() to Concat String Columns of DataFrame

To concat multiple string columns, you can also use DataFrame.agg() method. Like above pass all the columns, you wanted to concat as a list.


# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)

Yields same output as above.

5. Using Series.str.cat() Function to Concat Columns

By using series.str.cat() function you can concatenate two Series by a delimiter/separator. You can apply this with DataFrame as below. Here df["courses"] & df["Duration"] returns series.


# Using Series.str.cat() function 
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)

Yields same output as above.

6. Using DataFrame.apply() and Lambda Function to Concat

apply() method with lambda can be used to achieve the same. You can use this method to generalize to an arbitrary number of string columns by replacing df[["Courses", "Duration"]] with any column slice of your DataFrame.


# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: " ".join(x), axis =1)
print(df)

Yields same output as above.

7. Concat Two Columns Using map() Function

Finally, map() is also used to concatenate multiple columns. Using map() you get more freedom even to check conditions.


# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + " " + df["Duration"]
print(df)

Yields same output as above.

8. Complete Example of Concatenate Two Columns in pandas

Below is a complete example of how to concat two or multiple columns on pandas DataFrame.


import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
     'Fee' :[20000,25000,26000,22000,24000],
     'Duration':['30days','40days','35days','40days','60days'],
     'Discount':[1000,1500,2500,2100,2000]
               })
df = pd.DataFrame(technologies)
print(df)

# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)

# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)

# Using DataFrame.agg() to combine two columns of text
df["period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)

# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)

# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
print(df)

# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)

Conclusion

In this article, you have learned how to concatenate two or multiple string columns in pandas DataFrame using + operator, DataFrame.map(), DataFrame.agg(), and Series.str.cat(), DataFrame.apply() method.

Happy Learning !!

References

Naveen (NNK)

I am Naveen (NNK) working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and managing Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passion about sharing my knowledge in Apache Spark, Hive, PySpark, R etc.

Leave a Reply

You are currently viewing Pandas Concatenate Two Columns