Pandas Concatenate Two Columns

How do you concatenate two or more columns of a pandas DataFrame? You can use several methods, including the + operator and pandas functions such as apply(), agg(), Series.str.cat(), and map(). This operation is common in data manipulation and analysis when you need to combine the values of two columns into a single column.

In this article, I will cover the approaches I use most often in real projects to concatenate two or more string/text columns. Because you often need a separator between the values, the examples include one as well.

Related: You can concatenate the two DataFrames in Pandas.

1. Quick Examples of Pandas Concatenate Two Columns of DataFrame

If you are in a hurry, below are some quick examples of how to concatenate two columns of text in Pandas DataFrame.


# Below are some quick examples

# Example 1: Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]

# Example 2: Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)

# Example 3: Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)

# Example 4: Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")

# Example 5: Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)

# Example 6: Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]

Now, let's run these examples by creating a DataFrame. Our DataFrame contains the columns Courses, Fee, Duration, and Discount. I will merge the Courses and Duration columns with a '-' separator to create a new column, Period.


# Create DataFrame
import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
     'Fee' :[20000,25000,26000,22000,24000],
     'Duration':['30days','40days','35days','40days','60days'],
     'Discount':[1000,1500,2500,2100,2000]
               })
df = pd.DataFrame(technologies)
print("DataFrame:\n", df)

This prints a DataFrame with five rows and the columns Courses, Fee, Duration, and Discount.

2. Concatenate Two Columns Using + Operator in Pandas

You can use the + operator to concatenate two or more text/string columns in a pandas DataFrame. Note that when you apply + to numeric columns, it performs element-wise addition instead of concatenation, so cast numeric columns to strings with astype(str) first.


# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print("After concatenating the two columns:\n", df)

The DataFrame now has a Period column that joins each Courses value to its Duration with a hyphen, for example Spark-30days.
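To make the difference between addition and concatenation concrete, here is a minimal sketch. It uses a cut-down version of the article's DataFrame, and the variable names total and label are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Discount": [1000, 1500],
})

# On two numeric columns, + adds the values element-wise
total = df["Fee"] + df["Discount"]

# Casting the numeric column to str makes + concatenate instead
label = df["Courses"] + "-" + df["Fee"].astype(str)

print(total.tolist())
print(label.tolist())
```

If one of the columns you want to join is numeric, casting it with astype(str) is usually the simplest way to keep + in concatenation mode.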

3. Using the apply() Method to Concat Two String Columns

You can also use the DataFrame.apply() function to combine two or more columns of the DataFrame into a single column. This function applies a function along a specific axis; with axis=1 it is called once per row. To concatenate string columns, pass the separator's join() method as that function. For example,


# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print("After concatenating the two columns:\n", df)

Yields the same output as above.
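One thing to watch for with join(): it accepts only strings. The sketch below, which uses a reduced DataFrame and a hypothetical CourseFee column name, shows what I would expect to happen when a numeric column is included and how a cast avoids it.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
})

# join() only accepts strings, so including a numeric column raises TypeError
try:
    df[["Courses", "Fee"]].apply("-".join, axis=1)
    raised = False
except TypeError:
    raised = True

# Casting the selected columns to str first lets join() handle every value
df["CourseFee"] = df[["Courses", "Fee"]].astype(str).apply("-".join, axis=1)

print(raised)
print(df["CourseFee"].tolist())
```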

4. Using agg() to Concat String Columns of DataFrame

To concatenate multiple string columns, you can also use the DataFrame.agg() method. As in the code above, select the columns you want to concatenate as a list, then call agg() with the join() method and axis=1 to get the desired output.


# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print("After concatenating the two columns:\n", df)

Yields the same output as above.

5. Using Series.str.cat() Function to Concat Columns

By using the Series.str.cat() function, you can concatenate two Series with a delimiter/separator. You can apply this to DataFrame columns as below. Here df["Courses"] and df["Duration"] each return a Series.


# Using Series.str.cat() function 
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print("After concatenating the two columns:\n", df)

Yields the same output as above.
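Series.str.cat() also takes an na_rep argument, which is handy when a column has gaps. The following is a small sketch under the assumption that some values are missing; the sample data and the "unknown" placeholder are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", None],
    "Duration": ["30days", None, "35days"],
})

# By default, a missing value on either side gives a missing result
default = df["Courses"].str.cat(df["Duration"], sep="-")

# na_rep substitutes a placeholder string for each missing value
filled = df["Courses"].str.cat(df["Duration"], sep="-", na_rep="unknown")

print(default.isna().tolist())
print(filled.tolist())
```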

6. Using DataFrame.apply() and Lambda Function to Concat

The apply() method with a lambda can achieve the same result. You can generalize this approach to an arbitrary number of string columns by replacing df[["Courses", "Duration"]] with any selection of columns from your DataFrame.


# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)
print("After concatenating the two columns:\n", df)

Yields the same output as above.
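As a rough sketch of that generalization, the lambda can be wrapped in a small helper that takes any list of columns. The function join_columns, the Level column, and the Summary column are hypothetical names used only for this example.

```python
import pandas as pd

def join_columns(df, columns, sep="-"):
    """Hypothetical helper: join the given columns row by row with sep."""
    # astype(str) guards against non-string columns in the selection
    return df[columns].astype(str).apply(lambda row: sep.join(row), axis=1)

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Duration": ["30days", "40days"],
    "Level": ["Beginner", "Advanced"],  # illustrative column, not in the article's data
})

df["Summary"] = join_columns(df, ["Courses", "Duration", "Level"], sep=" | ")
print(df["Summary"].tolist())
```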

7. Concat Two Columns Using map() Function

Finally, you can use the Series.map() function to concatenate columns. map() applies a function to each value of a column, which gives you the freedom to transform values or check conditions before joining them.


# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print("After concatenating the two columns:\n", df)

Yields the same output as above.
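To illustrate checking a condition with map(), here is a minimal sketch. The rule it applies (upper-casing course names shorter than six characters) is invented purely for demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30days", "40days", "35days"],
})

# Hypothetical rule: upper-case course names shorter than six characters
prefix = df["Courses"].map(lambda name: name.upper() if len(name) < 6 else name)

# Join the transformed values to Duration with the + operator
df["Period"] = prefix + "-" + df["Duration"]
print(df["Period"].tolist())
```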

8. Complete Example of Concatenate Two Columns in Pandas

Below is a complete example of how to concat two or multiple columns on Pandas DataFrame.


import pandas as pd
technologies = ({
     'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
     'Fee' :[20000,25000,26000,22000,24000],
     'Duration':['30days','40days','35days','40days','60days'],
     'Discount':[1000,1500,2500,2100,2000]
               })
df = pd.DataFrame(technologies)
print(df)

# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) +"-"+ df["Duration"]
print(df)

# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)

# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)
print(df)

# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep = "-")
print(df)

# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis =1)
print(df)

# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
print(df)

Frequently Asked Questions on Concatenate Two DataFrame Columns

What does it mean to concatenate two DataFrame columns?

Concatenating two DataFrame columns means combining the data from two separate columns in a DataFrame to create a new column.

How can I concatenate two DataFrame columns in Python using pandas?

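You can use the pandas library in Python to concatenate two DataFrame columns. For string columns, the most direct methods are the + operator and Series.str.cat(). The pd.concat() function, by contrast, combines whole Series or DataFrames along an axis rather than joining values within each row.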

What is the difference between using pd.concat() and the + operator to concatenate columns?

The main difference is what gets combined. pd.concat() stacks or aligns whole Series and DataFrames; it lets you specify the axis and how indices are handled, but it does not merge the text of two columns row by row. The + operator performs element-wise addition or string concatenation, and a missing value in either column produces a missing value in the result. If you need to handle missing values, Series.str.cat() accepts an na_rep argument.
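A short sketch may help show how the two behave on the same data. The two Series here are made-up samples, and the variable names are only illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", None], name="Courses")
duration = pd.Series(["30days", "40days"], name="Duration")

# pd.concat with axis=1 places the Series side by side as two columns
side_by_side = pd.concat([courses, duration], axis=1)

# The + operator joins the strings row by row; a missing input stays missing
joined = courses + "-" + duration

print(side_by_side.shape)
print(joined.isna().tolist())
```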

Conclusion

In this article, you have learned how to concatenate two or more string columns in a pandas DataFrame using the + operator, Series.map(), DataFrame.agg(), Series.str.cat(), and the DataFrame.apply() method.

Happy Learning !!

Naveen (NNK)