In PySpark, the concat() function is used to concatenate multiple string columns into a single column without any separator. It joins the values of two or more columns or string expressions directly, producing a new string column.
Both concat() and concat_ws() belong to the pyspark.sql.functions module and are often used for combining multiple string columns into one. However, unlike concat_ws(), the concat() function does not include a separator between values and does not skip null values automatically.
In this article, we’ll explore how the concat() function works, how it differs from concat_ws(), and several use cases such as merging multiple columns, adding fixed strings, handling null values, and using it in SQL queries.
Key Points
- You can use concat() to merge multiple columns or string expressions into a single string column.
- Unlike concat_ws(), it does not add any separator between values.
- If any column contains null, the result will also be null.
- You can combine column values with fixed strings for formatting.
- Works seamlessly with both the DataFrame API and Spark SQL.
- Commonly used for generating IDs, full names, or concatenated keys without separators.
- To handle nulls automatically, prefer concat_ws() instead.
PySpark concat() Function
The concat() function merges multiple input string columns into one single string column without any separator. It returns a column containing the concatenated values in order.
Syntax
Following is the syntax of the concat() function.
# Syntax of concat()
concat(*cols)
Parameters
*cols: (string or Column)
One or more column names or column expressions to concatenate.
Return Value
Returns a single string column that joins all specified input columns or string expressions without any separator. If any column is null, the entire result becomes null.
We’ll use the following sample DataFrame throughout the examples:
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, col
# Create SparkSession
spark = SparkSession.builder.appName("sparkbyexamples").getOrCreate()
# Sample Data
data = [
    ('James', '', 'Smith', '1991-04-01', 'M', 3000),
    ('Michael', 'Rose', '', '2000-05-19', 'M', 4000),
    ('Robert', '', 'Williams', '1978-09-05', 'M', 4000),
    ('Maria', 'Anne', 'Jones', '1967-12-01', 'F', 4000),
    ('Jen', 'Mary', 'Brown', '1980-02-17', 'F', -1)
]
columns = ["firstname","middlename","lastname","dob","gender","salary"]
df = spark.createDataFrame(data=data, schema=columns)
df.show(truncate=False)
Yields the output below.
# Output:
+---------+----------+--------+----------+------+------+
|firstname|middlename|lastname|dob       |gender|salary|
+---------+----------+--------+----------+------+------+
|James    |          |Smith   |1991-04-01|M     |3000  |
|Michael  |Rose      |        |2000-05-19|M     |4000  |
|Robert   |          |Williams|1978-09-05|M     |4000  |
|Maria    |Anne      |Jones   |1967-12-01|F     |4000  |
|Jen      |Mary      |Brown   |1980-02-17|F     |-1    |
+---------+----------+--------+----------+------+------+
Concatenate Multiple Columns using concat()
You can use the concat() function to join multiple columns directly into one string without any separator.
from pyspark.sql.functions import concat
# Concatenate multiple columns
df_concat = df.select(
    concat(df.firstname, df.middlename, df.lastname).alias("FullName"),
    "dob", "gender", "salary"
)
df_concat.show(truncate=False)
Yields the output below.
# Output:
+--------------+----------+------+------+
|FullName      |dob       |gender|salary|
+--------------+----------+------+------+
|JamesSmith    |1991-04-01|M     |3000  |
|MichaelRose   |2000-05-19|M     |4000  |
|RobertWilliams|1978-09-05|M     |4000  |
|MariaAnneJones|1967-12-01|F     |4000  |
|JenMaryBrown  |1980-02-17|F     |-1    |
+--------------+----------+------+------+
Null Values with concat()
By default, if any column involved in concatenation is null, concat() will return null.
# Null Values with concat()
data = [
    ("James", None, "Smith"),
    ("Michael", "Rose", None),
    ("Robert", None, None)
]
columns = ["firstname", "middlename", "lastname"]
df_null = spark.createDataFrame(data, columns)
df_null_concat = df_null.select(
    concat("firstname", "middlename", "lastname").alias("FullName")
)
df_null_concat.show(truncate=False)
Yields the output below.
# Output:
+--------+
|FullName|
+--------+
|NULL    |
|NULL    |
|NULL    |
+--------+
To ignore nulls during concatenation, use concat_ws() instead.
Add Fixed Strings using concat()
You can include fixed strings (like separators, prefixes, or suffixes) inside the concat() function by wrapping them with lit() from pyspark.sql.functions.
from pyspark.sql.functions import lit
# Add fixed string between columns
df_fixed = df.select(
    concat(col("firstname"), lit(" "), col("lastname")).alias("FullNameWithSpace"),
    "gender"
)
df_fixed.show(truncate=False)
Yields the output below.
# Output:
+-----------------+------+
|FullNameWithSpace|gender|
+-----------------+------+
|James Smith      |M     |
|Michael          |M     |
|Robert Williams  |M     |
|Maria Jones      |F     |
|Jen Brown        |F     |
+-----------------+------+
Use concat() in SQL Queries
You can register your DataFrame as a temporary SQL view and use concat directly in SQL SELECT statements to merge multiple columns into one string.
# Use concat() in SQL Queries
# Register DataFrame as a temporary view
df.createOrReplaceTempView("people")
# Use concat() in SQL
df_sql = spark.sql("""
SELECT concat(firstname, middlename, lastname) AS FullName, dob, gender, salary
FROM people
""")
df_sql.show(truncate=False)
Yields the output below.
# Output:
+--------------+----------+------+------+
|FullName      |dob       |gender|salary|
+--------------+----------+------+------+
|JamesSmith    |1991-04-01|M     |3000  |
|MichaelRose   |2000-05-19|M     |4000  |
|RobertWilliams|1978-09-05|M     |4000  |
|MariaAnneJones|1967-12-01|F     |4000  |
|JenMaryBrown  |1980-02-17|F     |-1    |
+--------------+----------+------+------+
PySpark concat() vs concat_ws()
Both functions are used for concatenating string columns, but they differ in handling separators and null values.
| Feature | concat() | concat_ws() |
|---|---|---|
| Separator | No separator | Adds specified separator |
| Null Handling | Returns null if any column is null | Ignores null values |
| Use Case | When joining raw strings | When you need a delimiter like space, comma, or dash |
For example:
# concat() vs concat_ws()
from pyspark.sql.functions import concat_ws
df_compare = df.select(
    concat(df.firstname, df.middlename, df.lastname).alias("concat_output"),
    concat_ws(" ", df.firstname, df.middlename, df.lastname).alias("concat_ws_output")
)
df_compare.show(truncate=False)
Yields the output below. Note that concat_ws() skips null values but not empty strings, so rows where middlename is an empty string still get a separator on each side of it.
# Output:
+--------------+----------------+
|concat_output |concat_ws_output|
+--------------+----------------+
|JamesSmith    |James  Smith    |
|MichaelRose   |Michael Rose    |
|RobertWilliams|Robert  Williams|
|MariaAnneJones|Maria Anne Jones|
|JenMaryBrown  |Jen Mary Brown  |
+--------------+----------------+
Frequently Asked Questions on PySpark concat()
What does the concat() function do in PySpark?
The concat() function in PySpark is used to combine multiple string columns or expressions into a single column. It merges values directly without adding any separator between them.
What happens if a column contains null values?
If any of the columns involved in concatenation contains a null value, the entire result will be null. This is a key difference between concat() and concat_ws(), as concat_ws() automatically skips nulls.
What types of columns can I concatenate?
You can concatenate string columns, string literals, or column expressions. For numeric columns, you should first cast them to string before using concat().
How is concat() different from concat_ws()?
concat() merges multiple columns directly, whereas concat_ws() allows you to include a custom separator such as a space, comma, or hyphen between values. Additionally, concat_ws() automatically ignores null values.
How do I add fixed strings inside concat()?
You can include fixed strings by wrapping them with the lit() function. For example, to add a space between two names, use lit(" ") inside concat().
Can I use concat() in Spark SQL?
Yes. You can register your DataFrame as a temporary SQL view and use concat directly in SQL SELECT statements to merge multiple columns into one string.
Conclusion
In this article, you learned how to concatenate multiple columns into a single string using PySpark’s concat() function.
While it joins columns directly without separators, it does not handle null values automatically. To manage separators and nulls effectively, you can use concat_ws() instead.
By combining concat() with lit() and conditional expressions, you can easily format and generate clean string outputs for IDs, labels, or full names.
Happy Learning!!