  • Post last modified:March 27, 2024
PySpark – TypeError: Column is not iterable

Problem 1: When I try to add months to a date column using a value from another column, I get the PySpark error TypeError: Column is not iterable.


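from pyspark.sql import SparkSession
from pyspark.sql.functions import add_months

spark = SparkSession.builder.getOrCreate()
data = [("2019-01-23", 1), ("2019-06-24", 2), ("2019-09-20", 3)]
df = spark.createDataFrame(data).toDF("date", "increment")
# Passing a Column (df.increment) as the second argument triggers the error
df.select(df.date, df.increment, add_months(df.date, df.increment)).show()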

This raises the following PySpark error at run time.


TypeError: Column is not iterable

Solution for TypeError: Column is not iterable

In the PySpark versions where this error occurs, the add_months() function takes a column as its first argument and a literal value (a Python integer) as its second. If you pass a Column as the second argument, you get “TypeError: Column is not iterable”. To fix this, write the call as a SQL expression with the expr() function, which lets both arguments refer to columns, as shown below.


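from pyspark.sql.functions import expr

# expr() parses a SQL expression, so both arguments can be column names
df.select(df.date, df.increment,
    expr("add_months(date, increment)").alias("inc_date")).show()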

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.