PySpark – TypeError: Column is not iterable

Problem 1: When I try to add months to a date column using a value from another column, PySpark raises TypeError: Column is not iterable.


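from pyspark.sql.functions import add_months

# Sample rows: a date string and the number of months to add to it
# (assumes an existing SparkSession named `spark`, as in the PySpark shell)
data = [("2019-01-23", 1), ("2019-06-24", 2), ("2019-09-20", 3)]
df = spark.createDataFrame(data).toDF("date", "increment")
# Fails: the second argument is a Column, not an integer literal
df.select(df.date, df.increment, add_months(df.date, df.increment)).show()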

This raises the following PySpark error at run time.


TypeError: Column is not iterable
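The message itself comes from the Column class, which defines its iteration hook to raise a TypeError so that a column expression cannot be looped over by accident. The sketch below imitates that behaviour in plain Python; FakeColumn and try_iterate are hypothetical names used only for illustration, not PySpark APIs, and the real class does far more than this.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        # Mirrors the behaviour described above: any attempt to iterate fails.
        raise TypeError("Column is not iterable")


def try_iterate(obj):
    """Return the TypeError message if obj cannot be iterated, else None."""
    try:
        list(obj)  # list() calls iter(obj), which invokes __iter__
    except TypeError as err:
        return str(err)
    return None


message = try_iterate(FakeColumn("increment"))
print(message)
```

Any code path that tries to walk over a column object, for example while converting Python arguments for the JVM, would hit the same error even though the user code never writes a loop.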

Solution for TypeError: Column is not iterable

In the PySpark versions that raise this error, the add_months() function takes a column as its first argument and a literal integer as its second. If you pass a Column as the second argument, you get “TypeError: Column is not iterable”. To fix this, use the expr() function, which evaluates the call as a Spark SQL expression so that both arguments can be columns, as shown below.


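from pyspark.sql.functions import expr

# expr() parses the string as Spark SQL, so both arguments can be columns
df.select(df.date, df.increment,
    expr("add_months(date, increment)").alias("inc_date")).show()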
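The same SQL expression can also be applied through withColumn() or selectExpr(). The following is a minimal sketch, assuming a local Spark installation; add_months_sql and run_demo are hypothetical helper names introduced here for illustration, and the Spark imports are placed inside run_demo so the string helper can be used on its own.

```python
def add_months_sql(date_col: str, months_col: str) -> str:
    """Build the Spark SQL expression string passed to expr() or selectExpr()."""
    return f"add_months({date_col}, {months_col})"


def run_demo():
    # Imported lazily so add_months_sql() works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    # Assumption: a local single-threaded session is enough for this demo.
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("add-months-demo")
             .getOrCreate())

    data = [("2019-01-23", 1), ("2019-06-24", 2), ("2019-09-20", 3)]
    df = spark.createDataFrame(data, ["date", "increment"])

    inc = add_months_sql("date", "increment")

    # Option 1: add the result as a new column with withColumn() + expr()
    df.withColumn("inc_date", expr(inc)).show()

    # Option 2: write the whole projection as SQL strings with selectExpr()
    df.selectExpr("date", "increment", f"{inc} AS inc_date").show()

    spark.stop()
```

Call run_demo() in an environment where PySpark is available. Both options hand the expression to Spark SQL as a string, so the increment column is resolved by Spark rather than passed through the Python add_months() wrapper.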

NNK
