Problem 1: When I try to add months to the date column using a value from another column, I get the PySpark error TypeError: Column is not iterable.
from pyspark.sql.functions import add_months
data=[("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)]
df=spark.createDataFrame(data).toDF("date","increment")
df.select(df.date,df.increment,add_months(df.date,df.increment)).show()
This raises the following PySpark error at run time:
TypeError: Column is not iterable
Solution for TypeError: Column is not iterable
In PySpark versions that raise this error, the add_months() function takes a column as its first argument and a literal integer as its second. If you pass a Column as the second argument, you get “TypeError: Column is not iterable”. To fix this, use the expr() function, which evaluates the call as a SQL expression so that both arguments can be column references, as shown below.
from pyspark.sql.functions import expr
df.select(df.date,df.increment,
expr("add_months(date,increment)")
.alias("inc_date")).show()
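To sanity-check what the inc_date column should contain for the sample rows, the same month arithmetic can be reproduced in plain Python. This is only a sketch: add_months_py is a hypothetical helper written for illustration, not part of PySpark, and it assumes the day is clamped to the last day of the target month when that month is shorter.

```python
import calendar
from datetime import date


def add_months_py(start: str, months: int) -> str:
    """Hypothetical helper: add `months` to an ISO date string (YYYY-MM-DD)."""
    d = date.fromisoformat(start)
    # Use a zero-based month index so year rollover is plain integer division.
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    # Assumption: clamp the day when the target month is shorter
    # (for example, January 31 plus one month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day)).isoformat()


# Same sample rows as the DataFrame above: (date, increment)
data = [("2019-01-23", 1), ("2019-06-24", 2), ("2019-09-20", 3)]
expected = [(d, inc, add_months_py(d, inc)) for d, inc in data]
for row in expected:
    print(row)
```

For the three sample rows this gives 2019-02-23, 2019-08-24 and 2019-12-20, which is what the inc_date column is expected to show.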