Problem: In PySpark, I get the error AttributeError: 'DataFrame' object has no attribute 'map' when I call the map() transformation on a DataFrame.
df2=df.map(lambda x: [x[0],x[1]])
File "C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 1401, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'map'
Solution of AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark
A PySpark DataFrame doesn't have a map() transformation; map() is defined on RDD instead. That is why calling df.map() raises AttributeError: 'DataFrame' object has no attribute 'map'.
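To make the traceback easier to read, here is a toy model in plain Python of the kind of attribute lookup that produces this message. ToyDataFrame and ToyRDD are hypothetical stand-ins written for this illustration, not PySpark classes, and the real implementation may differ in its details.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: it does define map()."""

    def __init__(self, rows):
        self.rows = rows

    def map(self, func):
        # Apply func to every record and wrap the result in a new ToyRDD.
        return ToyRDD([func(row) for row in self.rows])


class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame: no map(), but it exposes .rdd."""

    def __init__(self, rows, columns):
        self.columns = columns
        self.rdd = ToyRDD(rows)

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails.
        # Column names resolve here; anything else raises AttributeError.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'"
                % (self.__class__.__name__, name)
            )
        index = self.columns.index(name)
        return [row[index] for row in self.rdd.rows]


toy_df = ToyDataFrame([("James", 3000), ("Anna", 4001)], ["name", "salary"])

try:
    toy_df.map(lambda x: x)
except AttributeError as err:
    print(err)

# Going through .rdd works, because map() is defined there.
print(toy_df.rdd.map(lambda x: x[0]).rows)
```

In this toy version, map is neither a method of the class nor a column name, so the lookup falls through to __getattr__ and fails, while the same call on toy_df.rdd succeeds.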
So first convert the PySpark DataFrame to an RDD using df.rdd, then apply the map() transformation, which returns an RDD, and finally convert that RDD back to a DataFrame. Let's see this with an example.
data = [('James',3000),('Anna',4001),('Robert',6200)]
df = spark.createDataFrame(data,["name","salary"])
df.show()
# convert DataFrame to RDD
rdd=df.rdd
print(rdd.collect())
# apply the map() transformation (bonus = 20% of salary)
rdd2=df.rdd.map(lambda x: [x[0],x[1]*20/100])
print(rdd2.collect())
# convert RDD back to DataFrame
df2=rdd2.toDF(["name","bonus"])
df2.show()
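The snippet above assumes a spark session already exists, as it does in the pyspark shell. If you are running a standalone script, the same round trip can be sketched as below. The names to_bonus and main, the local[1] master, and the app name are choices made for this illustration only, so adjust them to your setup.

```python
def to_bonus(row):
    """Map one (name, salary) record to (name, bonus), where bonus is 20% of salary."""
    name, salary = row[0], row[1]
    return (name, salary * 20 / 100)


def main():
    # Imported inside main() so that to_bonus can be tried without Spark installed.
    from pyspark.sql import SparkSession

    # Assumption: a local single-threaded session is enough for this small example.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("dataframe-map-example")
        .getOrCreate()
    )

    data = [("James", 3000), ("Anna", 4001), ("Robert", 6200)]
    df = spark.createDataFrame(data, ["name", "salary"])

    # DataFrame -> RDD, map() on the RDD, then RDD -> DataFrame again.
    df2 = df.rdd.map(to_bonus).toDF(["name", "bonus"])
    df2.show()

    spark.stop()
```

Call main() to run it. Keeping the per-row logic in a named function instead of a lambda also lets you check it on a plain tuple, for example to_bonus(("James", 3000)), before involving Spark at all.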
Hope this helps. Happy Learning !!