Spark By {Examples}

AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark


Problem: In PySpark, I am getting the error AttributeError: 'DataFrame' object has no attribute 'map' when I use the map() transformation on a DataFrame.



df2=df.map(lambda x: [x[0],x[1]])

  File "C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 1401, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

AttributeError: 'DataFrame' object has no attribute 'map'
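The traceback shows that the error is raised from DataFrame.__getattr__ in pyspark/sql/dataframe.py. In PySpark, that method treats an unknown attribute name as a column name and raises AttributeError when no column with that name exists, so df.map fails at attribute lookup, before your lambda is ever called. The sketch below mimics that lookup with a hypothetical ToyDataFrame class (not the real PySpark class; the real method returns a Column object instead of a string):

```python
# Hypothetical stand-in for pyspark.sql.DataFrame, NOT the real class.
class ToyDataFrame:
    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so any unknown attribute is treated as a column name.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
        return "Column<%s>" % name  # placeholder for a real Column object


df = ToyDataFrame(["name", "salary"])
print(df.salary)  # Column<salary>
try:
    df.map(lambda x: x)  # 'map' is not a column, so lookup fails first
except AttributeError as e:
    print(e)  # 'ToyDataFrame' object has no attribute 'map'
```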

Solution for AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark

A PySpark DataFrame doesn't have a map() transformation; map() is defined on RDD instead. That is why you get the error AttributeError: 'DataFrame' object has no attribute 'map'.

So first convert the PySpark DataFrame to an RDD using df.rdd, apply the map() transformation (which returns an RDD), and then convert the RDD back to a DataFrame with toDF(). Let's see this with an example.


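from pyspark.sql import SparkSession

# create (or reuse) a SparkSession
spark = SparkSession.builder.appName("DataFrameMapExample").getOrCreate()

data = [('James',3000),('Anna',4001),('Robert',6200)]
df = spark.createDataFrame(data,["name","salary"])
df.show()

# convert DataFrame to RDD
rdd=df.rdd
print(rdd.collect())

# apply map() transformation on the RDD (20% bonus)
rdd2=rdd.map(lambda x: [x[0],x[1]*20/100])
print(rdd2.collect())

# convert RDD back to DataFrame
df2=rdd2.toDF(["name","bonus"])
df2.show()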

Hope this helps. Happy Learning!!
