AttributeError: 'DataFrame' object has no attribute 'map' in PySpark


Post author: Naveen Nelamali
Post category: PySpark
Post last modified: March 27, 2024


Problem: In PySpark, I get the error AttributeError: ‘DataFrame’ object has no attribute ‘map’ when I call the map() transformation on a DataFrame.



df2=df.map(lambda x: [x[0],x[1]])

  File "C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 1401, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

AttributeError: 'DataFrame' object has no attribute 'map'

Solution for AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark

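The PySpark DataFrame API does not have a map() transformation; map() is defined on RDDs. The traceback shows the lookup failing inside DataFrame.__getattr__, which PySpark uses to expose columns as attributes (for example, df.salary). Because the DataFrame has neither a method nor a column named map, you get the error AttributeError: ‘DataFrame’ object has no attribute ‘map’.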

To fix it, first convert the PySpark DataFrame to an RDD using df.rdd, then apply the map() transformation (which returns a new RDD), and finally convert the RDD back to a DataFrame. Let’s see this with an example.


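from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.appName("DataFrameMapExample").getOrCreate()

data = [('James',3000),('Anna',4001),('Robert',6200)]
df = spark.createDataFrame(data,["name","salary"])
df.show()

# Convert the DataFrame to an RDD of Row objects
rdd = df.rdd
print(rdd.collect())

# Apply the map() transformation: keep the name and compute a 20% bonus
rdd2 = rdd.map(lambda x: [x[0], x[1]*20/100])
print(rdd2.collect())

# Convert the RDD back to a DataFrame with the given column names
df2 = rdd2.toDF(["name","bonus"])
df2.show()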

Hope this helps. Happy Learning !!

