AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark

Problem: In PySpark, I get the error AttributeError: ‘DataFrame’ object has no attribute ‘map’ when I use the map() transformation on a DataFrame.



df2=df.map(lambda x: [x[0],x[1]])

  File "C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 1401, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

AttributeError: 'DataFrame' object has no attribute 'map'

Solution of AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark

A PySpark DataFrame doesn’t have a map() transformation; map() is defined on RDD. That is why calling df.map() raises AttributeError: ‘DataFrame’ object has no attribute ‘map’.
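The traceback shows the message being raised from __getattr__ in dataframe.py. As a rough illustration of that mechanism, here is a minimal sketch in plain Python. The DataFrame class below is a simplified stand-in written for this example, not PySpark's actual implementation, and the "Column<...>" string is only a placeholder for a real column object. The idea it shows is that an attribute name that is neither a method nor a column name ends in this AttributeError.

```python
# Simplified stand-in for illustration only; this is NOT PySpark's DataFrame class.
class DataFrame:
    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup has failed.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'" % (self.__class__.__name__, name)
            )
        # Placeholder return value; a real DataFrame would return a Column here.
        return "Column<%s>" % name


df = DataFrame(["name", "salary"])
print(df.salary)  # "salary" is a column name, so the lookup succeeds

try:
    df.map(lambda x: x)  # "map" is neither a method nor a column
except AttributeError as err:
    message = str(err)
    print(message)
```

Because the stand-in has no map method and no column named map, the lookup falls through to __getattr__ and fails with the same wording as the error in the traceback.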

To fix this, first convert the PySpark DataFrame to an RDD using df.rdd, apply the map() transformation (which returns an RDD), and then convert the RDD back to a DataFrame. Let’s see this with an example.


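from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; the app name here is arbitrary
spark = SparkSession.builder.appName("map-example").getOrCreate()

data = [('James',3000),('Anna',4001),('Robert',6200)]
df = spark.createDataFrame(data,["name","salary"])
df.show()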

# Convert the DataFrame to an RDD of Row objects
rdd=df.rdd
print(rdd.collect())

# Apply the map() transformation; x[0] is the name, x[1] is the salary
rdd2=rdd.map(lambda x: [x[0],x[1]*20/100])
print(rdd2.collect())

# Convert the RDD back to a DataFrame
df2=rdd2.toDF(["name","bonus"])
df2.show()

Hope this helps. Happy Learning !!

NNK
