• Post author:
  • Post category:PySpark
  • Post last modified:March 27, 2024
  • Reading time:3 mins read
You are currently viewing AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark

Problem: In PySpark I am getting error AttributeError: ‘DataFrame’ object has no attribute ‘map’ when I use map() transformation on DataFrame.



df2=df.map(lambda x: [x[0],x[1]])

  File "C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 1401, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

AttributeError: 'DataFrame' object has no attribute 'map'

Solution of AttributeError: ‘DataFrame’ object has no attribute ‘map’ in PySpark

PySpark DataFrame doesn’t have a map() transformation instead it’s present in RDD hence you are getting the error AttributeError: ‘DataFrame’ object has no attribute ‘map’

So first, Convert PySpark DataFrame to RDD using df.rdd, apply the map() transformation which returns an RDD and Convert RDD to DataFrame back, let’s see with an example.


data = [('James',3000),('Anna',4001),('Robert',6200)]
df = spark.createDataFrame(data,["name","salary"])
df.show()

#converts DataFrame to rdd
rdd=df.rdd
print(rdd.collect())

# apply map() transformation)
rdd2=df.rdd.map(lambda x: [x[0],x[1]*20/100])
print(rdd2.collect())

#conver RDD to DataFrame
df2=rdd2.toDF(["name","bonus"])
df2.show()

Hope this helps. Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium