Spark DataFrame – Fetch More Than 20 Rows & Column Full Value

Problem: How do you fetch more than 20 rows from a Spark/PySpark DataFrame, and how do you get the full value of a column?

1. Solution: Spark DataFrame – Fetch More Than 20 Rows

By default, whether you use Spark with Scala, Java, or Python (PySpark), the DataFrame show() method displays only the first 20 rows and truncates each column value to 20 characters. To display more than 20 rows and the full column values from a Spark/PySpark DataFrame, you need to pass arguments to the show() method. Let’s look at some examples.

Note: Displaying an entire DataFrame with all of its rows to the console (stdout) or a log file is not advisable. To show the entire dataset, the Spark driver needs to pull all records from all workers. If the driver does not have enough memory to hold all the records, it throws an OutOfMemory error and your Spark job fails.
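One way to respect the note above is to put an upper bound on the number of rows requested. The following is a minimal sketch, assuming df is a PySpark DataFrame; show_capped and its max_rows limit are hypothetical names introduced here for illustration and are not part of the Spark API. It only relies on the show(n, truncate, vertical) parameters described in this article.

```python
def show_capped(df, n=20, max_rows=1000, full_values=True):
    """Call df.show() with at most max_rows rows (hypothetical helper).

    Returns the number of rows that was actually requested.
    """
    # Clamp the requested row count so a caller cannot ask for the whole dataset.
    rows = max(0, min(n, max_rows))
    # truncate=False prints full column values; truncate=True keeps the 20-character default.
    df.show(n=rows, truncate=not full_values, vertical=False)
    return rows
```

Calling show_capped(df, n=50) would then behave like df.show(50, truncate=False), while a request such as show_capped(df, n=5_000_000) would be reduced to the 1000-row limit assumed here.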

1.1 Spark with Scala/Java

The Spark show() method is overloaded and accepts several arguments that let you fetch more than 20 rows and display the full column value. The following are examples of the DataFrame show() method.


df.show() // Show 20 rows & 20 characters for columns
df.show(50) // Show 50 rows
df.show(false) // Show 20 rows with full column value
df.show(50,false) // Show 50 rows & full column value
df.show(20,20,true) // Show 20 rows, truncate columns to 20 characters & display rows vertically

The first overloaded method, which takes no arguments, displays 20 rows by default and truncates column values to 20 characters.

1.2 PySpark (Spark with Python)

Similarly, the PySpark show() method takes arguments to fetch more than 20 rows and show the full DataFrame column value, but the usage is slightly different: to set truncate or vertical without also passing the row count, you need to specify the argument name.


# Show 50 rows
df.show(50) 

# Show 20 rows with full column value
df.show(truncate=False) 

# Show 50 rows & full column value
df.show(50,truncate=False) 

# Show 20 rows, truncate columns to 20 characters & display rows vertically
df.show(n=20,truncate=20,vertical=True)
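To make the effect of the truncate argument more concrete, here is a pure-Python sketch of how a single cell appears to be shortened. It is based on my reading of Spark's behavior (values longer than the limit are cut and end in "...", and truncate=False behaves like a limit of 0), so treat it as an approximation rather than Spark's actual code. truncate_cell is a hypothetical helper, not a PySpark function.

```python
def truncate_cell(value, truncate=20):
    """Approximate how show() shortens one cell value (illustrative only)."""
    # A limit of 0 (or a value that already fits) means no truncation.
    if truncate <= 0 or len(value) <= truncate:
        return value
    # Very small limits leave no room for an ellipsis, so just cut the string.
    if truncate < 4:
        return value[:truncate]
    # Otherwise keep truncate - 3 characters and append "...".
    return value[: truncate - 3] + "..."


print(truncate_cell("Spark DataFrame show() example"))              # Spark DataFrame s...
print(truncate_cell("Spark DataFrame show() example", truncate=0))  # full value
```

Under these assumptions the truncated value is exactly 20 characters long, including the three dots, which is why long strings in the default show() output end in "...".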

Happy Learning !!

Naveen Nelamali


