Problem: Could you please explain how to fetch more than 20 rows from a Spark/PySpark DataFrame, and also how to get the full column value?
1. Solution: Spark DataFrame – Fetch More Than 20 Rows
By default, Spark with Scala, Java, or Python (PySpark) fetches only 20 rows from a DataFrame with show(), and column values are truncated to 20 characters. To fetch/display more than 20 rows and the full column value from a Spark/PySpark DataFrame, you need to pass arguments to the show() method. Let’s see with an example.
Note: If you are looking to display the entire DataFrame with all rows to the console (stdout) or a log file, this is not advisable: to show the entire dataset, the Spark driver needs to pull all records from all workers. If the Spark driver memory is not enough to hold all records, it throws an OutOfMemory error and your Spark job fails.
1.1 Spark with Scala/Java
The show() method takes several arguments to fetch more than 20 rows and get the full column value. Following are examples of the DataFrame show() overloads.
df.show()             // Show 20 rows & truncate column values to 20 characters
df.show(50)           // Show 50 rows
df.show(false)        // Show 20 rows with full column value
df.show(50, false)    // Show 50 rows & full column value
df.show(20, 20, true) // Show 20 rows, truncate columns to 20 characters & display vertically
The first overloaded method, which takes no arguments, returns 20 rows by default, with column values truncated to 20 characters.
1.2 PySpark (Spark with Python)
Similarly, PySpark's show() takes the same kinds of arguments to fetch more than 20 rows and show the full DataFrame column value, but the usage is slightly different: you need to specify the argument names.
df.show(50)                                # Show 50 rows
df.show(truncate=False)                    # Show 20 rows with full column value
df.show(50, truncate=False)                # Show 50 rows & full column value
df.show(n=20, truncate=20, vertical=True)  # Show 20 rows, truncate columns to 20 characters & display vertically
Happy Learning !!