Problem: In Spark or PySpark, the DataFrame show() method truncates column content longer than 20 characters. How do you show the full column content of a DataFrame in the output?
1. Solution: Spark Show Full Contents of a DataFrame
By default, the show() method of a Spark or PySpark DataFrame truncates column content that is longer than 20 characters. To show the full contents without truncation, pass the boolean value false as the truncate argument: show(false) in Scala/Java, or show(truncate=False) in PySpark. Following are some examples.
1.1 Spark with Scala/Java
// Shows only 20 characters for each column (Scala/Java)
df.show(true)
// Show full column contents of DataFrame (Scala/Java)
df.show(false)
// Show top 5 rows and full column contents of DataFrame (Scala/Java)
df.show(5, false)
1.2 PySpark (Spark with Python)
# Show full contents of DataFrame (PySpark)
df.show(truncate=False)
# Show top 5 rows and full column contents (PySpark)
df.show(5, truncate=False)
# Show top 5 rows and only 10 characters of each column (PySpark)
df.show(5, truncate=10)
# Show rows vertically (one line per column value) (PySpark)
df.show(vertical=True)
Let’s see this with a Scala example. First, let’s create a DataFrame with long values in one column.
import org.apache.spark.sql.SparkSession

// Create a local SparkSession
val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()
// Needed for the toDF() conversion on a Seq
import spark.implicits._

val columns = Seq("Seqno", "Quote")
val data = Seq(
  ("1", "Be the change that you wish to see in the world"),
  ("2", "Everyone thinks of changing the world, but no one thinks of changing himself."),
  ("3", "The purpose of our lives is to be happy."))
val df = data.toDF(columns: _*)
// Default show(): long values are truncated
df.show()
This yields the output below.
// Output:
+-----+--------------------+
|Seqno|               Quote|
+-----+--------------------+
|    1|Be the change tha...|
|    2|Everyone thinks o...|
|    3|The purpose of ou...|
+-----+--------------------+
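The 20-character limit seen in the output above can be reproduced with a few lines of plain Python. This is only a sketch of the rule as it appears to behave, not Spark's own implementation: truncate_cell is a hypothetical helper, and it assumes a cell longer than the limit is cut to (limit - 3) characters followed by "...", while limits below 4 are cut without an ellipsis.

```python
def truncate_cell(value: str, width: int = 20) -> str:
    """Hypothetical helper that mimics how show() appears to shorten one cell."""
    if width <= 0 or len(value) <= width:
        # Truncation disabled, or the value already fits
        return value
    if width < 4:
        # Assumption: too narrow for an ellipsis, so just cut
        return value[:width]
    # Keep (width - 3) characters and append "..." so the result is `width` long
    return value[: width - 3] + "..."


quote = "Be the change that you wish to see in the world"
print(truncate_cell(quote))      # Be the change tha...
print(truncate_cell(quote, 10))  # Be the ...
```

The first call matches the row shown by df.show(), and the second corresponds to what a truncate value of 10 would leave of the same cell.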
By default, the show() method truncates long column values. You can change this behavior by passing the boolean value false to show(), which displays the full content.
df.show(false)
This yields the output below.
// Output:
+-----+-----------------------------------------------------------------------------+
|Seqno|Quote                                                                        |
+-----+-----------------------------------------------------------------------------+
|1    |Be the change that you wish to see in the world                              |
|2    |Everyone thinks of changing the world, but no one thinks of changing himself.|
|3    |The purpose of our lives is to be happy.                                     |
+-----+-----------------------------------------------------------------------------+
2. PySpark Show Full Contents of a DataFrame
Let’s assume you have a DataFrame similar to the one above. In PySpark the syntax is slightly different: to show the full contents of the columns, pass truncate=False to the show() method.
df.show(truncate=False)
This yields the same output as above.
Happy Learning !!