Spark's (and PySpark's) support for Python, Java, and Scala versions advances with each release, picking up language enhancements and optimizations along the way. It is therefore important to know which Python, Java, and Scala versions your Spark/PySpark release supports so you can leverage its capabilities effectively.
Related: Spark 3.5.0 Compatible Java and Scala Versions
Using an incorrect or unsupported Python, Java, or Scala version with Spark can cause a variety of errors when running Spark applications or working in the Spark environment, so it is best practice to install compatible versions from the start.
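Before installing or upgrading anything, it helps to confirm what is already on your machine. Below is a minimal sketch, assuming `spark-submit` and `java` are available on your `PATH`, that prints the Python, Spark, Scala, and JVM versions in play:

```python
import subprocess
import sys

# Python version the PySpark driver would use.
print(f"Python: {sys.version.split()[0]}")

# 'spark-submit --version' prints the Spark release along with the
# Scala version it was built for and the JVM it is running on.
subprocess.run(["spark-submit", "--version"])

# 'java -version' reports the installed JVM (note: it writes to stderr).
subprocess.run(["java", "-version"])
```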
Spark Compatibility Matrix for Java and Scala
If you are using Spark with Scala, you must install the right Java and Scala versions. Here’s a table summarizing Spark versions along with their compatible Java and Scala versions:
| Spark Version | Compatible Java Versions | Compatible Scala Versions |
|---|---|---|
| Spark 1.x | Java 7 or later | Scala 2.10.x |
| Spark 2.0 – 2.3 | Java 8 (Java 7 deprecated in 2.0, removed in 2.2) | Scala 2.11.x |
| Spark 2.4.x | Java 8 | Scala 2.11.x, 2.12.x |
| Spark 3.0.x | Java 8, 11 | Scala 2.12.x |
| Spark 3.1.x | Java 8, 11 | Scala 2.12.x |
| Spark 3.2.x | Java 8, 11 | Scala 2.12.x, 2.13.x |
| Spark 3.3.x | Java 8, 11, 17 | Scala 2.12.x, 2.13.x |
| Spark 3.4.x | Java 8, 11, 17 | Scala 2.12.x, 2.13.x |
| Spark 3.5.x | Java 8, 11, 17 | Scala 2.12.x, 2.13.x |
While these are common compatibilities for each Spark version, it’s always advisable to refer to the official Spark documentation or release notes for the most accurate and updated information regarding compatible Java and Scala versions for a specific Spark release. Compatibility might vary slightly or be enhanced in minor releases within a major Spark version.
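You can also verify these versions from inside a running PySpark session. The sketch below reads the JVM and Scala versions through `SparkContext._jvm`, PySpark's internal py4j gateway; since `_jvm` is not a public API, treat this as a debugging aid rather than a stable interface:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()

# Spark version of the active session.
print("Spark:", spark.version)

# Internal py4j gateway into the driver JVM (not a public API).
jvm = spark.sparkContext._jvm
print("Java:", jvm.java.lang.System.getProperty("java.version"))
print("Scala:", jvm.scala.util.Properties.versionString())

spark.stop()
```

`versionString()` returns something like `version 2.12.18`, which tells you which Scala build of Spark your session is actually running against.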
PySpark Compatibility Matrix of Python Versions
If you use Spark with Python (PySpark), you must install the right Java and Python versions. Here’s a table summarizing PySpark versions along with their supported Python versions:
| PySpark Version | Compatible Python Versions |
|---|---|
| PySpark 1.x | Python 2.6, 2.7 (Python 3.4+ from Spark 1.4) |
| PySpark 2.0 – 2.3 | Python 2.7, 3.4 – 3.6 |
| PySpark 2.4.x | Python 2.7, 3.4 – 3.7 |
| PySpark 3.0.x | Python 3.6 – 3.8 (2.7 and 3.4 – 3.5 deprecated) |
| PySpark 3.1.x | Python 3.6 – 3.9 |
| PySpark 3.2.x | Python 3.6 – 3.9 |
| PySpark 3.3.x | Python 3.7 – 3.10 |
| PySpark 3.4.x | Python 3.7 – 3.11 |
| PySpark 3.5.x | Python 3.8 – 3.11 |
Please note that these are general compatibility guidelines for PySpark versions and Python versions. Always refer to the official documentation or release notes for the specific PySpark version you use for the most accurate and updated information regarding compatible Python versions. Compatibility might vary slightly or be enhanced in minor releases within a major PySpark version.
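As a safeguard, you can make your application fail fast when the interpreter is too old for the installed PySpark. The check below is a sketch; the `MIN_PYTHON` mapping is a hand-made lookup transcribed from the table above, not a PySpark API:

```python
import sys

import pyspark

# Illustrative minimum Python version per PySpark minor release,
# transcribed from the compatibility table above.
MIN_PYTHON = {"3.3": (3, 7), "3.4": (3, 7), "3.5": (3, 8)}

minor = ".".join(pyspark.__version__.split(".")[:2])
required = MIN_PYTHON.get(minor)
if required and sys.version_info[:2] < required:
    raise RuntimeError(
        f"PySpark {pyspark.__version__} requires Python "
        f">= {'.'.join(map(str, required))}, found {sys.version.split()[0]}"
    )
print(f"Python {sys.version.split()[0]} works with PySpark {pyspark.__version__}")
```

Also remember that the driver and executors must run the same Python version; pointing the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables at the same interpreter avoids the classic "Python in worker has different version" error.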
Happy Learning !!