
Spark Convert Avro file to JSON

In this Spark article, you will learn how to convert an Avro file to JSON file format with a Scala example. To convert, we first read the Avro file into a DataFrame and then write it out as a JSON file.

1. What is Apache Avro

Apache Avro is an open-source, row-based data serialization and data exchange framework that originated in the Hadoop project. Spark's support for reading and writing data in Avro file format comes from the spark-avro library, originally developed by Databricks as an open-source library. Avro is commonly used with Apache Spark, especially in Kafka-based data pipelines. When Avro data is stored in a file, its schema is stored with it, so the file can be processed later by any program.

It was built to serialize and exchange big data between different Hadoop-based projects. It serializes data in a compact binary format, and its schema, written in JSON, defines the field names and data types.
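
To make the JSON schema idea concrete, here is a minimal sketch that parses an Avro schema from Scala using Avro's Java API. The record name `ZipcodeRecord`, the namespace, and the fields `City` and `State` are assumptions for illustration (only `Zipcode` appears in this article's examples), and it assumes the Avro library is on the classpath, as it is when spark-avro is included.

```scala
import org.apache.avro.Schema
import scala.collection.JavaConverters._

object AvroSchemaSketch {
  // Hypothetical schema: record and field names are illustrative only.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "ZipcodeRecord",
      |  "namespace": "com.example",
      |  "fields": [
      |    {"name": "Zipcode", "type": "int"},
      |    {"name": "City", "type": ["null", "string"], "default": null},
      |    {"name": "State", "type": ["null", "string"], "default": null}
      |  ]
      |}""".stripMargin

  // Parse the JSON text into an Avro Schema object.
  val schema: Schema = new Schema.Parser().parse(schemaJson)

  def main(args: Array[String]): Unit = {
    // Print each field name with its Avro type.
    schema.getFields.asScala.foreach { f =>
      println(s"${f.name()}: ${f.schema()}")
    }
  }
}
```

On Scala 2.13 you may prefer `scala.jdk.CollectionConverters._` over `JavaConverters`, which is deprecated there.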

2. Avro Advantages

3. Read Avro File

Spark's DataFrameReader does not provide an avro() function, so we specify the DataSource format as "avro" or "org.apache.spark.sql.avro" and use load() to read the Avro file.


  // Read avro file
  val df = spark.read.format("avro")
    .load("src/main/resources/zipcodes.avro")
  df.show()
  df.printSchema()

If your Avro data is partitioned, use the where() function to load a specific partition. The snippet below loads the partition with Zipcode 19802.


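  // col() needs this import
  import org.apache.spark.sql.functions.col

  // Read only the rows (and partition) where Zipcode is 19802
  spark.read
    .format("avro")
    .load("zipcodes_partition.avro")
    .where(col("Zipcode") === 19802)
    .show()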
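
A partitioned layout like the one this snippet reads could be produced with partitionBy(). The sketch below writes a DataFrame partitioned by Zipcode and then reads a single partition back. The object name `AvroPartitionSketch`, the helper `readZipcode`, and the `path` argument are hypothetical, and it assumes the spark-avro module is on the classpath. With a directory-per-value layout, Spark can usually skip the partitions that do not match the filter.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.col

object AvroPartitionSketch {
  // Writes df partitioned by Zipcode under path, then reads back one partition.
  // Assumes df has a column named "Zipcode".
  def readZipcode(spark: SparkSession, df: DataFrame, path: String, zipcode: Int): DataFrame = {
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("Zipcode")   // creates one sub-directory per Zipcode value
      .format("avro")
      .save(path)

    spark.read
      .format("avro")
      .load(path)
      .where(col("Zipcode") === zipcode)
  }
}
```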

If you want to read more on Avro, I recommend checking how to Read and Write Avro file with a specific schema, along with the dependencies it needs.
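
As a rough guide to those dependencies: spark-avro is an external module, so it has to be added to the build or passed to spark-submit. A minimal build.sbt sketch follows; the version number is an assumption and should match the Spark version you actually run.

```scala
// build.sbt (sketch): the version below is an assumption, match it to your Spark installation
val sparkVersion = "3.5.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-avro" % sparkVersion
)
```

When submitting to a cluster, the same module can instead be supplied with the `--packages` option of spark-submit, using the `org.apache.spark:spark-avro_<scala-version>:<spark-version>` coordinates.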

4. Spark Convert Avro to JSON file

In the previous section, we read the Avro file into a DataFrame. Now let's convert it to JSON by saving the DataFrame in JSON file format.


  // Convert to json
  df.write.mode(SaveMode.Overwrite)
    .json("/tmp/json/zipcodes.json")

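Alternatively, you can omit the save mode. Spark then uses its default mode, which fails with an error if the output path already exists.


  df.write
    .json("/tmp/json/zipcodes.json")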

If you want to read more on JSON, I recommend checking how to Read and Write JSON file with a specific schema.
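
One way to check the result is to preview the rows as JSON strings and then read the written output back. The sketch below does both; the object name `JsonRoundTripSketch`, the helper `writeAndReadBack`, and the `outDir` argument are hypothetical. Note that Spark writes the output path as a directory of part files in JSON Lines format, even when the path ends in `.json`, and that column types inferred when reading JSON back may differ from the Avro types (for example, integers come back as long).

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object JsonRoundTripSketch {
  // Writes df as JSON under outDir and returns the data read back from it.
  def writeAndReadBack(spark: SparkSession, df: DataFrame, outDir: String): DataFrame = {
    // Preview up to 5 rows as JSON strings before writing.
    df.toJSON.show(5, truncate = false)

    // Overwrite any previous output at outDir.
    df.write.mode(SaveMode.Overwrite).json(outDir)

    // outDir is a directory of part files, not a single file.
    spark.read.json(outDir)
  }
}
```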

5. Complete Example of Converting Avro File to JSON File Format


package com.sparkbyexamples.spark.dataframe

import org.apache.spark.sql.{SaveMode, SparkSession}

object AvroToJson extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()

  spark.sparkContext.setLogLevel("ERROR")

  // Read avro file
  val df = spark.read.format("avro")
    .load("src/main/resources/zipcodes.avro")
  df.show()
  df.printSchema()

  // Convert to json
  df.write.mode(SaveMode.Overwrite)
    .json("/tmp/json/zipcodes.json")
}

Conclusion

In this Spark article, you have learned how to convert an Avro file to JSON file format with Scala examples. Strictly speaking, we don't convert Avro to JSON directly: we first read the Avro file into a DataFrame, and the DataFrame can then be saved in any format Spark supports.

Happy Learning !!
