Magnolia for Avro
This article describes a library I’ve written for serializing Scala data into Avro format.
Magnolia is a macro-based automatic typeclass generator, used in a number of projects. It provides functionality similar to the typeclass generation in shapeless: it can automatically generate typeclass instances for case classes and sealed traits. Magnolia has a good tutorial, so I won’t attempt to describe how it works.
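To make "typeclass generation for case classes and sealed traits" concrete, here is a minimal hand-written sketch of the kind of instances such a generator saves you from writing. It does not use Magnolia's API at all; the `Show` typeclass, the `Shape` hierarchy and every other name below are hypothetical and exist only for illustration.

```scala
object TypeclassSketch {
  // Hypothetical typeclass: renders a value of type T as text.
  trait Show[T] { def show(value: T): String }

  // Hypothetical data model: a sealed trait with two case classes.
  sealed trait Shape
  final case class Circle(radius: Int) extends Shape
  final case class Rect(width: Int, height: Int) extends Shape

  object Show {
    // Instance for a primitive field type.
    implicit val showInt: Show[Int] = new Show[Int] {
      def show(value: Int): String = value.toString
    }

    // Case class instances: combine the instances of each field.
    // This is the repetitive part a generator would derive.
    implicit def showCircle(implicit int: Show[Int]): Show[Circle] =
      new Show[Circle] {
        def show(value: Circle): String =
          s"Circle(radius=${int.show(value.radius)})"
      }

    implicit def showRect(implicit int: Show[Int]): Show[Rect] =
      new Show[Rect] {
        def show(value: Rect): String =
          s"Rect(width=${int.show(value.width)}, height=${int.show(value.height)})"
      }

    // Sealed trait instance: dispatch to the instance of the matching subtype.
    implicit def showShape(implicit c: Show[Circle], r: Show[Rect]): Show[Shape] =
      new Show[Shape] {
        def show(value: Shape): String = value match {
          case circle: Circle => c.show(circle)
          case rect: Rect     => r.show(rect)
        }
      }
  }

  // Helper that summons the instance for T and applies it.
  def show[T](value: T)(implicit s: Show[T]): String = s.show(value)
}
```

Every new case class needs another instance of the same shape, which is exactly the boilerplate a macro-based generator removes.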
Avro is an Apache project for data serialization. It is normally used as a binary serialization format (JSON is an option) and is often associated with the Kafka streaming platform, which is how it will be used later in this blog series.
The Avro binary format is schema-based. Unlike JSON or XML, in which every record tags each field with its name, the Avro binary format is pure data. The schema is therefore required both to construct binary records and to extract data from them.
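As a rough illustration of what "pure data" means, the sketch below writes and reads one record with Avro's generic Java API, called from Scala. It assumes the `org.apache.avro:avro` dependency is on the classpath; the `User` schema and the `AvroSchemaDemo` object are hypothetical and are not part of the library described in this article.

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object AvroSchemaDemo {
  // Hypothetical schema, for illustration only.
  val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "User", "fields": [
      |  {"name": "name", "type": "string"},
      |  {"name": "age",  "type": "int"}
      |]}""".stripMargin)

  // Encode one record; the schema tells the writer the field order and types.
  def write(name: String, age: Int): Array[Byte] = {
    val record = new GenericData.Record(schema)
    record.put("name", name)
    record.put("age", Int.box(age))

    val out     = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Decode one record; without the schema the bytes cannot be interpreted.
  def read(bytes: Array[Byte]): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    new GenericDatumReader[GenericRecord](schema).read(null, decoder)
  }
}
```

With this schema, `AvroSchemaDemo.write("Ann", 30)` produces five bytes: a length prefix, the three bytes of "Ann", and one byte for the age. The field names "name" and "age" appear nowhere in the output, which is why the reader needs the schema to make sense of it.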
Avro and the tooling around it (e.g. from Confluent) provide mechanisms for schema migration and a schema registry. Schema migration allows users of one version of a schema to read data produced by another version, by filling in defaults or transforming values. However, we will not be looking at that here.
Other tooling allows automatic generation of reading and writing classes - in Java for the JVM, but also for other languages. There are also Scala libraries - notably avro4s. This latter project provides far more capabilities than the example here, especially if you need to process data from external sources where non-Scala field naming conventions are used.
UPDATE: For production purposes, I am switching to avro4s.
The rest of this article
The remainder of this article is split into the following sections:
Source Code
All source code, including build.sbt, can be found here. This blog post refers to the avro-magnolia sub-project.