C++ serialization API (diagnostics part 3)

In the previous issue in this series, I described the schema and data elements of the mjlib serialization format.  Here, I’ll describe the API used to convert between C++ structures and the corresponding schema and data serializations.

First, I’ll start by saying this API is far from perfect.  It hits a certain tradeoff in the design space that may not be appropriate for every system.  I have developed and used similar APIs professionally both at Jaybridge and TRI, so it has seen use in millions of lines of code, but not billions by any stretch.  It is also mostly orthogonal to the rest of the design, and alternate serialization APIs could be built while still maintaining the performance and schema evolution properties described in parts 1 and 2.  Now with that out of the way, the library API:

Structure annotation

Structures are annotated for serialization in one of two ways, either intrusively or externally.  Intrusive serialization is the easiest if the structures are under your control, while external serialization can be used for structures from libraries or other systems.

The intrusive interface requires defining a templated visitor method, in the same vein as boost serialization.  This is a single method template, which accepts an unknown “archive” and calls the “Visit” method on the archive for all children of the structure.  It looks like:

struct MyStruct {
  int32_t field1 = 0;
  std::string field2;
  std::vector<double> field3;

  template <typename Archive>
  void Serialize(Archive* a) {

There is a helper macro named MJ_NVP which is just used to capture the textual name of the field as well as its address without duplication.  It can be equivalently written as:

  a->Visit(mjlib::base::MakeNameValuePair("field1", &field1));

with more verbosity.

Serialization and Deserialization

Once a structure has been annotated, then binary schema and data blobs can be generated through various writing classes:

namespace tl = mjlib::telemetry;

// Generate a binary schema
std::string binary_schema = 

// Generate a binary data
MyStruct my_struct;
std::string binary_data = 

When reading data, there is one class which parses the schema, and another which allows reading of the data back into a C++ structure while accounting for schema evolution rules.

tl::BinarySchemaParser parsed_schema{binary_schema};
tl::MappedBinaryReader reader{&parsed_schema};
MyStruct reconstituted_my_struct = reader.Read(binary_data);

These quick examples used the std::string value interface, but there exist interfaces for reading into existing structures as well as operating on streams of data instead of std::string.

Comparison to other systems

While some systems, notably boost serialization use this templated visitor pattern, many other C++ serialization schemes use a separate code generation step.  That includes most of the modern ones like protobuf, flatbuffers, capnproto, etc.  Here, C++ was chosen instead to minimize build complexity and permit the natural use of existing C++ structures.  For instance, mjlib defines an external visitor for Eigen matrices (a C++ linear algebra library).  That allows one to write:

struct MyStruct {
  Eigen::Vector3d point;
  Eigen::Matrix4f matrix;

  template <typename Archive>
  // ...

And have it “just work”.

The API is also sufficiently general to implement memcpy optimization for structures that are suitable candidates.

Secondly, structures annotated with templated visitor pattern can be used to implement many other types of transformations as well, such as JSON serialization and deserialization or command line parsing.


Next in this series I’ll talk about the file format used to record the binary schema and data elements over time from an embedded system.