Tag Archives: diagnostics

Primitive derived fields in tplot2

One of the features that I wanted to get working in the newer tplot2 is some facility for rendering values which are calculated from the data in the log, even if they are not directly logged there. Simple cases would be things like vector lengths, unit conversions, or quaternion to Euler angle conversions. You could imagine needing arbitrarily complex values plotted after the fact.

In past systems I’ve designed, I built in a generic scripting interface to allow arbitrary things to be plotted. I’d like to do that here as well eventually, but in the short term I needed to plot the total normal force exerted on the ground by all stance legs, and I didn’t want to spend a lot of time designing a generic mechanism. Thus, I rigged up a very primitive C++-only mechanism, where one can register a function which returns an arbitrary serializable structure. That structure is then rendered in a dedicated area of the tplot2 tree view, and there is a pretty “hacky” way of getting its values onto the plot if necessary.
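As a rough illustration of the shape such a mechanism can take, here is a minimal C++ sketch of a registry of derived-field functions, using the total stance-leg normal force as the example. Every name in it (`LogSnapshot`, `LegState`, `DerivedValue`, `DerivedFieldRegistry`, `TotalStanceForce`) is hypothetical: in tplot2 the registered functions return an arbitrary serializable structure, while this sketch returns a single fixed result type to stay short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-leg values as they might appear in one log entry.
struct LegState {
  bool stance = false;
  double force_z_N = 0.0;  // normal force on the ground, in newtons
};

// Hypothetical snapshot of the logged data at one instant.
struct LogSnapshot {
  std::vector<LegState> legs;
};

// Stand-in for the "arbitrary serializable structure"; one scalar here.
struct DerivedValue {
  double value = 0.0;
};

// Maps a display name to a function computing a derived value.
class DerivedFieldRegistry {
 public:
  using Function = std::function<DerivedValue(const LogSnapshot&)>;

  void Register(const std::string& name, Function f) {
    functions_[name] = std::move(f);
  }

  // Evaluate every registered function against one snapshot.
  std::map<std::string, DerivedValue> Evaluate(const LogSnapshot& s) const {
    std::map<std::string, DerivedValue> result;
    for (const auto& [name, f] : functions_) { result[name] = f(s); }
    return result;
  }

 private:
  std::map<std::string, Function> functions_;
};

// The example from the text: total normal force from all stance legs.
inline DerivedValue TotalStanceForce(const LogSnapshot& s) {
  DerivedValue r;
  for (const auto& leg : s.legs) {
    if (leg.stance) { r.value += leg.force_z_N; }
  }
  return r;
}
```

In use, something like `registry.Register("total_stance_force", TotalStanceForce)` would run once at startup, and `Evaluate` would be called whenever the timeline scrubber moves.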

With some luck, I’ll get a more robust mechanism in the future, but this works for now.

Video and telemetry synchronization (diagnostics part 8)

This is part of a continuing series on updated diagnostic tools for the mjbots quad A1 robot.  Previous editions are in 1, 2, 3, 4, 5, 6, and 7.  Here I’ll be looking at one of the last pieces of the puzzle, synchronizing the video with the rest of the telemetry.

As mentioned previously, recording video of a robot running is an easy, cheap, and fast way to provide ground truth information on all of the sensors and actuators.  However, it is only truly useful if it can be accurately synchronized in time to the other telemetry streams for the robot.


This was part of the puzzle that I spent a long time thinking about before I got started, as there are several possible options that seemed like they could maybe work:


The concept here would be to put an LED beacon on the robot that is visible from all angles.  It could strobe a synchronizing pattern, such as the output of an LFSR, which could then be identified in the subsequent video frames.

Pros: This should be able to give frame-accurate synchronization, and works even for my 1000 fps camera, which can’t record audio.

Cons: It is hard to find a good place to mount a light which could be observed from all angles.  The top is the best bet, but I have plans to attach further things there, which would then render synchronization infeasible.
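For reference, the kind of pattern generator such a beacon could use is sketched below: a 16-bit Galois LFSR with the well-known maximal-length tap mask `0xB400`. Since this option was not pursued, the sketch is purely illustrative and assumes one LED on/off state per video frame; `GaloisLfsr16` and `MakeStrobePattern` are hypothetical names. Because the output is a maximal-length sequence, any 16 consecutive output bits occur at exactly one position within each 65535-step period, so a short run of observed frames is enough to locate the video in the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 16-bit Galois LFSR. The tap mask 0xB400 corresponds to the maximal-length
// polynomial x^16 + x^14 + x^13 + x^11 + 1, so any nonzero seed visits all
// 65535 nonzero states before repeating.
class GaloisLfsr16 {
 public:
  explicit GaloisLfsr16(uint16_t seed) : state_(seed) {}

  // Advance one step and return the bit shifted out (the LED state).
  bool Next() {
    const bool out = (state_ & 1u) != 0;
    state_ >>= 1;
    if (out) { state_ ^= 0xB400u; }
    return out;
  }

  uint16_t state() const { return state_; }

 private:
  uint16_t state_;
};

// One on/off decision per strobe interval (assumed: one per video frame).
std::vector<bool> MakeStrobePattern(uint16_t seed, std::size_t length) {
  GaloisLfsr16 lfsr(seed);
  std::vector<bool> pattern;
  pattern.reserve(length);
  for (std::size_t i = 0; i < length; ++i) { pattern.push_back(lfsr.Next()); }
  return pattern;
}
```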


In this concept, I put a microphone on the robot and have it record audio of the environment during its run.  Then standard audio synchronization algorithms can be used to align the two streams.  I actually included a microphone on the most recent version of the pi3 hat to potentially use this approach.

Pros: This has no visibility requirements, and should be able to give synchronization accuracy well under a single frame of video.

Cons: Getting the microphone data off the pi3 hat was looking to be moderately annoying, as the STM32 which it is connected to is already streaming IMU and RF data back to the robot over its single SPI bus.  When I brought up the board, I verified I could get 1kHz audio off, but that isn’t enough to be useful.


This was the idea I had last, and what I am using now.  Here, I slap the side of the robot in a semi-random pattern during the video.  That results in an audio signature in the video, as well as lateral accelerometer readings.

Pros: No additional hardware or software is required anywhere on the robot.

Cons: This has worse accuracy than pure audio, as the IMU is only sampled at 400Hz and doesn’t perfectly correspond to the audio found in the video.


I took a stab at the IMU version, since it looked to be the easiest and still gave decent performance.  I made up a simple python tool which reads in the robot telemetry data and the audio stream of a video file, and lets the user select rough ranges of each stream to work from.

It then uses scipy.signal.correlate to find the alignment that best matches the two data streams, and produces a plot of the result.


As you can see, the audio rings out for some time after the IMU stops its high frequency response, largely due to the mechanical damping of the robot.  However, it is enough for the correlation to work with and give frame accurate results.
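The core of that alignment is a cross-correlation search.  The tool described above is written in python around `scipy.signal.correlate`; what follows is instead a brute-force C++ sketch of the same idea, assuming both streams have already been resampled to a common rate and trimmed to the user-selected ranges.  `Rectify` and `BestLag` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Slaps produce oscillations of both signs in both streams, so correlate
// magnitudes rather than raw signed samples.
std::vector<double> Rectify(const std::vector<double>& x) {
  std::vector<double> out;
  out.reserve(x.size());
  for (double v : x) { out.push_back(std::fabs(v)); }
  return out;
}

// Brute-force cross-correlation: return the lag in [-max_lag, max_lag] that
// maximizes sum_i a[i] * b[i + lag]. A positive result means features appear
// `lag` samples later in `b` than in `a`.
long BestLag(const std::vector<double>& a, const std::vector<double>& b,
             long max_lag) {
  const long na = static_cast<long>(a.size());
  const long nb = static_cast<long>(b.size());
  long best_lag = 0;
  double best_score = -std::numeric_limits<double>::infinity();
  for (long lag = -max_lag; lag <= max_lag; ++lag) {
    double score = 0.0;
    for (long i = 0; i < na; ++i) {
      const long j = i + lag;
      if (j < 0 || j >= nb) { continue; }
      score += a[static_cast<std::size_t>(i)] * b[static_cast<std::size_t>(j)];
    }
    if (score > best_score) {
      best_score = score;
      best_lag = lag;
    }
  }
  return best_lag;
}
```

With both streams at the IMU's 400Hz, the lag found this way is quantized to one IMU sample, or 2.5 ms.  An FFT-based correlation, which `scipy.signal.correlate` can select automatically for large inputs, scales much better than this nested loop.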

3D rendering in tplot (diagnostics part 7)

In previous posts of this series, I covered some diagnostics improvements I’ve made to help work on more advanced gaits for the mjbots quad A1 (1, 2, 3, 4, 5, 6).  This post will cover the last major new piece of diagnostics I added to tplot2: 3D rendering of telemetry data.

3D rendering

While it should be obvious, I’ll give a little exposition.  Prior to this, tplot2 could show a “tree view” of all logged data in numeric form.  It had a “plot view” which let you plot any single floating point scalar vs time.  As of recently, it could also render video associated with a given point in time in the log.  However, as anyone who has ever tried to debug a 3D software application, much less a 3D robot, can attest, debugging with scalar numbers and time plots is only productive for a very limited range of problems.

I’ve been wanting to extend my plotting tools with 3D rendering for some time, and have now gotten around to a minimal first pass.  The logic itself isn’t terribly complicated.  A separate OpenGL framebuffer object is created in order to render into a texture, then fairly standard vertex and fragment shaders are used to render some triangles and lines.  Initially, I’m just drawing the robot body, the commanded and actual foot positions, speeds, and forces, and an estimate of the ground underneath them.



While there is a lot of room for improvement here, both in terms of the visual quality of the existing renderings, and new features that could be rendered, this is already proving itself to be invaluable in diagnosing longstanding problems with the gait motion.

Video in tplot2 (diagnostics part 6)

This is part of a continuing series on diagnostics tooling for the mjbots quad series of robots.  The previous editions can be found at 1, 2, 3, 4, and 5.  Here, I’ll cover the first extension I developed for tplot2 to make it more useful to diagnose dynamic locomotion issues.


Diagnosing problems on robots is hard.  The data rates are high, sensing is imperfect, and there are many state variables to keep track of.  Problems related to erroneous perception are doubly challenging.  Without a recording of the ground truth of an event, it can be hard to even know if the sensing was off, or if some other aspect was broken.  Fortunately, for things the size and scope of small dynamic quadrupeds, video recording provides a great way to keep a record of the ground truth state of the machine.  Relatively inexpensive equipment can record high resolution images at hundreds of frames per second, documenting exactly where each of the robot’s extremities was and what it was doing over time.

To take advantage of that, my task here is to get video playback integrated into tplot2, so that the current image from some video can be shown on the screen synchronized with the timeline scrubber.

Making it happen

Here I was able to use large amounts of the code that I developed for the Mech Warfare control application.  I already had the ability to render ffmpeg data to an OpenGL texture.  The missing pieces I needed were getting that texture into an imgui window and adding seek support.

The former was straightforward.  imgui has Image and ImageButton widgets, which allow you to draw an arbitrary OpenGL texture into an imgui window.

The latter was a little more annoying, only because the ffmpeg API has a slightly unusual behavior.  Even after av_seek_frame was called, one frame from the old position would still be emitted.  This confused my seeking logic, possibly causing it to ignore frames.  However, after discarding that one stale frame, it seems to work just fine.
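The stale-frame workaround can be sketched without any ffmpeg code at all.  Below, `DecodedFrame` and `FirstFrameAfterSeek` are hypothetical stand-ins for whatever the decode loop produces after `av_seek_frame` returns: the first frame is dropped unconditionally, and, assuming the seek lands on a keyframe at or before the target, any remaining frames earlier than the target timestamp are skipped as well.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical decoded frame; only its presentation timestamp matters here.
struct DecodedFrame {
  int64_t pts = 0;
};

// Given the frames emitted after a seek request, return the first usable
// frame at or after `target_pts`.
std::optional<DecodedFrame> FirstFrameAfterSeek(
    const std::vector<DecodedFrame>& frames_after_seek, int64_t target_pts) {
  bool discarded_stale = false;
  for (const auto& frame : frames_after_seek) {
    if (!discarded_stale) {
      // The first frame still comes from the old position; drop it.
      discarded_stale = true;
      continue;
    }
    // Assumed: decoding restarts at a keyframe at or before the target, so
    // frames before the target are decoded but not displayed.
    if (frame.pts >= target_pts) { return frame; }
  }
  return std::nullopt;
}
```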


Next I’ll cover the last major piece I added to tplot2 to help with issue diagnosis.

tplot2 (diagnostics part 5)

In previous posts (1, 2, 3, 4), I covered the updates I made to the underlying serialization and log file format used in mjlib and the quad A1.  This time I’ll talk about the graphical application that uses that data to investigate live operation.


You might note the “2” in the name and realize that yes, this is the second incarnation in the mjmech repository, tplot being the first.  The original tplot.py was largely a one-day hack job that glued together the python log bindings I had with matplotlib.  It provided a time scrubber, a tree view, and a plot window where any number of things could be plotted against one another.


It did have a number of problems:

  • Speed: The original tplot read the entirety of the log into memory before letting you view any of it.  Further, while reading it into memory, it converted everything into python structures.  This took some time for even relatively short logs.
  • Coding efficiency: This might seem paradoxical, but developing GUIs in PySide still takes a decent amount of time, even if you don’t care what they look like at all.  Either you take on all the overhead of using Qt Designer and thus have to manage UI file loading or compiling, or you design the layouts in code and have mysterious layout issues, because the exact construction requirements for valid layouts are very hard to determine without looking at the Qt source.  There are so many signals to connect to slots and so much state to manage, and anything non-trivial requires deriving custom widget classes with many virtual methods to override.
  • Integration with video: Yes, Qt has a video subsystem, but it is intended for live playback, not frame-accurate seeking, and it takes a lot of effort to use effectively.
  • Build footprint: Except for tplot, I have moved the entirety of the code for the quad A1 and its transitive dependencies to be built from source under bazel.  This makes cross compiling easy, as well as making cross-platform and cross-distribution support relatively painless.  While I have converted some large things to bazel (gstreamer), Qt and PySide were a bridge too far.
  • Python support: PySide1 only supports Qt 4.  Qt 5 had no permissively licensed python bindings until very recently, and while those are in Ubuntu 20.04, they didn’t make it into 18.04.  That isn’t a deal-breaker of course, just an annoyance.


For tplot2, I decided to try my hand at using the Dear ImGui library that I used for the Mech Warfare control interface.  It is remarkably concise, very quick to develop for, looks at least “OK”, and has no dependencies other than OpenGL.  Once I had multiple axis support in implot, getting to tplot1-level functionality was remarkably quick, maybe a day of effort in total:



Next up, I’ll cover the improvements that I made to tplot2 that made it worth all the effort.

Log file format (diagnostics part 4)

In parts 1, 2, and 3 I covered some motivation for the updated mjlib diagnostics system and the serialization of individual structures.  In this post, I’ll cover how those structures are written into a file from an embedded system like a robot and how diagnostic tools can access them efficiently.


The top level goals are:

  • Efficient to write live from an embedded system: The quad A1 currently generates log data at 400Hz, with hundreds to thousands of telemetry data points in every update.  It does this on a relatively low-end Raspberry Pi 3B+.  The format should be able to support writing data at high rates without a significant CPU burden.
  • Efficient seeking by time and record: Readers of the file should be able to efficiently seek by time in the stream, as well as extract all of a single record without having to process unnecessary data from the log.
  • Self contained: While this property of the log comes from the underlying mjlib serialization format, it is worth reiterating here.  All information necessary to return a JSON or CSV like structure for each instance should be present within the log.


The detailed design of the log format is documented in README.md; here I will give a brief summary.

The log consists of a header followed by a series of “Blocks” concatenated together.  The two primary block types are one that contains the schema for an individual record and one that contains its data.  For a given record, the schema will only be present once in the log, typically near the beginning.  The data block contains a single serialized instance of the record, along with some optional flags and data.  The optional flags include a timestamp, a checksum, whether the data is compressed, and a pointer to the previous data block for this record.

Another block type is the SeekMarker block, which contains a 64-bit unique-ish byte code, a timestamp, and a checksum.  When readers need to perform random seeks in the log, they can binary search to an arbitrary byte offset, then scan forward to find an instance of this unique code.  If it is present in conjunction with the necessary header and a validated checksum, it can be assumed that the framing has been recovered and that the time for that point in the log is known.
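A minimal sketch of that recovery procedure follows.  The byte layout in it is entirely hypothetical (a made-up 8-byte marker immediately followed by an 8-byte little-endian timestamp, with no header or checksum check); the real SeekMarker layout is in README.md.  `NextSeekPoint` scans forward from an arbitrary offset, and `SeekToTime` binary searches over byte offsets, assuming timestamps increase through the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// HYPOTHETICAL layout, for illustration only: an 8-byte marker immediately
// followed by an 8-byte little-endian timestamp.
constexpr uint8_t kMarker[8] = {0x4d, 0x4a, 0x53, 0x45, 0x45, 0x4b, 0x21, 0x7e};
constexpr std::size_t kMarkerSize = sizeof(kMarker);
constexpr std::size_t kSeekPointSize = kMarkerSize + 8;

struct SeekPoint {
  std::size_t offset;  // byte offset of the marker within the file
  uint64_t timestamp;  // timestamp stored after the marker
};

// Scan forward from an arbitrary byte offset for the next complete marker.
std::optional<SeekPoint> NextSeekPoint(const std::vector<uint8_t>& bytes,
                                       std::size_t start) {
  const auto first =
      bytes.begin() + static_cast<std::ptrdiff_t>(std::min(start, bytes.size()));
  const auto it =
      std::search(first, bytes.end(), std::begin(kMarker), std::end(kMarker));
  if (it == bytes.end()) { return std::nullopt; }
  const auto offset = static_cast<std::size_t>(it - bytes.begin());
  if (offset + kSeekPointSize > bytes.size()) { return std::nullopt; }
  uint64_t timestamp = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    timestamp |= static_cast<uint64_t>(bytes[offset + kMarkerSize + i]) << (8 * i);
  }
  return SeekPoint{offset, timestamp};
}

// Binary search over byte offsets for the last seek point whose timestamp is
// <= target. Assumes timestamps increase monotonically through the file.
std::optional<SeekPoint> SeekToTime(const std::vector<uint8_t>& bytes,
                                    uint64_t target) {
  std::size_t lo = 0;
  std::size_t hi = bytes.size();
  std::optional<SeekPoint> best;
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    const auto point = NextSeekPoint(bytes, mid);
    if (!point || point->timestamp > target) {
      hi = mid;  // everything at or after mid is too late, or absent
    } else {
      best = point;  // a candidate; keep looking for a later one
      lo = point->offset + 1;
    }
  }
  return best;
}
```

A real reader would also validate the header and checksum before trusting a match, since the marker bytes could occur by chance inside record data.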

Finally, there is an Index block, written at the very end of the log.  This includes pointers to the schema entries for all records in the file, as well as the most recent data block for that record.  That allows readers to find the set of records in a log, and extract a single record (albeit backwards) from the log while reading no extra data.
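Reading a single record backwards from the index amounts to following a linked list.  The sketch below models the log as an in-memory map from byte offset to a hypothetical `DataBlock` holding the offset of the previous block for the same record (-1 for the first), and the index as a map from record name to the offset of that record's last block; none of these types come from mjlib.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// HYPOTHETICAL in-memory view of a data block: its timestamp plus the byte
// offset of the previous data block for the same record (-1 if none).
struct DataBlock {
  int64_t previous_offset = -1;
  uint64_t timestamp = 0;
};

// Hypothetical index contents: record name -> offset of its last data block.
using Index = std::map<std::string, int64_t>;

// Visit one record's blocks newest-first by following previous-block
// pointers, never touching blocks that belong to other records.
std::vector<uint64_t> ReadRecordBackwards(
    const std::map<int64_t, DataBlock>& blocks_by_offset, const Index& index,
    const std::string& record) {
  std::vector<uint64_t> timestamps;
  const auto found = index.find(record);
  if (found == index.end()) { return timestamps; }
  for (int64_t offset = found->second; offset >= 0;) {
    const DataBlock& block = blocks_by_offset.at(offset);
    timestamps.push_back(block.timestamp);
    offset = block.previous_offset;
  }
  return timestamps;
}
```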

Future extensions

Most of the entities in the log have flag bitmasks to control additional future features or extensions.  Current readers throw errors when unknown bits are discovered, which makes it safe to almost arbitrarily modify the log structure at the expense of forward compatibility.
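The reject-unknown-bits policy is essentially a single mask check.  The bit assignments below are hypothetical, chosen only to match the four optional items listed for data blocks (timestamp, checksum, compression, previous-block pointer); the real values are defined in README.md, and `ValidateFlags` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// HYPOTHETICAL bit assignments for the optional data block items.
constexpr uint32_t kHasTimestamp = 1u << 0;
constexpr uint32_t kHasChecksum = 1u << 1;
constexpr uint32_t kIsCompressed = 1u << 2;
constexpr uint32_t kHasPrevious = 1u << 3;

constexpr uint32_t kKnownFlags =
    kHasTimestamp | kHasChecksum | kIsCompressed | kHasPrevious;

// Refuse to parse a block that sets any bit this reader does not understand,
// rather than silently misreading data written by a newer writer.
void ValidateFlags(uint32_t flags) {
  const uint32_t unknown = flags & ~kKnownFlags;
  if (unknown != 0) {
    throw std::runtime_error("unknown block flags: " + std::to_string(unknown));
  }
}
```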

The most likely extensions are related to compression.  The current per-data compression format is snappy, from Google.  It is fast, but has relatively poor compression performance.  At some point, I’d like to switch to Zstandard, which has even better runtime performance, much better compression performance, and supports incremental dictionary manipulation.  I have actually integrated it on a test basis into the C++ writer and reader, and the effort was trivial; however, the other languages that I support, python and TypeScript, are more challenging.  For snappy, there are operating system provided packages that work just fine in Debian and Ubuntu, but not so for Zstandard.  Bazel has rules that support pulling in pip packages for python and npm packages for TypeScript, but neither of those mechanisms has straightforward support for the recursive WORKSPACE workarounds I am using now.  For now, it is easiest just to stick with snappy.


Now that we have the data structures out of the way, I’ll move on to the tools that use them!

C++ serialization API (diagnostics part 3)

In the previous issue in this series, I described the schema and data elements of the mjlib serialization format.  Here, I’ll describe the API used to convert between C++ structures and the corresponding schema and data serializations.

First, I’ll start by saying this API is far from perfect.  It hits a certain tradeoff in the design space that may not be appropriate for every system.  I have developed and used similar APIs professionally both at Jaybridge and TRI, so it has seen use in millions of lines of code, but not billions by any stretch.  It is also mostly orthogonal to the rest of the design, and alternate serialization APIs could be built while still maintaining the performance and schema evolution properties described in parts 1 and 2.  Now with that out of the way, the library API:

Structure annotation

Structures are annotated for serialization in one of two ways, either intrusively or externally.  Intrusive serialization is the easiest if the structures are under your control, while external serialization can be used for structures from libraries or other systems.

The intrusive interface requires defining a templated visitor method, in the same vein as boost serialization.  This is a single method template, which accepts an unknown “archive” and calls the “Visit” method on the archive for all children of the structure.  It looks like:

struct MyStruct {
  int32_t field1 = 0;
  std::string field2;
  std::vector<double> field3;

  template <typename Archive>
  void Serialize(Archive* a) {
    a->Visit(MJ_NVP(field1));
    a->Visit(MJ_NVP(field2));
    a->Visit(MJ_NVP(field3));
  }
};

There is a helper macro named MJ_NVP which is just used to capture the textual name of the field as well as its address without duplication.  It can be equivalently written as:

  a->Visit(mjlib::base::MakeNameValuePair("field1", &field1));

with more verbosity.
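To see why a single `Serialize` template is enough, here is a self-contained toy version of the pattern.  `NameValuePair`, `MakeNameValuePair`, and the `TOY_NVP` macro are simplified stand-ins written for this sketch, not mjlib's definitions.  One archive only records field names (a small part of what a schema generator needs to discover), and another writes through the captured pointers to reset every field; both are driven by the same `Serialize` method.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for a name/value pair helper (not mjlib's own).
template <typename T>
struct NameValuePair {
  const char* name;
  T* value;
};

template <typename T>
NameValuePair<T> MakeNameValuePair(const char* name, T* value) {
  return {name, value};
}

// Captures the field's spelling and address from a single mention.
#define TOY_NVP(x) MakeNameValuePair(#x, &x)

// One archive: records field names in declaration order.
struct FieldNameArchive {
  std::vector<std::string> names;

  template <typename T>
  void Visit(const NameValuePair<T>& pair) { names.emplace_back(pair.name); }
};

// Another archive: writes through the pointer to reset every field.
struct ResetArchive {
  template <typename T>
  void Visit(const NameValuePair<T>& pair) { *pair.value = T{}; }
};

struct MyStruct {
  int32_t field1 = 0;
  std::string field2;
  std::vector<double> field3;

  template <typename Archive>
  void Serialize(Archive* a) {
    a->Visit(TOY_NVP(field1));
    a->Visit(TOY_NVP(field2));
    a->Visit(TOY_NVP(field3));
  }
};
```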

Serialization and Deserialization

Once a structure has been annotated, then binary schema and data blobs can be generated through various writing classes:

namespace tl = mjlib::telemetry;

// Generate a binary schema
std::string binary_schema = 

// Generate binary data
MyStruct my_struct;
std::string binary_data = 

When reading data, there is one class which parses the schema, and another which allows reading of the data back into a C++ structure while accounting for schema evolution rules.

tl::BinarySchemaParser parsed_schema{binary_schema};
tl::MappedBinaryReader reader{&parsed_schema};
MyStruct reconstituted_my_struct = reader.Read(binary_data);

These quick examples used the std::string value interface, but there exist interfaces for reading into existing structures as well as operating on streams of data instead of std::string.
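The schema evolution rules themselves are not spelled out here, so the following is only a plausible sketch of the idea, with every name hypothetical: fields of the in-memory structure are matched to the on-disk record by name, then by any of their aliases (like the `aliases` entry in the JSON schema example from part 2), missing fields take a default value, and extra on-disk fields are ignored.  Records are reduced to name-to-double maps to keep it short.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// HYPOTHETICAL simplification: a decoded on-disk record as name -> value.
using DiskRecord = std::map<std::string, double>;

// One field of the in-memory structure, as the reader sees it.
struct FieldSpec {
  std::string name;
  std::vector<std::string> aliases;  // names this field was written under before
  double default_value = 0.0;        // used when the log predates the field
};

// Match by current name, then by alias; missing fields get their default and
// on-disk fields with no in-memory counterpart are simply ignored.
std::vector<double> Evolve(const DiskRecord& disk,
                           const std::vector<FieldSpec>& fields) {
  std::vector<double> result;
  result.reserve(fields.size());
  for (const auto& field : fields) {
    std::optional<double> found;
    if (auto it = disk.find(field.name); it != disk.end()) { found = it->second; }
    for (const auto& alias : field.aliases) {
      if (found) { break; }
      if (auto it = disk.find(alias); it != disk.end()) { found = it->second; }
    }
    result.push_back(found.value_or(field.default_value));
  }
  return result;
}
```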

Comparison to other systems

While some systems, notably boost serialization, use this templated visitor pattern, many other C++ serialization schemes use a separate code generation step.  That includes most of the modern ones like protobuf, flatbuffers, capnproto, etc.  Here, plain C++ templates were chosen instead to minimize build complexity and permit the natural use of existing C++ structures.  For instance, mjlib defines an external visitor for Eigen matrices (Eigen is a C++ linear algebra library).  That allows one to write:

struct MyStruct {
  Eigen::Vector3d point;
  Eigen::Matrix4f matrix;

  template <typename Archive>
  // ...

And have it “just work”.

The API is also sufficiently general to implement memcpy optimization for structures that are suitable candidates.

Secondly, structures annotated with the templated visitor pattern can be used to implement many other types of transformations as well, such as JSON serialization and deserialization or command line parsing.
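As an example of one of those other transformations, here is a minimal JSON-writing archive driven by the same kind of `Serialize` method.  The helper types, the `TOY_NVP` macro, and `JsonWriteArchive` are written for this sketch only; it handles just the field types in `MyStruct` and does not escape strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-ins for a name/value pair helper (not mjlib's own).
template <typename T>
struct NameValuePair {
  const char* name;
  T* value;
};

template <typename T>
NameValuePair<T> MakeNameValuePair(const char* name, T* value) {
  return {name, value};
}

#define TOY_NVP(x) MakeNameValuePair(#x, &x)

// Emits one JSON object, one key per visited field.
class JsonWriteArchive {
 public:
  template <typename T>
  void Visit(const NameValuePair<T>& pair) {
    if (!first_) { out_ << ","; }
    first_ = false;
    out_ << "\"" << pair.name << "\":";
    WriteValue(*pair.value);
  }

  std::string str() const { return "{" + out_.str() + "}"; }

 private:
  void WriteValue(int32_t v) { out_ << v; }
  void WriteValue(double v) { out_ << v; }
  void WriteValue(const std::string& v) { out_ << '"' << v << '"'; }  // unescaped
  void WriteValue(const std::vector<double>& v) {
    out_ << "[";
    for (std::size_t i = 0; i < v.size(); ++i) {
      if (i != 0) { out_ << ","; }
      WriteValue(v[i]);
    }
    out_ << "]";
  }

  std::ostringstream out_;
  bool first_ = true;
};

struct MyStruct {
  int32_t field1 = 0;
  std::string field2;
  std::vector<double> field3;

  template <typename Archive>
  void Serialize(Archive* a) {
    a->Visit(TOY_NVP(field1));
    a->Visit(TOY_NVP(field2));
    a->Visit(TOY_NVP(field3));
  }
};
```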


Next in this series I’ll talk about the file format used to record the binary schema and data elements over time from an embedded system.

Revised mjlib serialization design (diagnostics part 2)

As discussed previously, I recently significantly revised the serialization format used by the mjbots quad A1 based on experience in previous professional domains, and from studying newer external projects like Apache AVRO.  Here I’ll describe the design of the serialized representation, which is more completely defined at: mjlib/telemetry/README.md

Refresher and definitions

As a brief refresher, this serialization format is intended to be used primarily to record telemetry from embedded systems, where that telemetry data may be persisted on disk for a long time.  Secondarily, it can be used to inspect the results of a live system.  The primitive it operates on is a “record”, which is logically a structure of elements that is emitted at intervals over time.  Each record is logically broken up into a “schema” and a “data” portion.  The schema describes what types of elements are present in the structure, their names and relationships.  The “data” portion contains the minimum amount of information necessary to communicate one instance of the structure, assuming that the receiver already has a copy of the schema.


A schema consists of one “type”.  There exist a number of “primitive” types which directly, or close to directly, map to machine storage.  Here is an abbreviated subset:

  • boolean can be true or false
  • float64 is a 64 bit floating point value
  • fixeduint is an unsigned integer of size 1, 2, 4, or 8 bytes
  • varuint is an unsigned integer of dynamic encoding length
  • string is a sequence of UTF-8 characters
  • bytes is a sequence of arbitrary bytes

After that, there are “complex” types, which consist of:

  • object is a list of fields, each with its own type
  • enum is an unsigned integer, along with a mapping from those integers to strings
  • array is a variable length array of some other type
  • fixedarray is a fixed length array of some other type
  • map is a mapping from strings to another type
  • union is an index discriminated union between multiple types


The data associated with each type is a direct mapping for the primitive types.  For the “complex” types, the associated data is as follows:

  • object the data consists of the data from each field in order
  • enum the data consists of a single unsigned integer
  • array the data consists of a size, followed by that many instances of the element type’s data
  • fixedarray consists of the element type’s data repeated the number of times given in the schema
  • map just consists of the keys and values from the map
  • union contains a single unsigned integer index, followed by the selected type’s data
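A short sketch of how the array rule plays out in bytes follows.  It assumes, without confirmation from the text, that `varuint` uses a LEB128-style encoding (7 data bits per byte, high bit set on every byte except the last, as protobuf varints do) and that an array's size is written as a `varuint`; check mjlib/telemetry/README.md for the actual definitions.  Elements here are 4-byte `fixeduint` values in little-endian order, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// ASSUMED encoding: LEB128-style, low 7 bits first, continuation bit 0x80.
void AppendVaruint(std::string* out, uint64_t value) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// 4-byte fixeduint in little-endian byte order.
void AppendFixedUint32(std::string* out, uint32_t value) {
  for (int i = 0; i < 4; ++i) {
    out->push_back(static_cast<char>((value >> (8 * i)) & 0xffu));
  }
}

// array: the element count, followed by each element's data.
std::string EncodeUint32Array(const std::vector<uint32_t>& values) {
  std::string out;
  AppendVaruint(&out, values.size());
  for (uint32_t v : values) { AppendFixedUint32(&out, v); }
  return out;
}
```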


For both the schema and the data there are two encodings defined, a JSON* one and a binary one.  The JSON data encoding is what would traditionally be exchanged in JavaScript applications.  It is not completely minimal, since field names and object and list delimiters are present.  For example, a simple object type consisting of a boolean, a string, and an array of fixedint32 values might have a data representation in JSON like:

{
  "field1" : true,
  "field2" : "my string data",
  "field3" : [4, 5, 6],
}

The JSON schema encoding contains the entirety of the information from the schema.  For the above record it might look like:

{
  "type" : "object",
  "name" : "MyObject",
  "aliases" : ["AnOldName"],
  "fields" : [
    { "name" : "field1", "type" : "boolean" },
    { "name" : "field2", "type" : "string" },
    { "name" : "field3", "type" : "array", "items" : "fixedint32" }
  ]
}

A binary encoding for both the schema and the data is defined as well.  The schema encoding is straightforward, if uninteresting, and can be found in the README.  For primitive types with direct machine analogs, the binary data encoding is the little endian machine representation.  The binary data representation of an object is merely the concatenation of the data of each of its fields.  This makes it possible to construct record definitions that exactly match a useful set of in-memory structures, so that serializing those structures is a no-op.
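That claim can be checked directly.  In the sketch below, `Sample` is a hypothetical structure whose fields are all 4-byte integers with no padding, so on a little-endian host its raw bytes, copied with `std::memcpy`, equal the explicit field-by-field little-endian encoding.  The function names are illustrative, and structures with padding, pointers, or variable-length members would need the field-by-field path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical record: all fields are 4-byte integers, so no padding.
struct Sample {
  uint32_t id;
  int32_t position;
  int32_t velocity;
};
static_assert(std::is_trivially_copyable_v<Sample>);
static_assert(sizeof(Sample) == 12, "expected no padding");

// Fast path: copy the raw in-memory bytes.
template <typename T>
std::string SerializeByMemcpy(const T& value) {
  static_assert(std::is_trivially_copyable_v<T>, "memcpy needs trivially copyable T");
  std::string out(sizeof(T), '\0');
  std::memcpy(out.data(), &value, sizeof(T));
  return out;
}

// Reference path: concatenate each field's little-endian encoding in order.
void AppendLittleEndian32(std::string* out, uint32_t value) {
  for (int i = 0; i < 4; ++i) {
    out->push_back(static_cast<char>((value >> (8 * i)) & 0xffu));
  }
}

std::string SerializeFieldByField(const Sample& s) {
  std::string out;
  AppendLittleEndian32(&out, s.id);
  AppendLittleEndian32(&out, static_cast<uint32_t>(s.position));
  AppendLittleEndian32(&out, static_cast<uint32_t>(s.velocity));
  return out;
}
```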

Next steps

In the next issue of this series, I’ll describe the C++ API for serializing and deserializing objects.

*Actually JSON5, which supports comments and final trailing commas among other improvements for human readability.

Updated serialization library (diagnostics part 1)

Now that I have the qdd100 servo in beta phase, the IMU working at full rate, and the quad A1 moving around, I’m getting closer to actually working on improving the gaits that the machine can execute.  To date, the gaits I have used completely ignore the IMU and only use the feedback from the joints in order to maintain force in 3D.  With tuning and on controlled surfaces this can work well, but outside the happy regime the robot can undergo significant pitch and roll movements during the leg swing phase, which at best results in a janky walk, and at worst results in oscillation or outright instability.

There are also a number of as-yet-unidentified problems that seemingly cause the feet to not track the ground position properly, resulting in the feet slipping on the floor despite being nearly fully loaded.

To tackle all these new domains requires some improvements to my diagnostics infrastructure and tools.  I’ll cover the improvements I’ve made in a few posts, since the work that has gone into it has covered a fair amount of ground.  I’ll start with something I mostly completed back in the summer of 2019 that has the least direct impact, but at least gives a background for some of the other upcoming changes.

Telemetry format

Super Mega Microbot since its inception in 2014 used a self-describing serialization and telemetry format that was loosely based on work I had done professionally previously at Bluefin Robotics and then Jaybridge Robotics.  This format was then the basis for later work at Jaybridge and Toyota Research Institute.  The basic idea breaks down like this:

  • The schema which describes the data and the data are separate entities
  • The schema is recorded alongside the data whenever it is written to persistent storage
  • The schema contains sufficient information to reconstruct a CSV or JSON like representation of the data with no additional meta-data
  • Structure tools can map a given on-disk schema to a possibly different in-memory one using a schema evolution algorithm
  • The data is serialized and stored in a manner which is very efficient to write at high rates from realtime processes

Compared to other serialization mechanisms, this has different trade-offs.

  • Formats like JSON and XML either include the schema completely in each data instance, or include a large amount of self-describing information in each data instance that is not strictly necessary to represent it
  • Formats like protobuf, capnproto, flatbuffers, and SBE have a different tradeoff.  They are geared towards performance, but largely also assume a single canonical source of schema data that is shared through an independent side channel and has a single linear revision history.  This makes sense for server RPC, where client and server are each distributed (possibly different) versions of the schema and want to communicate without having to exchange it.  They also include more metadata in the data stream than is strictly required, and many of them are more expensive to serialize or deserialize.
  • The closest to this work is Apache AVRO.  It uses the same principle of separate schema and data, and expects the schema to be stored alongside the data.  It also requires no code generation, which many of the above tools do require.

The unique pieces in this work over AVRO are that:

  • The data format is such that many common in-memory structures can simply be bit copied as serialized data with no further effort.  Those that do require some manipulation still require no additional in-memory structures associated with serialization.  This combines the properties of protobuf in that the serialization objects can be used as mutable state, with those of capnproto that allows zero cost serialization.
  • No recursion or pointers are supported, which renders the necessary code very simple.  The entirety of the C++ serialization and deserialization library is only a few hundred lines of code and took less than a week overall to write, unit test, and debug over the 6 years I’ve been using it.  It also functions perfectly fine in microcontroller-based embedded environments like the moteus controller.
  • The on-disk format is designed for rapid random seek access in time, assuming that small-ish records are written regularly.

The downsides are that it isn’t widely supported, isn’t optimized to handle single structures which have very large serialized representations, and the only language bindings aside from C++ are read only ones for python and TypeScript.

In future articles, I’ll describe a bit of the detail of the recently revised design, then go into the tools that use it.


Multiple axes in implot

I used Dear ImGui for the simple Mech Warfare control application I built earlier and was relatively impressed with how concisely one could develop effective (although not necessarily the prettiest), interactive, and responsive user interfaces in C++.  For some time I had been planning on developing a new diagnostic application for the mjbots quad that would allow plotting like the original tplot.py, but would also integrate recorded video and 3D rendering and diagnostics.  I had assumed I would use HTML/JS because it is the cool new thing, but I never got up the energy to make it happen, because every technical step along the way had big hurdles.  I figured I would give Dear ImGui a try, but the big thing it was missing was plotting support.

In the original tplot.py, I used matplotlib for plotting.  It is a high quality python library that can make interactive plots in nearly every imaginable form as well as production quality static plots.  It integrates with a number of GUI toolkits; in tplot I used it along with PySide.  The downside is that, given that it supports nearly anything under the sun, the code itself is relatively complex and hard to tweak.  In order to make tplot.py support multiple axes I had to do some careful source inspection to figure out which undocumented things could be poked.

Dear ImGui itself has a bare-bones plotting system, but that doesn’t have anywhere near the feature set I would need.  The next system I seriously considered was implot.  It is very new (its repository is only a few weeks old), but it already supported most of what I needed for a diagnostic tool.  The biggest thing it didn’t have was support for multiple Y axes.

So I took a stab at adding them!

One weekend later, I was largely successful:


Only a day after that and Evan had fixed up a few remaining problems and got it merged into master: https://github.com/epezent/implot/commit/5eb4b713849