Tag Archives: c++

Native moteus tools for Windows

To date, all of the development tools for the moteus brushless controller have been available exclusively for Linux based operating systems. I’ve been doing some behind the scenes work, and have gotten to the point where moteus_tool now runs natively on windows and can communicate with moteus controllers using a fdcanusb.

Check out the Windows installer for the latest release:

To make this work, I started from the excellent grailbio/bazel-toolchain, which provides LLVM toolchains for Linux based systems based on the official LLVM pre-compiled binaries. I forked that into mjbots/bazel-toolchain and added Windows support. It isn’t perfect, because the LLVM project only distributes Windows binaries in installer form, and it isn’t possible to extract binaries from them without specialized tooling. So, this version relies on a manually re-packed compressed archive of all the executables.

I also added support for building the libc++ standard library, and using that instead of the MSVC standard library. This let me get c++20 coroutines working with clang on Windows.

All put together, the porting was pretty painless after having a toolchain in place. Just a few #if’s here and there, and I had to write a custom Windows specific console stream, as stdio and stderr do not support asynchronous completion ports on Windows.

New compilation commands for moteus

To stay on top of bazel development, and to prepare for some future improvements, I’ve gone ahead and upgraded the moteus firmware build system, rules_mbed to use the new bazel “platforms” toolchain resolution mechanism.

Previously, rules_mbed used the “crosstool_top” bazel mechanism for toolchain configuration. This allowed a single package to contribute a set of C++ toolchains which would be selected based on CPU and compiler. One of the downsides from the rules_mbed perspective, is that it made it difficult to make a build that included both mbed targets and host targets (or anything else non-mbed). rules_mbed worked around this by including a functioning clang host toolchain within it.

With the new toolchain resolution support, at the command line you specify to bazel what “platform” you want to be the final target. The updated rules_mbed specifies a “platform” for each of the STM32 processors that are supported. So for instance you could use:

bazel build --platforms=@rules_mbed//:stm32g4 //:target

It then uses that platform to find a toolchain with a compatible operating system and CPU, which for rules_mbed is the arm-gcc compiler for the correct chip.

Because of this, the command lines necessary to compile the moteus firmware and host side tools have changed. Rather than expose the raw bazel options, it now just uses a bazel config to abstract away whatever mechanisms are required. So, the two necessary new command lines are:

tools/bazel test --config=target //:target  # build firmware
tools/bazel test --config=host //:host # build host tools

Soon, we’ll use this capability to add some useful new features to the moteus tools, such as support for non-linux operating systems.

Measuring the pi3hat r4.2 performance

Last time I covered the new software library that I wrote to help use all the features of the pi3hat, in an efficient manner. This time, I’ll cover how I measured the performance of the result, and talk about how it can be integrated into a robotic control system.

pi3hat r4.2 available at mjbots.com

Test Setup

To check out the timing, I wired up a pi3hat into the quad A1 and used the oscilloscope to probe one of the SPI clocks and CAN bus 1 and 3.

Then, I could use pi3hat_tool incantations to experiment with different bus utilization strategies and get to one with the best performance. The sequence that I settled on was:

  1. Write all outgoing CAN messages, using a round-robin strategy between CAN buses. The SPI bus rate of 10Mhz is faster than the 5Mbps maximum CAN-FD rate, so this gets each bus transmitting its first packet as soon as possible, then queues up the remainder.
  2. Read the IMU. During this phase, any replies over CAN are being enqueued on the individual STM32 processors.
  3. Optionally read CAN replies. If any outgoing packets were marked as expecting a reply, that bus is expected to receive the appropriate number of responses. Additionally, a bus can be requested to “get anything in the queue”.

With this approach, a full command and query of the comprehensive state of 12 qdd100 servos, and reading the IMU takes around 740us. If you perform that on one thread while performing robot control on others, it allows you to achieve a 1kHz update rate.

CAN1 SPI clock on bottom, CAN1 and CAN3 bus on top

These results were with the Raspberry Pi 3b+. On a Raspberry Pi 4, they seem to be about 5% better, mostly because the Pi 4’s faster CPU is able to execute the register twiddling a little faster, which reduces dead time on the SPI bus.

Bringing up the pi3hat r4.2

The pi3hat r4.2, now in the mjbots store, has only minor hardware changes from the r4 and r4.1 versions. What has changed in a bigger way is the firmware, and the software that is available to interface with it. The interface software for the previous versions was tightly coupled to the quad A1s overall codebase, that made it basically impossible to use with without significant rework. So, that rework is what I’ve done with the new libpi3hat library:

It consists of a single C++11 header and source file with no dependencies aside from the standard C++ library and bcm_host.h from the Raspberry Pi firmware. You can build it using the bazel build files, or just copy the source file into your own project and build with whatever system you are using.


Using all of the pi3hat’s features in a runtime performant way can be challenging, but libpi3hat makes it not so bad by providing an omnibus call which sequences accesses to all the CAN buses and peripherals in a way that maximizes pipelining and overlap between the different operations, while simultaneously maximizing the usage of the SPI bus. The downside is that it does not use the linux kernel drivers for SPI and thus requires root access to run. For most robotic applications, that isn’t a problem, as the controlling computer is doing nothing but control anyways.

This design makes it feasible to operate at least 12 servos and read the IMU at rates over 1kHz on a Raspberry Pi.


There is a command line tool, pi3hat_tool which provides a demonstration of how to use all the features of the library, as well as being a useful diagnostic tool on its own. For instance, it can be used to read the IMU state:

# ./pi3hat_tool --read-att
ATT w=0.999 x=0.013 y=-0.006 z=-0.029  dps=(  0.1, -0.1, -0.1) a=( 0.0, 0.0, 0.0)

And it can be used to write and read from the various CAN buses.

# ./pi3hat_tool --write-can 1,8001,1300,r \
                --write-can 2,8004,1300,r \
                --write-can 3,8007,1300,r
CAN 1,100,2300000400
CAN 2,400,2300000400
CAN 3,700,230000fc00

You can also do those at the same time in a single bus cycle:

# ./pi3hat_tool --read-att --write-can 1,8001,1300,r
CAN 1,100,2300000400
ATT w=0.183 x=0.692 y=0.181 z=-0.674  dps=(  0.1, -0.0,  0.1) a=(-0.0, 0.0,-0.0)

Next steps

Next up I’ll demonstrate my performance testing setup, and what kind of performance you can expect in a typical system.

C++ serialization API (diagnostics part 3)

In the previous issue in this series, I described the schema and data elements of the mjlib serialization format.  Here, I’ll describe the API used to convert between C++ structures and the corresponding schema and data serializations.

First, I’ll start by saying this API is far from perfect.  It hits a certain tradeoff in the design space that may not be appropriate for every system.  I have developed and used similar APIs professionally both at Jaybridge and TRI, so it has seen use in millions of lines of code, but not billions by any stretch.  It is also mostly orthogonal to the rest of the design, and alternate serialization APIs could be built while still maintaining the performance and schema evolution properties described in parts 1 and 2.  Now with that out of the way, the library API:

Structure annotation

Structures are annotated for serialization in one of two ways, either intrusively or externally.  Intrusive serialization is the easiest if the structures are under your control, while external serialization can be used for structures from libraries or other systems.

The intrusive interface requires defining a templated visitor method, in the same vein as boost serialization.  This is a single method template, which accepts an unknown “archive” and calls the “Visit” method on the archive for all children of the structure.  It looks like:

struct MyStruct {
  int32_t field1 = 0;
  std::string field2;
  std::vector<double> field3;

  template <typename Archive>
  void Serialize(Archive* a) {

There is a helper macro named MJ_NVP which is just used to capture the textual name of the field as well as its address without duplication.  It can be equivalently written as:

  a->Visit(mjlib::base::MakeNameValuePair("field1", &field1));

with more verbosity.

Serialization and Deserialization

Once a structure has been annotated, then binary schema and data blobs can be generated through various writing classes:

namespace tl = mjlib::telemetry;

// Generate a binary schema
std::string binary_schema = 

// Generate a binary data
MyStruct my_struct;
std::string binary_data = 

When reading data, there is one class which parses the schema, and another which allows reading of the data back into a C++ structure while accounting for schema evolution rules.

tl::BinarySchemaParser parsed_schema{binary_schema};
tl::MappedBinaryReader reader{&parsed_schema};
MyStruct reconstituted_my_struct = reader.Read(binary_data);

These quick examples used the std::string value interface, but there exist interfaces for reading into existing structures as well as operating on streams of data instead of std::string.

Comparison to other systems

While some systems, notably boost serialization use this templated visitor pattern, many other C++ serialization schemes use a separate code generation step.  That includes most of the modern ones like protobuf, flatbuffers, capnproto, etc.  Here, C++ was chosen instead to minimize build complexity and permit the natural use of existing C++ structures.  For instance, mjlib defines an external visitor for Eigen matrices (a C++ linear algebra library).  That allows one to write:

struct MyStruct {
  Eigen::Vector3d point;
  Eigen::Matrix4f matrix;

  template <typename Archive>
  // ...

And have it “just work”.

The API is also sufficiently general to implement memcpy optimization for structures that are suitable candidates.

Secondly, structures annotated with templated visitor pattern can be used to implement many other types of transformations as well, such as JSON serialization and deserialization or command line parsing.


Next in this series I’ll talk about the file format used to record the binary schema and data elements over time from an embedded system.

Multiple axes in implot

I used Dear Imgui for the simple Mech Warfare control application I built earlier and was relatively impressed with the conciseness with which one could develop effective (although not necessarily the prettiest), interactive and response user interfaces in C++.  For some time I had been planning on developing a new diagnostic application for the mjbots quad that would allow plotting like the original tplot.py, but would also integrate recorded video and 3D rendering and diagnostics.  I had assumed I would use HTML/JS because it is the cool new thing, but I never got up the energy to make it happen, because every technical step along the way had big hurdles.  I figured I would give Dear Imgui a try, but the big thing it was missing was plotting support.

In the original tplot.py, I used matplotlib for plotting integration.  It is a high quality python library that can make interactive plots in nearly every imaginable form as well as production quality static plots.  It integrates with a number of GUI toolkits, in tplot I used it along with PySide.  The downside is, that given that it supports nearly anything under the sun, the code itself is relatively complex and hard to tweak.  In order to make tplot.py support multiple axes I had to do some careful source inspection to figure out which undocumented things could be poked.

Dear ImGui itself has a bare bones plotting system, but that doesn’t have anywhere near the feature set I would need.  The next system I seriously considered is implot.  It is very new, as in its repository is only a few weeks old, but already supported most of what I needed for a diagnostic tool.  The biggest thing it didn’t have was support for multiple Y axes.

So I took a stab at adding them!

One weekend later, I was largely successful:


Only a day after that and Evan had fixed up a few remaining problems and got it merged into master: https://github.com/epezent/implot/commit/5eb4b713849

C++20 coroutines and moteus_tool

I’ve had a confusing mismash of development tools for the moteus servos for a while now.  My original development tool was in python, which worked just fine.  Coroutines allowed me to express complex asynchronous logic succinctly, the program itself was rather simple, and I could easily integrate it with matplotlib for plotting.  However, when looking to run this on the raspberry pi, I needed a newer python version than came with raspbian, which turned out to be a royal pain to get installed in a repeatable manner.  Thus I rewrote a portion of the moteus_tool in C++ and just used my normal cross-compiling toolchain to generate the binaries.  What I didn’t do was port the calibration logic, as the state machine required with standard boost::asio would have bloated the logic size by 5x, and I didn’t really need to calibrate servos from the raspberry pi ever.

Still, I’ve wanted to consolidate these tools for a while now, and while working towards other telemetry and development tool goals, I decided to make another pass at removing the duplicity.  I figured it was time to try using the new C++20 coroutines to implement the asynchronous logic in C++ and see if I could get rid of the python tool.  Since I’m currently vendoring all the compilers and libraries for the C++ applications, I am already running clang-9.0 with libc++ and boost 1.72 for all the host side tools which theoretically should all support coroutines just fine.

Why coroutines?

To recap, typical callback based boost::asio code looks something like this:

void DoSomething() {
    boost::asio::buffer("command to send"),
    std::bind(&Class::DoneWriting, this, pl::_1));

void DoneWriting(boost::system::error_code ec) {
    std::bind(&Class::HandleRead, this, pl::_1));

void HandleRead(boost::system::error_code ec) {
  // Do something with the result.

This gets even worse if you have embedded control flow.  Typically the best you can do is construct an object to hold the state, and bind it around to keep track of things:

struct Context {
  int iteration_count = 0;

void Start() {
  auto ctx = std::make_shared<Context>();

void Write(std::shared_ptr<Context> ctx) {
  message_ = fmt::format("{}", ctx->iteration_count);
    std::bind(&Class::HandleWrite, this, pl::_1, ctx));

void HandleWrite(boost::system::error_code ec,
                 std::shared_ptr<Context> ctx) {
  if (ctx->iteration_count >= kLoopCount) {
  } else {

Now, when coroutines are used, you can relinquish control with a simple “co_await”, and all the logic can be written out as if it were purely procedural:

boost::asio::awaitable<void> Start() {
  for (int i = 0; i < kLoopCount; i++) {
    auto message = fmt::format("{}", i);
    co_await boost::asio::async_write(

Not only is this significantly shorter, it more directly expresses the intent of the operation, and lifetime of the various values is easier to manage.  No longer do we have to keep around the buffer as an instance variable or a shared ptr to ensure it lives beyond the write operation.  It stays in scope until it is no longer needed like any other automatic variable.


Switching moteus_tool to coroutines was straightforward, and resulted in a big reduction in the verbosity required.

DART now in bazel_deps

A previous simulator I had built for Super Mega Microbot was based on the “DART” robotics toolkit.  It is a C++ library with python bindings that includes kinematics, dynamics, and graphical rendering capabilities under a BSD license.  I wanted to use some of its dynamics capabilities for future gait work on the quad A0, and eventually re-incorporate its simulation capabilities, so integrated a subset of it into mjbots/bazel_deps.

ICFP 2019 Programming Contest

For many years now on and off I’ve played in the programming contest associated with the International Conference on Function Programming.  Each time it has been fun, exhilarating, and definitely humbling!  www.icfpcontest.org

This year, the competition runs from Friday June 21st through Monday June 24th.  Once again we’ll be competing as “Side Effects May Include…”.  Look for us on the leaderboard and cheer on!


log4cpp updates

While updating the mjmech code-base to interoperate with the moteus servos, I simultaneously was updating it to use C++17.  C++ rarely removes or deprecates features, but one of those which actually was removed in C++17, after being deprecated in C++11, was auto_ptr.  unique_ptr is strictly superior in all regards, and so there is no real reason not to switch.

However, mjmech depends upon a large amount of third party software.  Amazingly, only one package actually was broken by this removal of auto_ptr, log4cpp.  It actually saw some updates for C++17 compliance back in 2017, but otherwise hasn’t been updated since then.  I went ahead and forked it, and got it compiling with clang in C++17 mode at least:

After doing that, I discovered that someone had already posted a similar patch 2 years ago to the sourceforge page, but which was never applied.  Oh well, it wasn’t that much duplicated effort.