
Debugging bare-metal STM32 from the seventh level of hell

Here’s a not-so-brief story about troubleshooting a problem that was at times vexing, impossible, incredibly challenging, frustrating, and all around just a terrible time with the bare-metal STM32G4 firmware for the moteus brushless motor controller.

Background

First, some things for context:

moteus has a variety of testing done on every firmware release. There are unit tests that run with pieces of the firmware compiled to run in a host environment. There is a hardware-in-the-loop dynamometer test fixture that is used to run a separate battery of tests. There is also an end-of-line test fixture that is used to run tests on every board and some other firmware level performance tests.

Because of all that testing, we’re pretty confident releasing new firmware images once all the tests have passed, and we try to ship firmware that is within a week or two of the newest on all boards and devices that go out the door. That said, there is some effort made to ensure that large orders all have the same firmware on them. Thus, my saga started when I went to re-program a few dozen boards using the end-of-line test fixture so that they could all match the most recent version.

The first symptom

When I went to re-program them, a large portion of the boards failed the tests surrounding the quality of the current sense measurements, indicating there was too much noise, specifically when driving 0 current. That could mean that there were soldering problems on the board, or that the test fixture had corroded contacts, or potentially firmware issues. In response, the test fixture got its contacts cleaned very thoroughly, I verified the problem was happening across many boards, all of which had passed earlier, and I confirmed there were only 3 changesets that affected the firmware in any way, all of which seemed pretty innocuous.

Once I had given up on the problem being a fluke, I opened up tview on the end-of-line fixture and sure enough, wow, there was a problem:

Note how the values of servo_stats.adc_cur3_raw seem to bounce between what looks like their true value and 2048. I have seen problems like this before, related to ADC configuration and clock rate (as have others), but absolutely nothing about the ADC configuration has changed in more than a year, so surely that can’t be it, can it?

The first diagnostic step

So, first things first. Now that I can observe a problem, is it reproducible? I used git bisect across the relevant firmware versions, and sure enough, one of the changes was positively correlated with the problem: 64f2a82575795d782ff3806ea2036f4cd2f02ef0. However, that change does absolutely nothing with the ADCs, the current sense pipeline, or the STM32 register configuration at all. So, I tried to create a more minimal version of that change which would still trigger the problem. What I got was this:

diff --git a/fw/bldc_servo_structs.h b/fw/bldc_servo_structs.h
index abbe26e..f06c16c 100644
--- a/fw/bldc_servo_structs.h
+++ b/fw/bldc_servo_structs.h
@@ -509,7 +509,7 @@ struct BldcServoConfig {
   // debug UART at full control rate.
   uint32_t emit_debug = 0;
 
-  uint32_t field1;
+  uint32_t field1 = 0;
 
   BldcServoConfig() {
     pid_dq.kp = 0.005f;

So, adding the initialization of a member in a random structure (the one that holds PID gains, among other things) triggered the issue. If the initialization was only of a uint8_t or uint16_t, no problem, but a uint32_t, float, or uint64_t did it.

Well, “that’s odd”.

Clearly that change shouldn’t have any impact, so if the problem is at the C++ level, it must be undefined behavior somewhere else, and if it isn’t at the C++ level, it could be anywhere. So, my next step was to look at the difference in the disassembly to see what that code change wrought in what the STM32 would actually execute.

This is from “meld”, with a set of custom filters to remove most spurious changes related to addresses changing. But yikes, that one extra initialization results in a *lot* of churn in the assembly. If we look at the structure constructor, the change we expect is there: the field is getting newly initialized.

However, with “-O3” optimizations on, gcc-11 makes all kinds of different decisions at various points. Instructions are re-ordered, different registers are used, entire blocks of code are re-ordered in their memory layout and execution, and extra padding is added or removed. There are many changes, any of which could be interacting with whatever undefined behavior is in the system.

Taking a step back

Since looking at the disassembly wasn’t going to be easy, I decided to take a step back and see if I could observe what was different in the system when it was running between the good and not-good states. Most likely some peripheral was configured incorrectly, with the ADCs being a prime candidate, but the clock tree could also be a culprit.

When debugging STM32s, I sometimes use the PyCortexMDebug project, which lets gdb use the vendor-provided SVD files to interpret the contents of all peripheral registers. Here, I wanted to print out every register on every peripheral just to see what was different. PyCortexMDebug doesn’t natively give you a way to do that. However, it can list all the peripherals it knows about, which I wrote to a file and pre-processed to remove the human-level annotations. Then, using gdb’s “python-interactive” mode, I could do a:

python-interactive
> regs = [x.strip() for x in open('/tmp/all_regs.txt').readlines()]
> for reg in regs:
>   gdb.execute('svd/x ' + reg)

Which did the trick, at least after copying and pasting the output from the terminal; I didn’t bother figuring out how to get it written to a file at the time. So now I have two giant files with every peripheral register, one from a firmware that was working, and one from a firmware that was exhibiting the extra noise. I went through them line by line and found… nothing.

Some registers were different of course, but the only differences were timer values, data registers on the ADC and SPI peripherals, and the system control block, depending upon whether the code happened to be in an interrupt when I stopped to sample it. No configuration values or anything that would point to a problem. Sigh.
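As an aside, gdb’s python API can capture a command’s output as a string, so in hindsight getting the dump straight into a file would only have been a couple more lines. A minimal sketch, run from the same python-interactive mode (the file names are just examples):

# Capture each 'svd/x' command's output with to_string=True and
# write it all to a file instead of scraping the terminal.
regs = [x.strip() for x in open('/tmp/all_regs.txt').readlines()]
with open('/tmp/all_regs_dump.txt', 'w') as out:
    for reg in regs:
        out.write(gdb.execute('svd/x ' + reg, to_string=True))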

More backing up

OK. So maybe there is a peripheral register that isn’t in the SVD that would correlate with the problem? My next step was to use gdb to dump the entire peripheral address space to an srec file in both cases.

dump srec memory /tmp/out.srec 0x40000000 0x51000000

Note, this does take a *long* time, at least 15 minutes with the hardware I was using.

What did I get for my hard-earned wait? Bupkis, nothing, nada, squat. After looking through every single byte that was different, the only ones that had changed were the same ones that the svd method above turned up, plus a bit of random noise in the “reserved” sections between peripherals that looked like genuine bus noise. Notably, not any configuration registers on any peripheral at all.
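Comparing dumps like these doesn’t need anything fancy either. If you grab raw images instead (gdb’s “dump binary memory” writes one), a few lines of python will list every differing byte; a minimal sketch, with the file names and base address as placeholders:

# Compare two raw memory dumps and print the address and value of
# every differing byte.  BASE is the address the dump started at.
BASE = 0x40000000

with open('good.bin', 'rb') as f:
    good = f.read()
with open('bad.bin', 'rb') as f:
    bad = f.read()

for offset, (a, b) in enumerate(zip(good, bad)):
    if a != b:
        print(f'0x{BASE + offset:08x}: {a:02x} -> {b:02x}')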

Even more backing up

OK. So if the problem isn’t in a peripheral register, maybe there is some difference in program state that is causing the problem? Maybe a stack overflow or something? So, I switched to SRAM dumps. First, I modified my startup assembly to start out with guard bytes across all of SRAM so that I could verify the stack hadn’t overflowed (not even close). I also used that to verify that the code which was copied into CCM SRAM on startup hadn’t overflowed or been stomped on (it hadn’t). Next I did a diff between the working and non-working states.

Here, there were a lot more differences as the firmware has a lot of state that varies from run to run. With the structure of the moteus firmware, most storage ends up being allocated on the C/C++ stack from a fixed size pool. This means that most of the variables don’t have a useful entry in the symbol table, even though their address is consistent from run to run. To identify what each change was, I started the firmware afresh with a breakpoint on _start, then added a hardware watchpoint on the address of interest.

b _start
run
watch *0x20004560 # (for example)
continue (as many times as necessary)

And then looked to see what modified that particular memory location to determine what it was doing. I methodically went through every difference, about 50 of them. I found things like the buffer used to hold CAN-FD frames, timers, nonce counters, the values read by the position sensor and current sensor, and many other things that all seemed perfectly reasonable.

Yet another approach doomed to give no useful information.

Back to an earlier approach

Whatever the problem was, it appeared to be in state on the STM32 that was not accessible to mere mortals. Probably a peripheral got into a bad state that wasn’t exposed via its registers or something. If I couldn’t find the state that was different, could I at least make a “minimal code difference” which was actually minimal?

My C++ minimal difference was pretty small, just the addition of an “=0” to a field initializer. However, that resulted in significant changes in the output program. To make things a little bit more controllable, I tried adding some __asm__("nop") entries to the constructor in question and sure enough, some counts of NOPs would trigger the problem and others wouldn’t. However, they still resulted in large differences in the output.

So then I undertook the painstaking step of gradually turning off optimizations in each function that I saw had changed. In some cases it was as easy as sticking a __attribute__((optimize("O1"))) on the definition. However, in many cases gcc/C++ requires the inline definitions be pulled out-of-line to make that annotation possible. Both because of that, and just because of bad luck, these changes would often result in my “nop” trick no longer triggering a failure. I worked methodically though, trying new functions until I was eventually able to make a minimal assembly diff that failed.

diff --git a/fw/bldc_servo_structs.h b/fw/bldc_servo_structs.h
index 95db9fe..8916d4e 100644
--- a/fw/bldc_servo_structs.h
+++ b/fw/bldc_servo_structs.h
@@ -533,6 +533,11 @@ struct BldcServoConfig {
     pid_position.ilimit = 0.0f;
     pid_position.kd = 0.05f;
     pid_position.sign = -1.0f;
+
+    asm volatile (
+        "nop;"
+        "nop;"
+    );
   }
 
   template <typename Archive>

And the assembly diff is solely:

Solely the addition of the 2 nops!

WTF!

As before, I’m using the same regexes with meld to exclude spurious changes related to addresses and literals. The exact set of expressions is below:

asm_address      ^.{20}
stm32_pc         08[0-9a-f]{6}
stm32_pc2        (80[012345][0-9a-f]{4})
stm32_addr       \+0x[0-9a-f]+>
stm32_literal    #[0-9]{2,5}

Trying to understand this a bit more

So far we have learned that simply adding two NOPs to one function that is totally unrelated to the problem in question causes the ADC to become noisy in an odd way. I tried some experimenting to learn more about the failure.

What does adding more NOPs do? The answer… 1 or 2 NOPs fail, 3 or 4 NOPs work, 5 or 6 fail, etc.

Hmmm… my current top two theories are that either a) the instruction layout or b) the execution timing is what results in the difference. To rule out one or the other, I made up a series of 8 NOPs, and then substituted a jump in for the first NOP that skipped to one of the later NOPs. That way I could adjust the execution cycle time of the relevant function step by step without changing any layout. That had no effect, which meant it had to be the physical layout of the code, not the timing.

The grind

At this point, I undertook what was perhaps the most arduous debugging task yet. To figure out which code was unhappy about having its instruction address changed, I bisected adding NOPs. This wasn’t super straightforward, because as mentioned, gcc’s optimizations generally mean that adding a NOP to a random function results in all kinds of changes all over the place. My procedure was roughly like this:

  1. Identify where in the address space I wanted to add a NOP.
  2. Find a nearby function that was written by me, and not a template expansion or library function.
  3. Switch it to be O1/O0.
  4. See if I can still trigger the problem at any of my former test points by adding NOPs (turning off optimizations on the one function sometimes re-ordered everything).
  5. If I can’t, then pick a different function and go back to 1.
  6. If I can, then bisect over all my current test points (which may be in a different order than the last bisection) to find the latest address space point where I can add a NOP to trigger the problem.

While brutal, I figured this was sure to result in finding the culprit.

And sure enough, after about 15 steps, each taking around 5-10 minutes, it did. I thought the following two lines were the culprit:

    ADC12_COMMON->CCR =
        (map_adc_prescale(kAdcPrescale) << ADC_CCR_PRESC_Pos) |
        (1 << ADC_CCR_DUAL_Pos); // dual mode, regular + injected
    ADC345_COMMON->CCR =
        (map_adc_prescale(kAdcPrescale) << ADC_CCR_PRESC_Pos) |
        (1 << ADC_CCR_DUAL_Pos); // dual mode, regular + injected

The two lines that configure the ADC prescaler! But, wait, didn’t we verify that the ADC prescaler as read from the peripheral registers was the same in both instances? Why yes, we certainly did.

Working:

(gdb) svd/x ADC12_COMMON
Registers in ADC12_Common:
	CSR:  0x000A000A  ADC Common status register
	CCR:  0x000C0001  ADC common control register
	CDR:  0x00000000  ADC common regular data register for dual and triple modes
(gdb) svd/x ADC345_COMMON
Registers in ADC345_Common:
	CSR:  0x000A000A  ADC Common status register
	CCR:  0x000C0001  ADC common control register
	CDR:  0x05250000  ADC common regular data register for dual and triple modes

Not working:

(gdb) svd/x ADC12_COMMON
Registers in ADC12_Common:
	CSR:  0x000A000A  ADC Common status register
	CCR:  0x000C0001  ADC common control register
	CDR:  0x00000000  ADC common regular data register for dual and triple modes
(gdb) svd/x ADC345_COMMON
Registers in ADC345_Common:
	CSR:  0x000A000A  ADC Common status register
	CCR:  0x000C0001  ADC common control register
	CDR:  0x05270002  ADC common regular data register for dual and triple modes

For good measure, I tested using stepi to walk through the initialization in the bad state to see if it was somehow related to wall clock timing, but that didn’t make a difference.

Narrowing things down

To avoid the “flavor-of-the-day” the gcc optimizer gives you and make my life easier for experimenting, I rewrote those two lines in inline assembler, just hard-coding the required CCR value:

    asm volatile(
        "str %2, [%0];"
        "str %2, [%1];"
        :
        : "r" (&ADC12_COMMON->CCR),
          "r" (&ADC345_COMMON->CCR),
          "r" (0x000C0001)
    );

I added in NOPs before, in between, and after the two stores. To my surprise, in all 3 places failures could be induced, but only on every 4th NOP. Which meant my identification of these two lines was incorrect.

Thus, false alarm. I kept moving down the function, replacing sections with inline assembler and then bisecting with NOPs until I reached the following section:

    ADC1->CR |= ADC_CR_ADEN;
    ADC2->CR |= ADC_CR_ADEN;
    ADC3->CR |= ADC_CR_ADEN;
    ADC4->CR |= ADC_CR_ADEN;
    ADC5->CR |= ADC_CR_ADEN;

Here, all 5 ADCs are turned on in rapid succession, after previously having had all their prerequisite startup operations and delays performed. NOPs placed before this section could cause the ADCs to get into the bad state, but NOPs immediately after did not. Placing NOPs between the individual enables always seemed to make the following sections work without problem. Once I had at least 3 NOPs between each, no amount of change above could cause a failure.

Finally, a decent hypothesis and solution

It seems that the ADCs on the STM32G4 do not like being turned on in rapid succession, and when they are, bad things can happen, like having the prescaler flipped to a different value without it showing in the corresponding register. In this case, the flash accelerator was probably delaying the initialization whenever the ADEN sets crossed a fetch boundary. Then, when two of them ended up in the same pre-fetch block, they would get turned on too quickly together. Maybe it causes a local brownout or something? Somewhat recently I upgraded to gcc-11, which probably did a better job of packing these enables into a smaller amount of code space.

I guess that’s an erratum for you.

With that understanding, a solution is trivial. Just initialize the ADCs one by one instead of all at once. The initialization sequence for the ADC is documented as requiring a wait until the ADRDY flag is set, so the fix is just to wait for that for each ADC in turn before enabling the next one. For good measure, since initialization isn’t time critical, I switched the whole process to be serial for each ADC, as I expect that is the more tested path with the hardware.

What is the lesson? Hardware is hard? Persistence pays off? I guess you can decide!

As a bonus, now that I know one of the prime symptoms to look for to troubleshoot bad prescalers (unusual bit flips around 2048), I discovered that I could get a bit more performance around the 0 current point by increasing the moteus prescalers a bit (75df013).

Improved mjbots labels

For most of mjbots’ existence, all of our products were labeled with a dependable, if lackluster, Brother P-Touch label maker. In line with other packaging improvements, I recently upgraded that labeling setup to bigger, higher resolution, and full color!

This is using an Epson TM-C3500, which I had expected to operate directly from my Linux-based test fixtures. However, upon receiving it, I discovered that alas, only Windows drivers were available. Thankfully, it wasn’t too bad to print from python in a simple way, as long as you manually select the media type from the Windows dialogs. Thus I made up a simple Flask app to receive label images over HTTP and print them. That runs on a Windows computer, and the test fixture applications just POST their label images to it.
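The bridge itself doesn’t need much. A minimal sketch of its shape, where the endpoint name is arbitrary and print_label is a hypothetical stand-in for the actual Windows print call:

# Minimal Flask app: accept a label image over HTTP POST and hand it
# to the printer.  print_label() is a hypothetical placeholder for
# the real driver call.
from flask import Flask, request

app = Flask(__name__)

def print_label(image_bytes):
    # Hand the image to the Epson TM-C3500 through the Windows
    # driver here (e.g. render with PIL, then submit the print job).
    raise NotImplementedError

@app.route('/print', methods=['POST'])
def handle_print():
    print_label(request.data)
    return 'OK'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

The test fixtures then just POST the image bytes to that URL and carry on.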

The qdd100 even has a nifty graphical summary of some of the automated test results graphed right on the label.

Since I was hacking on it, the bar code URLs are live now too. They show a summary of the test results for each individual unit, although the HTML they produce is pretty minimal, even by my standards!

pi3hat python raw CAN-FD

The pi3hat, among other things, has 5 CAN-FD ports. You can use them to drive a lot of moteus servos, but they are perfectly fine CAN-FD ports generally. The C++ library has always been able to send and receive arbitrary frames (and recently, at arbitrary bitrates), but the python interface was lacking, only exposing a portion of this functionality.

As of version 0.3.11, the python library (pip3 install moteus-pi3hat) now exposes everything you need to be able to send and receive arbitrary CAN frames from any of the ports, as well as configure all the timing options for waiting for responses from slave devices.

This is what a sample usage of raw frames, mixed in with moteus frames, looks like:

# To send a raw CAN, you must manually instantiate a
# 'moteus.Command' and fill in its fields, along with which
# bus to send it on.
raw_message = moteus.Command()
raw_message.raw = True
raw_message.arbitration_id = 0x0405
raw_message.bus = 5
raw_message.data = b'1234'
raw_message.reply_required = False

# A single 'transport.cycle' call's message list can contain a
# mix of "raw" frames and those generated from
# 'moteus.Controller'.
#
# If you want to listen on a CAN bus without having sent a
# command with 'reply_required' set, you can use the
# 'force_can_check' optional parameter.  It is a 1-indexed
# bitfield listing which additional CAN buses should be
# listened to.

results = await transport.cycle([
    raw_message,
    controller.make_query(),
], force_can_check=(1 << 5))

# If any raw CAN frames are present, the result list will be a
# mix of moteus.Result elements and can.Message elements.
# They each have the 'bus', 'arbitration_id', and 'data'
# fields.
#
# moteus.Result elements additionally have an 'id' field which
# is the moteus servo ID and a 'values' field which reports
# the decoded response.
for result in results:
    if hasattr(result, 'id'):
        # This is a moteus structure.
        print(f"{time.time():.3f} MOTEUS {result}")
    else:
        # This is a raw structure.
        print(f"{time.time():.3f} BUS {result.bus}  " +
              f"ID {result.arbitration_id:x}  DATA {result.data.hex()}")

It isn’t the cleanest API, but it does get the job done!

pi3hat configurable CAN

To date, the pi3hat CAN channels only supported CAN properties suitable for use with moteus controllers. Given that’s what most people are using them for, that’s fine. However, there was no real constraint behind that, just laziness.

Thus, I’ve released new firmware for the pi3hat that supports configuring the bitrate, FD-ness, and other properties of all 5 CAN channels.

Currently only the C++ library exposes the configuration functionality, but it will be easy enough to add to python when someone needs it.

New cross-platform moteus tools!

After receiving many requests via youtube, discord, and email, I’ve finally gone ahead, bitten the bullet, and updated all of the moteus tools to be pure python and to work in a cross-platform manner. Now, the only thing you need to do to install pre-compiled versions of tview and moteus_tool on most* platforms is:

pip3 install moteus_gui
python3 -m moteus_gui.tview    # (or maybe just tview)
python3 -m moteus.moteus_tool  # (or maybe just moteus_tool)

I’ve personally tested these on Linux, Windows, and Raspberry Pi, and others have at least verified basic operation on Macs. Python 3.7 or greater is required.

….

But wait, there’s more!

Now, moteus_tool, tview, and the python bindings more generally can all use python-can as a transport. That means tview can now be used with socketcan, pcan, and a bunch of other options. To one-up that, most users won’t have to specify any command line options at all, as tview and moteus_tool will automatically select an fdcanusb or python-can depending upon what is available.

I’ll be updating the devkit introduction video soon, although the commands in there will largely continue working for the time being.

Errata

  • Neither pypi nor piwheels has pyside2 for the Raspberry Pi, but it is packaged in Raspberry Pi OS. You can follow the instructions in git to find a recipe that works.
  • To use the pi3hat, you need to also do pip3 install moteus_pi3hat

pip3 install moteus

I’m excited to announce new python bindings for communicating with moteus controllers! A simple example from the README:

import asyncio
import math
import moteus

async def main():
  c = moteus.Controller()
  print(await c.set_position(position=math.nan, query=True))
  await asyncio.sleep(1.0)

asyncio.run(main())

This code will try to locate an fdcanusb on your host and use it to communicate with the controller with ID 1. All of those details can be customized through code depending upon how you construct things. The library is pure python, although it doesn’t work on Windows currently, because it relies on an asyncio-aware pyserial wrapper that doesn’t work there.
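As an example of that customization, skipping auto-detection and pointing it at a specific device and ID looks roughly like this (a sketch; the device path is an example, and the id/transport constructor arguments are as described in the README):

import moteus

# Use a specific fdcanusb device and servo ID instead of relying
# on auto-detection.
transport = moteus.Fdcanusb('/dev/fdcanusb')
c = moteus.Controller(id=2, transport=transport)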

At the same time, there is a parallel python library “moteus-pi3hat”, which only has an armv7l package. This provides an identical API for working with the pi3hat on a Raspberry Pi. It lets you configure which controllers are attached to which bus, as sketched below (by default it assumes everything is on bus #1). After setting that up, you can use an identical API to command and monitor the controllers.
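Configuring that bus mapping looks roughly like the following sketch (the servo IDs and bus numbers are examples):

import moteus
import moteus_pi3hat

# Tell the transport which servo IDs live on which pi3hat bus.
transport = moteus_pi3hat.Pi3HatRouter(
    servo_bus_map={
        1: [11, 12],  # servos 11 and 12 are on bus 1
        2: [13],      # servo 13 is on bus 2
    },
)
c = moteus.Controller(id=11, transport=transport)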

Thanks to everyone in discord who helped test!

Primitive gait balancing – 1D

I’m working to improve the walking gait of the mjbots quad A1. In this iteration, I wanted to tackle an incremental step towards a more fully dynamic gait, but one that will still greatly increase the capability of the machine. As mentioned last time, the current walking gait cycles between having all four legs on the ground and swinging alternating pairs of opposing corner legs in order to move laterally. I’d like to keep that same basic structure, but be a bit smarter about what happens during the swing phase.

Setup

First, the obvious: When the robot has legs on the ground, they support it. If all four legs are on the ground, they form a support polygon. If the center of mass of the robot is within this polygon when projected onto the ground, the robot will happily sit there forever without tipping over.

During the flight phase of the gait, only two legs are in contact with the ground. Now, we no longer have a support polygon, but a support line. The center of mass of the robot is either on one side of it or on the other. Well, I suppose it could be exactly over, but then the slightest breeze would push it to one side or the other. Whichever side the robot is on, gravity will exert a torque and cause it to continue to tip over in that direction.

The current gait sequencer is oblivious to this fact. It will happily pick up the legs when the robot is far away from the support polygon, and then move the center of mass even further away. Needless to say, this results in the robot tipping over rather rapidly.

The “Idea”

Clearly the robot can’t move while keeping itself exactly over the support line, but can we arrange the gait such that it spends equal amounts of time on both sides (to be more precise, such that the integral of the distance from the support line over the time of the swing cycle is 0)? If we could, that would result in the minimum possible angular rotation during the swing phase, which would hopefully make things a lot better without a whole lot more work.

In this post, I’ll consider the problem in 1 dimension, along a line as a simplification. Given that the robot can only move in one direction at a time, we should be able to generalize the 1 dimensional case to the 2 dimensional case later.
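Written out in that one dimensional setting, with x(t) the position of the center of mass during the swing, x_s the position of the support line, and T the swing time, the criterion (using the signed distance) is:

\int_0^T \left( x(t) - x_s \right) \, dt = 0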

For a constant velocity and a fixed swing time, the answer is pretty clearly yes. If we start the swing phase right when the center of mass is half a swing time away from the opposing support (at the current velocity), then the swing will complete when the center of mass is the same distance past it on the other side, spending exactly the same amount of time on each side of the support line.
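A quick check of that claim: at constant velocity v, starting the swing at x(0) = x_s - vT/2 gives x(t) - x_s = vt - vT/2, so

\int_0^T \left( vt - \frac{vT}{2} \right) dt = \frac{vT^2}{2} - \frac{vT^2}{2} = 0

with the two halves cancelling exactly.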

In fact, this also holds for a more general case, as long as we don’t attempt to accelerate during the swing phase itself. All that is necessary is to start the swing when the center of mass is half a swing times away from the support point at the current velocity and not accelerate until the foot is down again.

Granted, in addition to the one dimensional simplification, in this model the feet are allowed to be infinitely far away from the center of mass. However, if we just apply a maximum time spent with all legs in stance, that places a limit on how far away the legs can get, with only a slight decrease in performance.

The model also breaks down when the deceleration causes the robot to change directions. A simple heuristic is to change the leg order in that case.

Summary

Well, that looks pretty good in 1 dimension. Next up I’ll extend it to 2 dimensions, and try it out on the robot.

C++20 coroutines and moteus_tool

I’ve had a confusing mishmash of development tools for the moteus servos for a while now. My original development tool was in python, which worked just fine. Coroutines allowed me to express complex asynchronous logic succinctly, the program itself was rather simple, and I could easily integrate it with matplotlib for plotting. However, when looking to run this on the raspberry pi, I needed a newer python version than came with raspbian, which turned out to be a royal pain to get installed in a repeatable manner. Thus I rewrote a portion of the moteus_tool in C++ and just used my normal cross-compiling toolchain to generate the binaries. What I didn’t do was port the calibration logic, as the state machine required with standard boost::asio would have bloated the logic size by 5x, and I didn’t really need to calibrate servos from the raspberry pi ever.

Still, I’ve wanted to consolidate these tools for a while now, and while working towards other telemetry and development tool goals, I decided to make another pass at removing the duplication. I figured it was time to try using the new C++20 coroutines to implement the asynchronous logic in C++ and see if I could get rid of the python tool. Since I’m currently vendoring all the compilers and libraries for the C++ applications, I am already running clang-9.0 with libc++ and boost 1.72 for all the host side tools, which theoretically should all support coroutines just fine.

Why coroutines?

To recap, typical callback based boost::asio code looks something like this:

void DoSomething() {
  boost::asio::async_write(
    *stream_,
    boost::asio::buffer("command to send"),
    std::bind(&Class::DoneWriting, this, pl::_1));
}

void DoneWriting(boost::system::error_code ec) {
  FailIf(ec);
  boost::asio::async_read_until(
    *stream_,
    response_,
    "\n",
    std::bind(&Class::HandleRead, this, pl::_1));
}

void HandleRead(boost::system::error_code ec) {
  FailIf(ec);
  // Do something with the result.
}

This gets even worse if you have embedded control flow.  Typically the best you can do is construct an object to hold the state, and bind it around to keep track of things:

struct Context {
  int iteration_count = 0;
};

void Start() {
  auto ctx = std::make_shared<Context>();
  Write(ctx);
}

void Write(std::shared_ptr<Context> ctx) {
  message_ = fmt::format("{}", ctx->iteration_count);
  boost::asio::async_write(
    *stream_,
    boost::asio::buffer(message_),
    std::bind(&Class::HandleWrite, this, pl::_1, ctx));
}

void HandleWrite(boost::system::error_code ec,
                 std::shared_ptr<Context> ctx) {
  FailIf(ec);
  ctx->iteration_count++;
  if (ctx->iteration_count >= kLoopCount) {
    Done();
  } else {
    Write(ctx);
  }
}

Now, when coroutines are used, you can relinquish control with a simple “co_await”, and all the logic can be written out as if it were purely procedural:

boost::asio::awaitable<void> Start() {
  for (int i = 0; i < kLoopCount; i++) {
    auto message = fmt::format("{}", i);
    co_await boost::asio::async_write(
      *stream_,
      boost::asio::buffer(message),
      boost::asio::use_awaitable);
  }
}

Not only is this significantly shorter, it more directly expresses the intent of the operation, and lifetime of the various values is easier to manage.  No longer do we have to keep around the buffer as an instance variable or a shared ptr to ensure it lives beyond the write operation.  It stays in scope until it is no longer needed like any other automatic variable.

moteus_tool

Switching moteus_tool to coroutines was straightforward, and resulted in a big reduction in the verbosity required.

My first non-Fusion generated g-code

While working to build a weight-reduced moteus servo mk2, I reworked my outer housing CAM to do all the machining on the Pocket NC v2-50. For this part I didn’t necessarily need any challenging workholding, and since I could get the stock in tube form, there wasn’t an inordinate amount of material to remove either.

The one challenge is that when the part is mounted in the Sherline chuck, the mill can’t actually reach all the way to the edge without hitting the X travel limit (which is why most of the other 100mm diameter parts I do are fixtured slightly off-center). In this case I tackled the problem in two iterations.

Iteration 1

For the first iteration, I just used an adaptive clear from 8 different directions to get most of the material out of the way, then used a multi-axis “flow” to finish the outer diameter, causing the B axis to rotate while the end-mill remained roughly in place. Then a subsequent pass came in from the top to clean up all the stuff that was left behind.

[Image: Coming at the problem from all sides]

[Image: A “flow” toolpath to do the outside]

This worked, but had a couple of problems. First, it was slow. A full cycle time was something like 10 hours, largely because all the adaptive clears spent a lot of time not removing much material and rapiding around. Second, it left a non-ideal surface finish on the outer diameter. The “flow” toolpath seemed to jiggle the mill around in X and Y for no great reason, and occasionally sped the B axis up to something like 3 times the normal rate for a quarter revolution, for no apparent reason.

Iteration 2

I figured this would be a lot easier if I could just have more control over the mill while spinning the B axis.  I could take all the extra material off the top using the side of the cutter, and produce a nicer surface finish on the outside.  Since that wasn’t possible within Fusion 360, I figured this wouldn’t be a terrible time to try doing some “manual” g-code for the first time.  The “manual” is in quotes only because I ended up writing a python script to do the actual generation.

To begin with, I started with the g-code from a Fusion 360 generated toolpath, so that I could get the tool setup, probing, and such configured in a way that I knew the Pocket NC would accept. Then my python script had two options. The first generated the g-code to turn down all the material on the top of the stock to the final length. It moved the mill into position, spun the B axis by 340 degrees, gradually moved down a Y step while rotating another 20 degrees, then spun another 340, and so on until reaching the end. This worked out just great, used much more of the cutting length of my Datron 4mm mill, and got done in something like 20 minutes.

The second option in the script was for turning down the OD to the final size. This used the same basic approach, but instead of setting the Z past the inside of the tube, set it to exactly the OD. Then the Y stepped down in the same manner as before, just over a different range (the top of the finished part to slightly past the bottom).
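Both passes boil down to the same short loop. A sketch of the structure (the feed rate, step size, and coordinates here are placeholder values, not my real numbers):

# Sketch of the facing/OD pass generator: spin B most of a
# revolution, then step Y down while finishing the revolution,
# and repeat until the final Y is reached.
def turning_pass(y_start, y_end, y_step, feed=200.0):
    lines = ['G90']  # absolute positioning
    y = y_start
    b = 0.0
    while y > y_end:
        b += 340.0
        lines.append(f'G1 B{b:.1f} F{feed:.0f}')
        y = max(y - y_step, y_end)
        b += 20.0
        lines.append(f'G1 B{b:.1f} Y{y:.3f} F{feed:.0f}')
    return '\n'.join(lines)

print(turning_pass(y_start=10.0, y_end=0.0, y_step=0.2))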

This got me to where I could start on the internal features in only about 40 minutes of Pocket NC v2-50 time, which is a big improvement over a trip to Artisan’s Asylum and an hour on the lathe getting set up, turning the part, and then cleaning up.