Savage Solder: Identifying Cones

Cone Tracking Block Diagram

One of the key aspects of our Savage Solder robomagellan solution is the ability to identify and track the orange traffic cones that serve as goal and bonus targets. Our approach is divided into two parts, the first being identification and the second tracking. They are linked together roughly as shown in the block diagram to the right. Raw camera imagery is fed to the identification algorithm, which produces putative cones, each denoted by an estimated bearing and range in the camera’s field of view. The tracking algorithm then takes those range and bearing estimates, along with the current navigation solution, and fuses them into a set of Cartesian coordinate cones in the current neighborhood. Those local coordinates are then fed to the higher level race logic, which can decide to target or clip a cone if it is at the appropriate place in the script.

In this post, I’ll cover the Cone Identification Algorithm, listing out the individual steps within it.

Cone Detection Dataflow

Original Image from Webcam

Our identification algorithm consists of a relatively simple image processing pipeline described below. We will work through a sample image starting from the source shown to the right:

Step A: First, we convert the image from the RGB values that our camera data source provides into the Hue Saturation Value (HSV) color space. In HSV space, hue corresponds to the color itself, saturation to how much of that color is present, and value to how bright it is. Since the traffic cones have a known color (orange), this helps us in later steps reject a lot of very unlikely regions of the visual field.
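
As a concrete sketch of this step (not our actual code), the conversion in OpenCV’s Python bindings looks roughly like the following; the file names are hypothetical, and note that OpenCV stores 8-bit hue on a 0–179 scale.

import cv2

# Load a sample frame (hypothetical path) and convert BGR -> HSV.  OpenCV
# reads images as BGR and, for 8-bit images, scales hue to 0-179.
bgr = cv2.imread("frame.png")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
hue, sat, val = cv2.split(hsv)
cv2.imwrite("hue_channel.png", hue)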

Hue Channel of HSV Transform

Step B: Second, we apply a limit filter to select only a subset of the hues and saturations, ignoring the value channel entirely. Both hue and saturation are parameterized by a center point and a range on either side of that center point which will be accepted. Since hue values wrap around at 0 (red), we take care to select values on both sides of the wrap point: if the hue range does wrap around, we just do two limit operations, one for each side, and merge the results after the fact. The output of this process is a binary bitmap: black where no cone color was detected, white where there is a potential cone.
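
A minimal sketch of such a limit filter, assuming OpenCV’s 0–179 hue scale, might look like the following; the limit_filter name and the center/range numbers are illustrative placeholders, not our calibrated thresholds.

import cv2
import numpy as np

def limit_filter(hsv, hue_center, hue_range, sat_center, sat_range):
    # Select pixels whose hue and saturation lie within the given ranges,
    # ignoring value entirely.  Handles hue wrap-around at 0/180.
    sat_lo = max(sat_center - sat_range, 0)
    sat_hi = min(sat_center + sat_range, 255)
    hue_lo = hue_center - hue_range
    hue_hi = hue_center + hue_range
    if 0 <= hue_lo and hue_hi <= 179:
        return cv2.inRange(hsv,
                           np.array([hue_lo, sat_lo, 0]),
                           np.array([hue_hi, sat_hi, 255]))
    # The hue band straddles the wrap point: filter each side and merge.
    upper = cv2.inRange(hsv,
                        np.array([hue_lo % 180, sat_lo, 0]),
                        np.array([179, sat_hi, 255]))
    lower = cv2.inRange(hsv,
                        np.array([0, sat_lo, 0]),
                        np.array([hue_hi % 180, sat_hi, 255]))
    return cv2.bitwise_or(upper, lower)

bgr = cv2.imread("frame.png")                  # hypothetical sample frame
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
mask = limit_filter(hsv, hue_center=5, hue_range=12,
                    sat_center=180, sat_range=75)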

Bitmap with specific hue range selected

Step C: Next, we perform what OpenCV calls a “distance” transform on the resulting bitmap. This determines, for any given white pixel, how far it is to the nearest black pixel in any direction. Effectively, this tells us how big any matched region is. It is not a terribly reliable indicator, as even a single black pixel in the middle of an otherwise large white area can halve the distance for all the pixels in that region. However, we use it only to discard regions with a distance of 2 or less, which mostly throws away speckle noise. We also process each region in order of its distance score, so that we handle the biggest regions first.
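
Something along these lines illustrates the distance transform and the speckle rejection; the threshold of 2 comes from the text above, while the file name and the stand-in orange band are assumptions.

import cv2

bgr = cv2.imread("frame.png")                         # hypothetical sample frame
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 105, 0), (22, 255, 255))  # stand-in orange band

# Each white pixel gets its distance to the nearest black pixel.
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)

# Drop regions whose peak distance is 2 or less (speckle), and rank the
# survivors by peak distance so the biggest regions are handled first.
num, labels = cv2.connectedComponents(mask)
regions = []
for label in range(1, num):
    peak = float(dist[labels == label].max())
    if peak <= 2.0:
        mask[labels == label] = 0
    else:
        regions.append((peak, label))
regions.sort(reverse=True)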

Distance metric

Step D: At this point, we iterate through the connected regions, starting with those that contain the largest “distance” points. For each, we measure the bounding rectangle of the region, and from that calculate the aspect ratio and the fill percentage. The aspect ratio is just the width divided by the height. The fill percentage is the percentage of pixels inside the bounding rectangle which were marked as being a cone. To be considered a valid cone, the aspect ratio and fill percentage must be within certain configured limits.
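
A sketch of these checks using OpenCV’s connected component statistics might look like the following; the aspect ratio and fill limits shown are made-up placeholders for the configured limits.

import cv2

bgr = cv2.imread("frame.png")                         # hypothetical sample frame
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 105, 0), (22, 255, 255))  # stand-in orange band

MIN_ASPECT, MAX_ASPECT = 0.4, 1.2   # width / height limits (illustrative)
MIN_FILL, MAX_FILL = 0.3, 0.8       # bounding-box fill limits (illustrative)

num, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
candidates = []
for label in range(1, num):
    x, y, w, h, area = stats[label]
    aspect = w / float(h)
    fill = area / float(w * h)
    if MIN_ASPECT <= aspect <= MAX_ASPECT and MIN_FILL <= fill <= MAX_FILL:
        candidates.append((x, y, w, h, area))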

Step E: Finally, if the putative cone passed all the above checks, the bounding rectangle is used to estimate the bearing and distance. The bearing estimation is relatively straightforward: we assume our camera has no distortion and a field of view of 66 degrees, so the x coordinate of the bounding box center relates linearly to the bearing angle. Range is estimated from the total number of pixels detected in the region, assuming the cone is a perfect sphere; we calibrated the size of the sphere to match the visible surface area of a cone. If the detected region abuts one of the edges of the camera image, then no range is emitted at all, as it would be suspect due to some of the pixels being off screen. The very last step is to estimate the bearing and range uncertainty, which the detector reports as 1 standard deviation values. The bearing uncertainty is a guess assuming a fixed number of pixels of error in the x direction. For range, we assume that some percentage of the pixels will be missed, which would erroneously make the cone seem farther away than it actually is.
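
The geometry works out roughly as in the sketch below. Only the 66 degree field of view, the edge check, and the general approach come from the text; the image resolution, the calibrated sphere radius, the pixel error, and the missed-pixel fraction are illustrative assumptions.

import math

IMAGE_WIDTH = 640          # pixels (assumed camera resolution)
IMAGE_HEIGHT = 480
HFOV_DEG = 66.0            # horizontal field of view
CONE_RADIUS_M = 0.14       # hypothetical calibrated "equivalent sphere" radius

# Focal length in pixels from the pinhole model.
FOCAL_PX = (IMAGE_WIDTH / 2.0) / math.tan(math.radians(HFOV_DEG / 2.0))

def estimate_bearing_range(bbox, pixel_count):
    # bbox = (x, y, w, h) of the detected region; pixel_count = cone pixels.
    x, y, w, h = bbox

    # Bearing: the bounding-box center maps linearly across the field of view.
    center_x = x + w / 2.0
    bearing_deg = (center_x - IMAGE_WIDTH / 2.0) / IMAGE_WIDTH * HFOV_DEG

    # Range: treat the region as the image of a sphere; the projected pixel
    # area is roughly pi * (FOCAL_PX * R / d)^2, so solve for d.
    range_m = None
    touches_edge = (x == 0 or y == 0 or
                    x + w >= IMAGE_WIDTH or y + h >= IMAGE_HEIGHT)
    if not touches_edge and pixel_count > 0:
        range_m = FOCAL_PX * CONE_RADIUS_M * math.sqrt(math.pi / pixel_count)

    # 1-sigma uncertainties: a fixed pixel error for bearing, and an assumed
    # fraction of missed pixels for range (missed pixels make the cone look
    # farther away than it really is).
    PIXEL_SIGMA = 4.0
    MISS_FRACTION = 0.2
    bearing_sigma_deg = PIXEL_SIGMA / IMAGE_WIDTH * HFOV_DEG
    range_sigma_m = None
    if range_m is not None:
        far_range = FOCAL_PX * CONE_RADIUS_M * math.sqrt(
            math.pi / (pixel_count * (1.0 - MISS_FRACTION)))
        range_sigma_m = far_range - range_m
    return bearing_deg, bearing_sigma_deg, range_m, range_sigma_m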

Next Steps

Once all the qualifying regions are processed, or the maximum number of visible cones is found (4 in our case), we call this image done and record the outputs. Next, these values go on to the “cone tracker”, which generates Cartesian coordinate positions for each cone. I’ll discuss that module later.

Hawaii 2013

Generating Usable Depth Maps from Aerial LIDAR Data

Earlier, we looked at how I used publicly available map data to create a simulation environment for Savage Solder, our robomagellan entry. Here, I’ll describe the process I used to source and manipulate the public GIS data into a form our simulation environment could use. There are two halves to this: the visible light imagery, and the elevation data used to build the 3D model.

Aerial Imagery

I started at the MASS GIS website located at: www.mass.gov. For Cambridge, that comes up with a tile map showing which tile corresponds to which area. From there, you can download MrSID data for the areas of interest.

MASS GIS Aerial Tile Map

The only free application I could find which adequately manipulated MrSID files was LizardTech’s GeoViewer, available for Windows only. I was able to use it to export a geotiff file containing the areas of interest.

Next, I used some tools from the GDAL open source software suite to manipulate the imagery (ubuntu package “gdal-bin”). The first was “gdalwarp”, which I used to do a preliminary reprojection into Universal Transverse Mercator (UTM) coordinates. All of our vehicle and simulation software operates in this grid projection for simplicity’s sake.

gdalwarp -r cubic -t_srs '+proj=utm +zone=19 +datum=WGS84' \
  input.tif output_utm.tif

The next step was a little finicky. I used “listgeo” (from ubuntu “geotiff-bin”), to determine the UTM bounding coordinates of the image. Then I selected a new bounding area which, at the same pixel size, would result in an image with a power of two number of pixels in each direction. Then, I used “gdalwarp” to perform the final projection with those bounds.

listgeo output_utm.tif

gdalwarp -r cubic -t_srs '+proj=utm +zone=19 +datum=WGS84' \
  -te 323843.8 4694840.8 324458.2 4695455.2 \
  output_utm.tif final_image.tif
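
For reference, a small helper along these lines can compute padded bounds that span a power-of-two number of pixels; the raw bounds and the 0.3m pixel size in the example are illustrative only (the -te values above span 614.4m in each direction, which would be 2048 pixels at 0.3m per pixel).

import math

def power_of_two_bounds(min_x, min_y, max_x, max_y, pixel_size):
    # Expand a UTM bounding box (meters) so that, at the given pixel size,
    # the output spans a power-of-two number of pixels in each direction.
    def expand(lo, hi):
        span_px = (hi - lo) / pixel_size
        target_px = 2 ** math.ceil(math.log2(span_px))
        pad = (target_px * pixel_size - (hi - lo)) / 2.0
        return lo - pad, hi + pad
    new_min_x, new_max_x = expand(min_x, max_x)
    new_min_y, new_max_y = expand(min_y, max_y)
    return new_min_x, new_min_y, new_max_x, new_max_y

# Made-up raw bounds from listgeo, assuming a 0.3 m pixel.
print(power_of_two_bounds(323900.0, 4694900.0, 324400.0, 4695400.0, 0.3))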

For the final step with the aerial imagery, I used “gdal_translate” to convert this .tif file into a .png.

gdal_translate -of png final_image.tif final_image.png

Final Downsampled Aerial Imagery

LIDAR Elevation

For the elevation side of things, I downloaded LIDAR data, also from MASS GIS, from a site dedicated to a high resolution LIDAR data drop for the city of Boston. There you can use the index map to identify the appropriate tile(s) and download them individually.

For the LIDAR data, I used gdalwarp again for multiple purposes in one pass:

  • To mosaic multiple LIDAR data files together.
  • To project them into the UTM space.
  • And finally, to crop and scale the final image to a known size and resolution.

The resulting gdalwarp command looks like:

gdalwarp -r cubic -t_srs '+proj=utm +zone=19 +datum=WGS84' \
  -te 323843.8 4694840.8 324458.2 4695455.2 -ts 1025 1025 \
  lidar_input_*.tif lidar_output.tif

Here, the UTM bounds are the same ones used for reprojecting the aerial imagery. Our simulation environment requires the terrain map to be sized to a power of two plus one, so 1025 (1024 + 1) is chosen.

Finally, this tif file (which is still in floating point format) can be converted to a discrete .png using gdal_translate. I use “tifffile” from the ubuntu package “tifffile” to determine the minimum and maximum elevation, so as to capture as much dynamic range as possible. In this case, the elevations run from about 0m above sea level to 50m.

gdal_translate -of png -scale 0 50 lidar_output.tif lidar_output.png
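
A minimal sketch of that check using the Python “tifffile” module (nodata handling omitted):

import tifffile

# Read the floating point elevation raster and report its range, to pick
# the -scale arguments for gdal_translate.
elev = tifffile.imread("lidar_output.tif")
print("min elevation:", float(elev.min()))
print("max elevation:", float(elev.max()))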

Final Downsampled LIDAR Elevation

Autonomous Racing Rotorcraft: Camera Signal Integrity

IGEP COM Pro to TechToys OV9650 Adapter

Last time working with the racing helicopter’s camera system, I managed to capture a poor quality image from the camera. My next step was attempting to diagnose the problems with the image quality. However, before I could do so, I ran into another problem with signal integrity in my breadboard setup.

The TI DM3730’s ISP with the 3.6 kernel is relatively sensitive to the quality of the pixel clock and vertical sync. If a single frame has the wrong number of pixels detected, the driver does not seem to be able to recover. This was a big problem for me, as my breadboard setup runs many of the 12MHz signal lines over 24 gauge wire strung around a breadboard. What I found was that I was only intermittently able to get the ISP to capture data, and eventually it got to the point where I could not capture a single frame despite all the wiggling I could attempt.

Rather than spending a large amount of time trying to tie up my breadboard wires just so, I instead just printed up a simple adapter board which contains all the level translation and keeps the signal paths short. This time, I tried printing it at oshpark.com, a competitor to batchpcb. The OV9650’s FFC connector has pads with a 7.8mil spacing, and batchpcb only supports 8.1mil, while oshpark has 6mil design rules. They also claim to ship faster, and are slightly cheaper.

The results were pretty good. From start to finish, it took 14 days for the boards to arrive, and the 3 boards appeared to have no major defects. My artwork had one minor error which required rework: the output enable pins on the level converters were tied to the wrong polarity, hence the lifted pins and blue wiring. Despite that, it appears to be working as intended.

Final OV9650 adapter attached to IGEP COM Pro

Savage Solder: Staying on the pavement part 2

Previously, I looked at how Savage Solder, our robomagellan entry, uses online replanning to avoid nearby cones. This time, I’ll cover the terrain oracle and how it integrates into the path planner.

For our simulation environment, we currently rely on digital elevation maps from the city of Cambridge along with aerial imagery of the environment. The simulator places the car in that fake world, and pretends to be each of the sensors that the car has. For Savage Solder, that is the IMU, the GPS, and the video camera.

In this first stage of staying on the pavement, we added an additional “oracle” sensor which has roughly the same field of view as the camera, but instead of reporting images, reports how fast the car can go at any point visible in the frame. The simulator gets this information from a hand-annotated map of the test area, where each pixel value corresponds to the maximum allowed speed. For example, below is downsampled aerial imagery of Danehy park in Cambridge, along with the hand annotated maximum speed map.

The locally referenced terrain data is fed into the online replanner. Currently, the maximum speed is just used as a penalty function, so that the car prefers to drive over terrain labeled with higher speeds. A sample online replanner output is shown below. Here, darker shades of red imply a higher maximum speed, and the black line is where the global GPS waypoints are in the current local reference frame. You can see the planned trajectory follow the red path despite it being offset some distance from the global trajectory. In practice, this will happen all the time, if nothing else because the GPS receiver on Savage Solder has comparatively low accuracy and often drifts by 3 or 4 meters in any direction.

Online replanning on a sidewalk
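
As a very rough illustration of the penalty idea, and not our actual cost function, the terrain term might look something like:

# Hypothetical terrain penalty term: lower allowed speed -> higher cost.
MAX_MAP_SPEED = 9.0      # m/s, fastest value in the annotated map (assumed)
TERRAIN_WEIGHT = 2.0     # relative weight of the terrain term (made up)

def terrain_cost(allowed_speed_mps):
    # Penalty added per planning step for a cell annotated with this speed.
    return TERRAIN_WEIGHT * (MAX_MAP_SPEED - allowed_speed_mps) / MAX_MAP_SPEED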

As a caveat, the current implementation still drives at the speed programmed in by the global waypoints, regardless of the local maximum speeds; the local sensor only modifies where the car drives. Fixing that is the immediate next step.

We hope this approach will also be useful for events like the Sparkfun AVC, where the traversable path is the same width as, or narrower than, the GPS accuracy.

Savage Solder: Staying on the pavement part 1

The Robogames 2012 event venue in San Mateo

As we gear up for higher speeds with our robomagellan entry, Savage Solder, one of the harder challenges is managing the terrain that the car drives on. In general, robomagellan competitions are held in campus or park-like environments that consist of a mix of grass, pavement, wooded areas, and buildings. All of these terrain features affect how the car can navigate, and at what speeds. For instance, it shouldn’t drive into trees or buildings. In grass, the maximum speed should generally be lower than on pavement. Transition points may need an even lower speed, as there could be a large bump where an area of grass ends, or where a manhole or other negative obstacle is present. Last year, we ran the car at a maximum of about 5mph, at which speed our platform can handle nearly all of the common event terrain features. However, as we go faster, we will want to make sure that, if possible, the car stays on pavement and avoids areas that could cause large problems.

We broke down our approach to tackle this problem into a number of discrete stages:

  • Online Replanning: First, the car will run a local A* planner over short distances in the future. Initially, this will just avoid unexpected cones, since we are already tracking all cones in the vicinity of the car.
  • Oracular Terrain: Next, we will hand annotate a map of one of our simulated environments, and create a synthetic “sensor” which reports the exact maximum speed in the robot’s local coordinate frame. This will feed into the local planner to keep the car on the path and at an appropriate speed.
  • Derive Terrain from Camera: In the final stage, we will use our camera data to estimate what the maximum speed of visible terrain elements is, likely by using a simple color classifier.

We have implemented a rough prototype of the local replanner and have had success both in simulation and in the field with it. It operates as a hybrid A* planner with a primary state space consisting of x, y, heading, and velocity. The auxiliary, non-searched, states are just the previous steering command. At each time step, the planner searches a range of steering angle adjustments, as the steering servo on our platform requires about 1.5s to move from lock to lock. In a given plan time step only a small fraction of that motion is possible. The cost metric incorporates the rate at which the steering is changed, the distance away from the global waypoint path, and the proximity to any nearby cones.
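
The sketch below gives a flavor of that expansion and cost metric. It is not our planner: the time step, steering lock, wheelbase, and weights are made-up placeholders, and the path-offset and cone-distance functions are assumed to be provided elsewhere.

import math
from dataclasses import dataclass

# Illustrative constants only -- not our tuned values.
PLAN_DT = 0.2                           # seconds per planning step
MAX_STEER = 0.5                         # assumed steering lock, radians
MAX_STEER_RATE = 2 * MAX_STEER / 1.5    # lock to lock in ~1.5s
WHEELBASE = 0.33                        # meters, assumed
STEER_CHANGE_WEIGHT = 1.0
PATH_OFFSET_WEIGHT = 2.0
CONE_WEIGHT = 5.0

@dataclass
class Node:
    x: float
    y: float
    heading: float     # radians
    velocity: float    # m/s
    steer: float       # previous steering command (auxiliary, non-searched)
    cost: float

def expand(node, path_offset_fn, cone_distance_fn):
    # Generate successors for one planning step; only a small steering
    # change is reachable in a single step because of the slow servo.
    children = []
    max_delta = MAX_STEER_RATE * PLAN_DT
    for delta in (-max_delta, -0.5 * max_delta, 0.0,
                  0.5 * max_delta, max_delta):
        steer = max(-MAX_STEER, min(MAX_STEER, node.steer + delta))
        heading = node.heading + (
            node.velocity * PLAN_DT * math.tan(steer) / WHEELBASE)
        x = node.x + node.velocity * PLAN_DT * math.cos(heading)
        y = node.y + node.velocity * PLAN_DT * math.sin(heading)
        step_cost = (STEER_CHANGE_WEIGHT * abs(delta)
                     + PATH_OFFSET_WEIGHT * path_offset_fn(x, y)
                     + CONE_WEIGHT / max(cone_distance_fn(x, y), 0.1))
        children.append(Node(x, y, heading, node.velocity, steer,
                             node.cost + step_cost))
    return children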

Savage Solder online replanner

In the picture above, the green vectors show the different possible vehicle positions and orientations that were considered, the black line shows the global waypoint path, and the blue line shows the best path explored so far. Each of the circles is an estimated cone (in this case there was actually only one cone, but the estimator hallucinated a few more in the vicinity).

Next, on this front, I’ll look into our efforts with an oracular terrain sensor.

The state of investing in America

Presume something is becoming more and more expensive, and in fact is approaching its record for being the most costly ever. Do you:

  • a) Buy more
  • b) Buy more
  • c) Buy more
  • d) Wait for it to become even more expensive, then buy more

Yahoo Finance poll on January 27, 2013

LCD Enclosure for Savage Solder

One of the minor improvements we had planned for Savage Solder, our robomagellan entry, was to build a new enclosure for our LCD. I used this as an opportunity to experiment with some new manufacturing techniques in order to make something that was both more customized and still looked relatively professional.

In our previous incarnation of Savage Solder, our laptop was generally left closed while the car was in motion to save wear and tear on its hinges. During those periods, the LCD provided information on which stage of the plan the software was in, whether it was homing on a cone, and other auxiliary pieces of status. Watching the display as the car was moving added a lot of value over just looking at the results after the fact, as we had a lot more context.

FreeCAD Model of Enclosure

My plan was to build an enclosure box using custom machined faces of ABS plastic, similar to Gary Chemelec’s approach documented at http://chemelec.com/Projects/Boxes/Boxes.htm. Basically, you cut sheets of ABS plastic into the correct dimensions for each edge, machine any holes for panel mounts, and then weld the result together with a plastic solvent.

Since I have been experimenting with some 3D printing, I drew up a 3D solid model of the display and an enclosure in FreeCAD. This gave me some confidence that I would be able to get the mounting hardware lined up properly with the display, and that the box dimensions would be big enough to encompass all the resulting hardware. In addition to the raw ABS box, a sheet of clear polycarbonate is inserted in between the LCD and the ABS cutout to better protect the LCD.

FreeCAD Model of Button

With that model, I ordered some ABS sheet from amazon, and went down to the Artisan’s Asylum to cut each side to the correct dimensions and machine the various holes. I used a jump shear to get each piece to roughly the correct dimension, then the Bridgeport mill with an Onsrud end mill bit to get each side square and to as close to the correct dimensions as I could. This portion of the process didn’t go as smoothly as I would have liked as I broke two mill bits making novice mistakes on the milling machine. I had planned on milling out the large LCD face using the mill as well, but instead of ordering a third mill bit, I just drilled a long series of holes using a standard ABS drill bit and completed the final cut and polish with a Dremel tool.

The welding process went relatively smoothly. I used an ABS glue compound that was a mixture of MEK (methyl ethyl ketone) and acetone and worked one edge at a time. The first edge was clamped to a square aluminum tube for alignment, the others self aligned against the already installed edges.

Unpopulated LCD Control Board

For the buttons on the right side of the display, I built a small backing board with a few momentary contact switches, then 3D printed a set of plastic buttons to fit into the front face holes that would be captured between the front face and the momentary contact switch. I used small bits of sandpaper to finish the face holes so that the buttons would have a snug, but freely moving fit.

The LCD itself was a 40x4 character model for which Mikhail had previously hand-built an ATtiny board to convert it to RS232. That hand-built board wouldn’t fit easily in the enclosure, so I drew up a PCB based on an ATMega32U4 which would go straight to USB, take up a bit less space, and use up some of the old parts I have lying around the lab. There were only two minor problems bringing up that board when it arrived. First, I did not realize that PORTF on the ATMega32U4 comes from the factory reserved for JTAG, whereas I was expecting it to be immediately available for GPIO. You can select the mode either with a boot fuse, or by configuring the appropriate registers in software. Second, the LCD didn’t have any documentation, and at the time I didn’t have the source code to the original controller, so I mostly guessed at the HD44780 pinout, assuming I could change things around in firmware later if my guesses proved wrong. They were actually mostly correct, but I got the ordering of the HD44780’s 4 data pins mixed up, so some bit re-arranging was required in the AVR that I hadn’t intended.

The final enclosure is pictured below, hopefully soon I will have pictures of it mounted on Savage Solder itself!

Final Savage Solder LCD Enclosure

Enclosure Interior

Autonomous Racing Rotorcraft: Camera Driver

In the last post on my autonomous racing helicopter I managed to get the OV9650 camera communicating in a most basic way over the I2C bus. In order to actually capture images from it, the next step is to get a linux kernel driver going which can connect up the DM3730’s ISP (Image Signal Processor) hardware and route the resulting data into user space.

After some preliminary investigation, I found the first major problem. In the linux tree as of 3.6, there is a convenient abstraction in the video4linux2 (V4L2) universe for system-on-chips, where the video device as a whole represents the entire video capture chain and each new “sensor” only needs a subdevice created for it. There are in fact a lot of sensor drivers already in the mainline kernel tree, including one for the closely related OV9640 camera sensor. The downside, however, is that there are two competing frameworks that sensor drivers can be written against: the “soc-camera” framework and the raw video4linux2 subdevice API. Sensors written for these two frameworks are incompatible, and from what I’ve seen, each platform supports only one of them. Of course, the omap3 platform used in the DM3730 only supports the video4linux2 API, whereas all the similar sensors are written for the “soc-camera” framework!

Laurent Pinchart, a V4L2 developer, has been working on this problem some, but I had a hard time locating a canonical description of the current state of affairs. Possibly the closest thing to a summary of the soc-camera/v4l2 subdev situation can be found in this mailing list post:

Subject: Re: hacking MT9P031 for i.mx
From: Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
Date: Fri, 12 Oct 2012 15:11:09 +0200

...

soc-camera already uses v4l2_subdev, but requires soc-camera specific support
in the sensor drivers. I've started working on a fix for that some time ago,
some cleanup patches have reached mainline but I haven't been able to complete
the work yet due to lack of time.

--
Regards,

Laurent Pinchart

Since I don’t have a lot of need to upstream this work, I took the easy route and started with an existing v4l2 subdevice sensor, specifically the mt9v034 which is in the ISEE git repository. I copied it and replaced the guts with the ov9640 “soc-camera” driver from the mainline kernel. After a number of iterations I was able to get a driver that compiled, and appeared to operate the chip.

To test, I have been using Laurent Pinchart’s media-ctl and yavta tools. The media controller framework provides a general way to configure the image pipelines on system on chips. With it, you can configure whether the sensor output flows through the preview engine or the scaler and in what order if any. yavta is just a simple command line tool to set and query V4L2 controls and do simple frame capture.

One of the first images with correct coloring is below. Single frame capture out of the box was recognizable, but the quality was pretty poor. Also, as more frames were captured, the images became more and more washed out, with all the pixel values approaching the same bright gray color. That problem I will tackle in a later post.

First Image from OV9650

Autonomous Racing Rotorcraft: Initial Camera Exploration: I2C Smoke Test

I am now far far down the rabbit hole of trying to validate a camera for the low altitude altimetry of my prototype autonomous racing helicopter. In the last post I got to the point where I could build system images for my IGEP COM Module that included patches on top of the ISEE 3.6 linux kernel. The next step was to use that ability to turn on the clock at the TI DM3730’s external camera port.

First, what is the path by which a normal camera driver turns on the clock on the IGEP? To discover this, I traced backwards from the board expansion file for the CAMR0010 produced by ISEE, just because it was the easiest thread to start pulling on. In the board expansion file, “exp-camr0010.c”, a function is defined specifically to configure the ISP’s (Image Signal Processor) clock:

static void mt9v034_set_clock(struct v4l2_subdev *subdev, unsigned int rate)
{
        struct isp_device *isp = v4l2_dev_to_isp_device(subdev->v4l2_dev);

        isp->platform_cb.set_xclk(isp, rate, ISP_XCLK_A);
}

However, in the absence of a full camera driver, it was not entirely clear how to get a hold of a “struct isp_device*” that you could use to configure the clock. To understand more, I traced the many layers this function is passed down through before leaving the board expansion source file:

  • mt9v034_platform_data: This structure was defined by ISEE and is exposed from the new mt9v034 driver.
  • i2c_board_info: The mt9v034_platform_data structure is passed into this one as the “.platform_data” member.
  • isp_subdev_i2c_board_info: The i2c_board_info structure is passed as the “.board_info” member of this structure.
  • isp_v4l2_subdevs_group_camera_subdevs: The board_info structure is passed in here as the “.subdevs” member.
  • isp_platform_data: The camera_subdevs member is passed in here as the “.subdevs” member.
  • omap3_init_camera: Finally, the platform_data structure is passed in here.

Eventually, this clock setting callback is stashed inside the mt9v034 driver, where it is invoked in a couple of places. Yikes! I tried to backtrack this route to get an isp_device, but had no luck. What did end up working was grabbing the driver, and then the device, by name (error checking and the “match any” function omitted for clarity):

struct device_driver* isp_driver;
struct device* isp_device;
struct isp_device* isp;

/* Find the omap3isp platform driver, then the (single) device bound to
 * it, and finally the isp_device stored in that device's driver data. */
isp_driver = driver_find("omap3isp", &platform_bus_type);
isp_device = driver_find_device(isp_driver, NULL, NULL, match_any);
isp = dev_get_drvdata(isp_device);

Then, I exposed this functionality through a simple debugfs entry that appears in /sys/kernel/debug/arr_debug/xclka_freq (when debugfs is mounted, of course). With that, I was able to write frequencies from the command line and get the external clock to run at any frequency I chose. Yay!

There was one final piece to the puzzle before I could claim the camera was functional. The OV9650, while electrically compatible with I2C, is not an SMBus device, so the standard linux command line tools, i2cget and friends, were not able to drive the camera in a useful way. To get over the final hurdle, I wrote a simple user-space C program which opens “/dev/i2c-3”, sets the slave address using the I2C_SLAVE ioctl, and then uses the bare “read” and “write” APIs to send and receive bytes of data. With this, I was able to extract the product identifier 0x9652 from the chip! I guess it is likely a subsequent revision of the 9650.
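
The original program was plain C; the sketch below shows the same open / I2C_SLAVE ioctl / raw read and write flow in Python for brevity. The 0x30 slave address and the 0x0A/0x0B product ID registers are my assumptions about the OV9650, not values taken from that program.

import fcntl
import os

I2C_SLAVE = 0x0703          # from linux/i2c-dev.h
OV9650_ADDR = 0x30          # assumed 7-bit SCCB address (0x60 write / 0x61 read)
REG_PID = 0x0A              # assumed product ID high register; 0x0B is the version

fd = os.open("/dev/i2c-3", os.O_RDWR)
fcntl.ioctl(fd, I2C_SLAVE, OV9650_ADDR)

# SCCB reads are a bare register-address write followed by a read; no SMBus
# protocol is involved.
os.write(fd, bytes([REG_PID]))
pid_high = os.read(fd, 1)[0]
os.write(fd, bytes([REG_PID + 1]))
pid_low = os.read(fd, 1)[0]
print("product id: 0x%02x%02x" % (pid_high, pid_low))
os.close(fd)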