Tag Archives: vision

Sparkfun AVC: Visual Hoop Detection

While it caused us some grief during the actual competition, Savage Solder for the most part successfully used a vision based system to detect and track the bonus hoop necessary for extra points in this year’s Sparkfun AVC. The premise of our entry into the competition was a relatively cheap car, with nothing but cheap sensors. We compensated for our low accuracy GPS with a camera which allowed us to avoid barrels, and find the hoop and ramp at speed. Our barrel and hoop detection and tracking is basically unchanged from the cone detection we use in robomagellan style events. You can find a series describing it in part 1part 2, and part 3. In this post, I’ll describe specifically our hoop detection algorithm, and give some results showing how well it performs on real world data.

Original Image from Webcam

As with the cone detection system earlier, we use a standard USB webcam with a pipeline that consists of part OpenCV primitives and part custom detection routines. The output of the algorithm is an estimated range and bearing (and their estimated errors) to any hoops that might be present in the current frame.

In the example below, I’ll show how the process works on the sample image to the right, taken from Savage Solder during the AVC 2013.


Edge Detection


Step A: First, we apply a Canny edge detection filter. We just use the default OpenCV “Canny” function. The thresholds are selected as part of our overall tuning process, although an initial guess was made based on what looked reasonable in some sample data sets.

Step B: As with the cone/barrel/ramp detection, we compute the distance transform. Here, since the objects of interest are thin lines, the distance transform lets us quickly compute how far away individual points are from that line. This is done using the OpenCV “distanceTransform” method.

Distance Transform


Step C: Using the original edge detected image, the OpenCV “findContours” method is used to identify all the contiguous lines. Contours which have fewer than a configurable number of points are dropped from the list of contours.

Step D: Now comes the first key part of the algorithm. At this point, we pick a random contour from our list of valid ones, then pick three random points on that contour. A circle is drawn circumscribing through those three points, to create a “proto-hoop”. A number of preliminary checks are run, if any of them fail, the proto-hoop is discarded and another sample is selected:

A Single Sampled Circle
  • Are all 3 points on the upper half of the circle?
  • Is the upper half of the circle entirely in the visual frame?
  • Is the center of the circle above a configurable minimum height?
  • Is the radius larger than a configurable minimum size?

If all the checks pass, then the circle is passed on to the next step.


Step E: At this point, we have a proto-hoop drawn in the image, but it might be three points that touched a square building, or a cloud, or a few people practicing their cheerleading. To distinguish these cases, we evaluate a support metric to determine how likely it is actually a hoop. To do so, we step through each pixel in the upper half of the circle in a clockwise manner. For each pixel, we compute two separate metrics which are later summed together. For clarity, the metric is defined such that smaller is better.

  1. The square of the distance from that point on the circle to the nearest edge pixel. This prefers circles which are close to edges for most of their path.
  2. The square root of the distance minus the square root of the previous point’s distance. This portion of the metric penalizes circles which are near very jagged edges. This is intended to penalize shapes like trees which are often circular-ish, but not very smooth.

If the support metric is too large for a given circle, then that circle is dropped and another sample is chosen.

Step F: Go back to step D and repeat a couple of hundred times.

Step G: Finally, before reporting potential hoops, a last pass is taken. The proto-hoops are examined in order from best support to worst support. For each one, a local optimization function is run to find a nearby circle center and radius which improves the overall support. Then, for the first proto-hoop, it is just added to the output list. For subsequent hoops, they are compared against previously reported ones, and circles that substantially overlap are discarded as likely false positives.

Final Result: Two Detections

At this final stage, there are now a set of curated hoop detections for the given frame! The results are passed up to the tracker module, which correlates targets from frame to frame, discarding ones that don’t behave like a hoop should over time.

Results and Future Work

We only got the detector working with reasonable quality 2 or 3 weeks before the competition, so our corpus of images was not that large. However, in the dataset we do have, it is able to reliably detect the hoop out to about 6 meters of range with a manageable level of false positives. The table below shows the performance we had measured using data we took at our local testing sites.

Range (m) Detections False Detections
2-4 100% (18) 5.3% (1)
4-6 76% (16) 61% (25)
6-8 23% (16) 73% (14)

While the false positive level might seem high, the subsequent tracking is able to cope as the false positives tend to be randomly scattered about, and thus don’t form stable tracks from frame to frame. The detection rate within 6 meters is high enough that real hoops are detected 3 out of 4 times, which lets them be tracked just fine.

This approach seemed to work relatively well, although we had a relatively smaller corpus size compared to barrels and ramps. For future AVC competitions, we will likely use the same or similar technique, but trained on a larger dataset to avoid systematic failures like hoop shaped clouds, trees, or buildings.

Savage Solder: Selecting Constants for Image Processing

The final pieces of the cone detection and tracking system for Savage Solder are the tools we used to derive all the constants necessary for each of the algorithms. In part 1 (cone detector) and part 2 (cone tracker) I described how we first pick out possible range and bearings to cones in an image, and then take those range and bearings and turn them into a local Cartesian coordinate through the tracking process. As mentioned there, each of those stages has many tunable knobs.For the first stage, the following are the key parameters:

  • Hue range – The window of hues to consider a pixel part of a valid cone.
  • Saturation range – The range of saturation values to include.
  • Minimum and maximum aspect ratio – Valid cone like objects are expected to be moderately narrow and tall.
  • Minimum and maximum fill rate – If the image is crisp, most of the pixels in the bounding box, (but not all) should meet the filtering criteria.

Additionally, the cone tracker has its own set of parameters:

  • Detection rate – given a real world cone at a certain distance, how likely are we to detect it?
  • False positive rate – given a measurement at a specific range, how likely is it to be false?
  • Range and bearing limits – At what range and bearing should we reject measurements.

What we did in 2012, during the preparation for our first RoboMagellan, was take a lot of pictures of cones during our practice runs. Savage Solder normally is configured to save an image twice a second all the time it is running. These image datasets formed the basis of our ability to tune parameters and develop algorithms that would be robust in a wide range of conditions.

The basic idea we worked off was to create a metric for how good the system is, and then evaluate the metric over a set of data using our current algorithms and constants. You can then easily tweak the constants and algorithms as much as you want without running the car one bit and have good confidence that the results will be applicable to actual live runs.

Annotation Pipeline

Most of the metrics we wanted involved knowing where the cones actually were in the images. If we could just robustly identify cones in images programmatically, we wouldn’t really be worrying about this to begin with. So instead, we created a set of tools that let us rapidly mark, or annotate, images to indicate where the absolute ground truth of cones could be found. This is just a simple custom OpenCV program with some keyboard shortcuts that let us classify the cones in about 3000 images in a couple of hours. We selected images from varying times of day, angles, and lighting conditions, so that we would have a robust training set.

OpenCV Application for Annotation

Then, with a little wrapper script, we ran our cone detection algorithm over each of the frames. As mentioned in the on the cone detector, it outputs one or more range/bearing pairs to each of the prospective cones in the image. The quality metric then scores the cone detector based on accuracy of bearing and range to real cones, missed detections, and false detections. It also keeps histograms of each detection category by range. In the end, for a given set of cone detector parameters, we end up with a table that looks like the one below.

Range Detection Rate False Positive Rate
3m 91% 2%
5m 95% 7%
7.5m 93% 21%
10m 89% 34%
15m 72% 26%
20m 37% 23%

After each run over all the images, we would try changing the parameters to the cone detector, then seeing what the resultant table would like. Ideally, you have a much higher detection rate than false positive rate over the ranges you care about. The table above is actually the final one we were able to achieve for 2012, which allowed us to reliably sense cones at 15 meters of range using just our 640×480 stock webcam.

Savage Solder: Identifying Cones

Cone Tracking Block Diagram

One of the key aspects of our Savage Solder robomagellan solution is the ability to identify and track the orange traffic cones that serve as goal and bonus targets. Our approach to this is divided up into two parts, the first identification, the second tracking. They are roughly linked together with the block diagram to the right. Raw camera imagery is fed to the identification algorithm, which produces putative cones denoted by the bearing and range where they are estimated in the camera field of fiew. The tracking algorithm then takes those range and bearing estimates, along with the current navigation solution, and fuses them into a set of cartesian coordinate cones in the current neighborhood. Those local coordinates are then fed to the higher level race logic, which can decide to target or clip a cone if it is at the appropriate place in the script.

In this post, I’ll cover the Cone Identification Algorithm, listing out the individual steps within it.

Cone Detection Dataflow

Original Image from Webcam

Our identification algorithm consists of a relatively simple image processing pipeline described below. We will work through a sample image starting from the source shown to the right:

Step A: First, we convert the image into the Hue Saturation Value color space, from the RGB values that our camera data source provides. In the HSV space, hue corresponds to color, and saturation and value determine how much color there is, and how white the color is. Since the traffic cones have a known color (orange), this helps us in later steps reject a lot of very unlikely regions of the visual field.

Hue Channel of HSV Transform


Step B: Second, we apply a limit filter to select only a subset of the hues and saturations, ignoring the value channel entirely. Both hue and saturation are parameterized by a center point and a range on either side of that center point which will be accepted. Since the hue values wrap around at 0, (red), we take care to select values on both sides of the center point. If the hue range does wrap around, we just do two limit operations, one for each side and merge the two together after the fact. The result of this process is a two color bitmap. It is black if there was no cone detected, white if there was a potential cone.

Bitmap with specific hue range selected

Step C: Next, we perform what OpenCV calls a “distance” transform on the resulting bitmap. This determines, for any given white pixel, how far it is to the nearest black pixel in any direction. Effectively, this will tell us how big any matched section is. It is not a terribly reliable indicator, as even a single black pixel in the middle of an otherwise large white area can result in halving the distance for all the pixels in that region. However, we use this to just discard regions that have a distance of 2 or less, which mostly throws away speckle noise. We also process each region in order of its distance score, so that we handle the biggest regions first.


Distance metric

Step D: At this point, we iterate through the connected regions starting at those which contain the largest “distance” points. For each, we measure the bounding rectangle of the region, and from that caculate the aspect ratio and the fill percentage of the region. The aspect ratio is just the width divided by the height. The fill percentage is the percentage of pixels inside the bounding rectangle which were marked as being a cone. To be considered a valid cone, the aspect ratio and fill percentage must be within certain configured limits.

Step E: Finally, if the putative cone passed all the above checks, the bounding rectangle is used to estimate the bearing and distance. The bearing estimation is relatively straightforward. We assume our camera has no distortion and a field of view of 66 degrees. From there, the x coordinate of the bounding box center linearly relates to the bearing angle. Range is estimated using the total number of pixels detected in this region assuming the cone is a perfect sphere. We calibrated the size of the sphere to match the visible surface area of a cone. If the detected region abuts one of the edges of the camera image, then range is not emitted at all, as the range will be suspect due to some of the pixels being off screen. The very last step is to estimate the bearing and range uncertainty, which the detector generates for 1 standard deviation away from the mean. A guess for the bearing value is made assuming a fixed number of pixels of error in the x direction. For range, we assume that some percentage of the pixels will be missed, which would errnoneously make the cone seem farther away than it actually is.

Next Steps

One all the qualifying regions are processed, or the maximum number of visible cones are found (4 for our case), then we call this image done and record the outputs. Next, these values go on to the “cone tracker”, which generates Cartesian coordinate positions for each cone. I’ll discuss that module later.