Real-Time Markerless Arm Tracking

For several years, I worked on the DARPA ARM-S project at the National Robotics Engineering Center (NREC). This DARPA project was a competition whose aim was to solve basic autonomous robot manipulation problems (such as opening doors, drilling holes, and picking things up) using a two-armed robot. Unlike the robots commonly found in factories, our robot would be faced with unknown objects in unknown locations, and would have to rely on sensing and planning to accomplish its tasks.

One of the most challenging aspects of the competition was the fact that our robot’s arms (cable-driven Barrett arms) had spectacularly bad positioning repeatability. Industrial robots can typically repeat the same end-effector position to within a few millimeters. We observed, on the other hand, that our arms had a positioning uncertainty of nearly six centimeters at the end effector, due to cable stretch, the effects of gravity, and poorly calibrated encoders. Further, the error got worse over time, and was different in different configurations of the robot.


This led to miserable performance in some of the early challenges in Phases I and II. All of the teams developed strategies to combat this. In Phase I, we relied on a series of touch motions based on the robot’s force sensor to reduce uncertainty. In Phase II, we decided to address the issue of end-effector uncertainty by directly estimating the pose of the arm using depth data, and taking corrective action to account for the error. The technique we developed is called Real-Time Markerless Arm Tracking, and is related to Microsoft Research’s Kinect skeleton tracking algorithm.


Basically, we want to minimize the distance between our model of the robot’s arm and the observed point cloud. We do this by stochastic gradient descent on a distance-based cost function in joint space. We derived the gradient of the cost function in joint space analytically, and found that the result is extremely elegant: simply apply a force at every point along the robot’s arm toward the observed sensor data.
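To make the "force" interpretation concrete, here is one way the gradient could be written down. This is a sketch under assumptions, not necessarily the exact cost we used: it assumes a squared-distance cost, holds the point matches fixed while differentiating, and introduces the symbols q (joint angles), p_i(q) (the i-th sampled model point), s_i (its matched sensor point), J_i (the Jacobian of p_i with respect to q), N (the number of sampled points), and α (the step size) purely for illustration.

```latex
% Assumed squared-distance cost over N sampled model points
C(q) = \frac{1}{2N} \sum_{i=1}^{N} \left\lVert p_i(q) - s_i \right\rVert^2

% Gradient in joint space, with the matches s_i held fixed
\nabla_q C(q) = \frac{1}{N} \sum_{i=1}^{N} J_i(q)^{\top} \bigl( p_i(q) - s_i \bigr),
\qquad J_i(q) = \frac{\partial p_i}{\partial q}

% Descent step: each point is pulled toward its match through J^T
q \leftarrow q - \alpha \nabla_q C(q)
  = q + \frac{\alpha}{N} \sum_{i=1}^{N} J_i(q)^{\top} \bigl( s_i - p_i(q) \bigr)
```

Each term J_i^T (s_i - p_i) is exactly the joint torque that a force s_i - p_i applied at p_i would produce, which is where the force analogy comes from. The descent is stochastic because only a random subset of model points is used at each step.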

It goes like this:

  1. Have the robot look at the arm.
  2. Render a synthetic point cloud in the frame of the depth camera, with each link indexed by color. Take a random subset of these points.
  3. Match each point in the synthetic point cloud to the nearest point from the depth sensor using an octree.
  4. For each matched pair, find the vector from the synthetic point to its matched sensor point.
  5. Pass the vector through the Jacobian transpose of the robot’s arm to get a joint differential for each point. This is analogous to applying a force between the robot’s model and the observed sensor data.
  6. Take a small step in joint space in the direction of the average differential computed for all the points.
  7. Repeat from step 2 until convergence.
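The steps above can be sketched in a few dozen lines of Python. This is a toy illustration, not our actual implementation: it assumes a hypothetical planar 3-link arm instead of the Barrett arm, samples points along the link centerlines instead of rendering a synthetic depth image, and uses a brute-force nearest-neighbor search where we used an octree. All names (`LINK_LENGTHS`, `tracking_step`, `track`, `demo`) and the step size are made up for the example.

```python
import numpy as np

# Hypothetical planar 3-link arm (link lengths in metres); not the Barrett arm.
LINK_LENGTHS = np.array([0.5, 0.4, 0.3])


def joint_positions(q):
    """Forward kinematics: (n+1, 2) array of joint origins followed by the tip."""
    angles = np.cumsum(q)
    steps = LINK_LENGTHS[:, None] * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])


def sample_arm_points(q, rng, n_points):
    """Step 2 (simplified): random points on the arm model, tagged by link index."""
    joints = joint_positions(q)
    links = rng.integers(0, len(LINK_LENGTHS), size=n_points)
    frac = rng.random((n_points, 1))
    points = joints[links] + frac * (joints[links + 1] - joints[links])
    return points, links


def tracking_step(q, observed, rng, n_points=200, step=1.0):
    """One pass over steps 2-6. Returns (updated q, mean match distance)."""
    joints = joint_positions(q)
    points, links = sample_arm_points(q, rng, n_points)
    # Step 3: nearest observed point (brute force here, an octree in the text).
    d2 = ((points[:, None, :] - observed[None, :, :]) ** 2).sum(axis=2)
    nearest = observed[d2.argmin(axis=1)]
    # Step 4: vector from each model point to its matched sensor point.
    pull = nearest - points
    # Step 5: Jacobian transpose. For a planar revolute joint j, the Jacobian
    # column at point p is perpendicular to r = p - joint_j, i.e. (-r_y, r_x).
    r = points[:, None, :] - joints[None, :-1, :]          # shape (P, n, 2)
    jt_pull = -r[:, :, 1] * pull[:, None, 0] + r[:, :, 0] * pull[:, None, 1]
    # Joints beyond a point's own link do not move that point.
    moves_point = np.arange(len(q))[None, :] <= links[:, None]
    # Step 6: small step along the average joint differential.
    dq = (jt_pull * moves_point).mean(axis=0)
    return q + step * dq, np.linalg.norm(pull, axis=1).mean()


def track(q_init, observed, n_iters=300, seed=0):
    """Step 7: repeat for a fixed number of iterations (a stand-in for a convergence test)."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q_init, dtype=float)
    residuals = []
    for _ in range(n_iters):
        q, residual = tracking_step(q, observed, rng)
        residuals.append(residual)
    return q, residuals


def demo():
    """Simulate a sensor cloud from a 'true' pose and start from biased encoders."""
    rng = np.random.default_rng(1)
    q_true = np.array([0.3, 0.4, -0.3])
    observed, _ = sample_arm_points(q_true, rng, 600)
    observed = observed + rng.normal(scale=0.002, size=observed.shape)  # 2 mm noise
    q_encoders = q_true + 0.03  # each joint reads 0.03 rad off
    q_est, residuals = track(q_encoders, observed)
    return q_true, q_encoders, q_est, residuals


if __name__ == "__main__":
    q_true, q_encoders, q_est, residuals = demo()
    print("joint error before:", np.linalg.norm(q_encoders - q_true))
    print("joint error after: ", np.linalg.norm(q_est - q_true))
    print("mean match distance, first and last step:", residuals[0], residuals[-1])
```

The 0.03 rad offset per joint was chosen so that the simulated tip error is on the order of the six centimeters mentioned earlier. Plain gradient steps like this correct the joints near the base quickly and the wrist more slowly, since points near the tip have short lever arms about the last joint.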

The result is an extremely fast, online, real-time algorithm for tracking the arm as it moves. It is fast enough that we integrated it into a 30 Hz controller. This dramatically improved our performance on all of the tasks and, for the most part, eliminated the need for our previous touch-based strategy.



Developing this arm-tracking technology has made me wonder about its potential uses. One of the problems with robotic manipulation is the prohibitively large cost of robot arms. Robot arms are expensive for a variety of reasons, but one of them is the extremely high level of precision required in manufacturing them so that they can be controlled accurately and repeatably. If we can mitigate robot arm error with sensing, perhaps it is possible to do more with cheaper robots. I am interested in applying this technique to much cheaper, hobbyist-level arms, or perhaps to Rethink Robotics’ Baxter.