This page details research I have worked on or am currently working on.
1. Have the robot look at its arm.
2. Render a synthetic point cloud in the frame of the depth camera, with each link indexed by color, and take a random subset of these points.
3. Match each point in the synthetic point cloud to the nearest point from the depth sensor using an octree.
4. For each matched pair, compute the vector from the synthetic point to the sensor point.
5. Pass the vector through the transpose of the arm's Jacobian to get a joint differential for each point. This is analogous to applying a force between the robot's model and the observed sensor data.
6. Take a small step in joint space in the direction of the average differential computed over all the points.
7. Repeat from step 2 until convergence.
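The Jacobian-transpose update above can be sketched in a few lines. This is a minimal toy illustration with a 2-link planar arm and one sample point per link; the link lengths, step size, and point sampling are my own illustrative assumptions, not the actual robot or implementation:

```python
import math

# Toy 2-link planar arm; link lengths are illustrative assumptions.
L1, L2 = 1.0, 1.0

def fk_points(q):
    """Forward kinematics: one sample point at the tip of each link."""
    x1, y1 = L1 * math.cos(q[0]), L1 * math.sin(q[0])
    x2 = x1 + L2 * math.cos(q[0] + q[1])
    y2 = y1 + L2 * math.sin(q[0] + q[1])
    return [(x1, y1), (x2, y2)]

def jacobian_transpose_step(q, observed, step=0.05):
    """One tracking iteration: pull each model point toward its matched
    sensor point through the Jacobian transpose, then take a small step
    along the average joint differential."""
    model = fk_points(q)
    joint2 = (L1 * math.cos(q[0]), L1 * math.sin(q[0]))  # joint 1 sits at the origin
    dq0 = dq1 = 0.0
    for i, ((mx, my), (ox, oy)) in enumerate(zip(model, observed)):
        ex, ey = ox - mx, oy - my  # "force" toward the sensor data
        # For a planar revolute joint at o, the Jacobian column for a point p
        # is perpendicular to (p - o); J^T e is its dot product with the force.
        dq0 += -my * ex + mx * ey                          # joint 1 moves every point
        if i == 1:                                         # joint 2 moves only link 2's point
            dq1 += -(my - joint2[1]) * ex + (mx - joint2[0]) * ey
    n = len(model)
    return [q[0] + step * dq0 / n, q[1] + step * dq1 / n]
```

Iterating this from a nearby initial guess descends the sum-of-squared-distances cost and converges to the configuration that generated the observed points.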
- The table is filtered out of the sensor data a priori by fitting a plane before the tracking phase begins. Table points are then culled extremely efficiently on the GPU.
- Object poses are initialized with coarse template matching.
- "Synthetic point clouds" are produced on the GPU efficiently by rendering the modeled objects at their tracked poses in the frame of the sensor. Their synthetic depth images are then stochastically sampled.
- A single octree is used to find the closest points in the "synthetic point cloud" to the sensor data. Object points are indexed by color.
- Ten iterations of iterative closest point (ICP) are run between the synthetic point cloud and the sensor data before the next frame is taken in from the sensor.
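The table-plane cull in the first trick reduces, per point, to a signed-distance test against the fitted plane. Here is a minimal CPU sketch of that test (the real version runs on the GPU, and the plane itself would be fit beforehand, e.g. by RANSAC; the function name and epsilon are my own illustrative choices):

```python
def cull_table_points(points, normal, offset, eps=0.01):
    """Drop every point within eps of the plane {p : n . p = offset}.
    `normal` is assumed to be unit length so the test is a true distance."""
    nx, ny, nz = normal
    return [
        (x, y, z)
        for (x, y, z) in points
        if abs(nx * x + ny * y + nz * z - offset) > eps
    ]
```

On a GPU the same test runs as one thread per point, which is why the cull is essentially free compared to the rest of the pipeline.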
I just wanted to post a video of an algorithm Ivan Dryanovski and I developed while working on Google's Project Tango. It performs real-time 3D reconstruction on the Tango device using chunked, truncated signed distance fields (TSDFs).
And a screenshot of a reconstructed apartment scene:
More about this soon.
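The core of a TSDF reconstruction is a per-voxel weighted average of truncated signed distances along each depth ray. This is a minimal single-ray sketch of that standard update, not the Tango implementation; the truncation distance, voxel size, and flat 1-D column are illustrative choices (the "chunked" part means allocating blocks of voxels lazily as the sensor sees them):

```python
TRUNC = 0.1  # truncation distance in meters (illustrative)

def integrate_ray(tsdf, weight, voxel_size, depth):
    """Fold one depth measurement into a 1-D column of voxels along the
    camera ray, using the standard weighted running average."""
    for i in range(len(tsdf)):
        z = (i + 0.5) * voxel_size  # distance of the voxel center along the ray
        sdf = depth - z             # positive in front of the surface, negative behind
        if sdf < -TRUNC:
            continue                # far behind the surface: occluded, leave unseen
        d = max(-1.0, min(1.0, sdf / TRUNC))
        tsdf[i] = (tsdf[i] * weight[i] + d) / (weight[i] + 1.0)
        weight[i] += 1.0
    return tsdf, weight
```

The surface is recovered afterwards as the zero crossing of the averaged field (e.g. with marching cubes in the full 3-D case).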
Now that ARM-S Phase II is over, I've made all of my ARM-S videos public on my YouTube page. ARM-S is a competition, hosted by DARPA, to perform basic manipulation tasks autonomously with a 2-armed research robot. I was on the NREC team. Our robot was called "Andy".
Here are some of the cooler videos:
Andy grasping things with DARRT
Andy planning around sensed clutter with CHOMP
Andy removing a tire
Andy localizing a door with touches
Andy building a tower out of blocks (this demo has been running at the Smithsonian Air and Space Museum for about a year now)
For several years, I was working on the DARPA ARM-S project at the National Robotics Engineering Center (NREC). This DARPA project was a competition whose aim was to solve basic autonomous robot manipulation problems (such as opening doors, drilling holes, and picking things up) using a 2-armed robot. Unlike the robots commonly found in factories, our robot would be faced with unknown objects in unknown locations, and would have to rely on sensing and planning to accomplish its tasks.
One of the most challenging aspects of the competition was the fact that our robot's arms (cable-driven Barrett arms) had spectacularly bad positioning repeatability. Industrial robots can typically repeat the same end effector position to within a few millimeters. We observed, on the other hand, that our arms had a positioning uncertainty of nearly six centimeters at the end effector, due to cable stretch, the effects of gravity, and poorly calibrated encoders. Further, the error gets worse over time and varies with the robot's configuration.
This led to miserable performance in some of the early challenges in Phases I and II. All of the teams developed strategies to combat this. In Phase I, we relied on a series of touch motions based on the robot's force sensor to reduce uncertainty. In Phase II, we decided to address end-effector uncertainty by directly estimating the pose of the arm from depth data and taking corrective action to account for the error. The technique we developed is called Real-Time Markerless Arm Tracking, and is related to Microsoft Research's Kinect skeleton tracking algorithm.
Basically, we want to minimize the distance between our model of the robot's arm and the observed point cloud. We do this by stochastic gradient descent on a distance-based cost function in joint space. We derived the gradient of this cost function analytically, and the result is extremely elegant: simply apply a force at every point along the robot's arm toward the observed sensor data.
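In symbols (my own notation, matching the description above): with model points x_i(q) on the arm at joint configuration q and their matched sensor points s_i, the cost and its gradient are

```latex
C(\mathbf{q}) = \tfrac{1}{2} \sum_i \bigl\| \mathbf{x}_i(\mathbf{q}) - \mathbf{s}_i \bigr\|^2,
\qquad
\nabla_{\mathbf{q}} C = \sum_i J_i(\mathbf{q})^{\top} \bigl( \mathbf{x}_i(\mathbf{q}) - \mathbf{s}_i \bigr),
```

where J_i is the Jacobian of point x_i with respect to the joints. A descent step q ← q − α∇C therefore adds, for each point, the Jacobian transpose applied to the "force" (s_i − x_i) pulling the model toward the data.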
It goes like this:
The result is an extremely fast, online, real-time algorithm for tracking the arm as it moves. It is so fast that we integrated it into a 30 Hz controller. This dramatically improved our performance on all of the tasks and, for the most part, eliminated our previous touch-based strategy.
Developing this arm-tracking technology has made me wonder about its potential uses. One of the problems with robotic manipulation is the prohibitively large cost of robot arms. Robot arms are expensive for a variety of reasons -- but one of them is the extremely high level of precision required in manufacturing them so that they can be controlled accurately and repeatably. If we can mitigate robot arm error with sensing, perhaps it is possible to do more with cheaper robots. I am interested in applying this technique to much cheaper, hobbyist-level arms, or perhaps to Rethink Robotics' Baxter.
This was a little unpublished demo I threw together toward the end of the DARPA ARM-S (Autonomous Robotic Manipulation) competition. The goal was to track multiple known objects in a scene in real time by running iterative closest point (ICP) online against a Kinect point cloud.
The method uses several tricks to make it very fast:
The result is an okay-ish tracking algorithm that runs reasonably quickly. In the video, the left side shows a 3D view of the point cloud and tracked objects; the right side shows the images used to create the synthetic point cloud from the perspective of the depth camera. The technique unfortunately suffers from local minima and can't handle very fast motion. It would be best used as the input to a filtering algorithm such as a Kalman filter or a particle filter.
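For reference, one point-to-point ICP iteration has a closed form once correspondences are fixed. This is a toy 2-D sketch under my own simplifications (brute-force nearest neighbours standing in for the octree, the 2-D Kabsch solution instead of the 3-D one, and no GPU rendering):

```python
import math

def icp_step(model, sensed):
    """One point-to-point ICP iteration in 2-D: match each model point to
    its nearest sensed point, then solve for the best rigid transform."""
    # Nearest-neighbour matching (brute force stands in for the octree).
    pairs = [(p, min(sensed, key=lambda s: (s[0] - p[0]) ** 2 + (s[1] - p[1]) ** 2))
             for p in model]
    n = len(pairs)
    mcx = sum(p[0] for p, _ in pairs) / n
    mcy = sum(p[1] for p, _ in pairs) / n
    scx = sum(s[0] for _, s in pairs) / n
    scy = sum(s[1] for _, s in pairs) / n
    # Optimal rotation angle (2-D Kabsch) from the centered correspondences.
    a = b = 0.0
    for (px, py), (sx, sy) in pairs:
        px, py, sx, sy = px - mcx, py - mcy, sx - scx, sy - scy
        a += px * sy - py * sx
        b += px * sx + py * sy
    theta = math.atan2(a, b)
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = scx - (c * mcx - s * mcy), scy - (s * mcx + c * mcy)
    # Apply the rigid transform to the whole model.
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in model]
```

Repeating this step re-matches and re-aligns until the pose stops moving; the local-minimum problem mentioned above comes from the nearest-neighbour matching locking onto the wrong correspondences.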
In practice, we used an algorithm very similar to this to localize objects in the robot's hand. The robot can efficiently track an object in its hand by using the kinematic model of its own arm both to cull sensor data unrelated to the object and to initialize the object's pose.
Some of my latest research has focused on reconstructing the 3D shape of objects using noisy, sparse depth data (from, say, a Kinect) in real time, using efficient algorithms based around Martin Hermann et al.'s Voxel Depth Carving.
A key insight we're pushing is that the negative data from a point cloud (that is, information about what is not part of the object) is often much more informative than the positive data. This has led us to avoid using point clouds altogether in favor of much more descriptive ray clouds to describe depth data. One advantage of this representation is that it tends to be much more robust to noise and naturally deals with missing data.
Another advantage of the technique is that it allows us to reason about the unknown regions of space, and make probabilistic statements about the object at those locations -- something which is not possible when one only considers the point cloud. The robot can construct strongly principled priors about the shapes of objects and make reasoned inferences about the shape of the unknown parts of the object, rather than throwing away data or fitting a model beforehand.
In an ongoing research effort, we are combining the Voxel Depth Carving technique with kernel regression to learn the three-dimensional distance field of the object, using rays that pass through unoccupied space as constraints.
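To make the "negative data" idea concrete: each depth ray proves that every voxel it crosses before the measured hit is empty, which a point cloud alone cannot express. A minimal sketch with a sparse voxel grid (the dictionary representation, fixed marching step, and function name are my own illustrative choices, not the actual implementation):

```python
def carve_ray(grid, origin, direction, depth, voxel_size=0.1, step=0.05):
    """March along one depth ray and mark every voxel crossed *before*
    the measured depth as known-empty. `direction` is assumed unit length."""
    t = 0.0
    while t < depth:
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        z = origin[2] + t * direction[2]
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[key] = "free"  # this voxel is provably not part of the object
        t += step
    return grid
```

A full implementation would use exact grid traversal (e.g. Amanatides-Woo) instead of fixed-step sampling, and would record the hit voxel itself as surface evidence.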
This research will probably be published early next school year.