This was a little unpublished demo I threw together towards the end of the DARPA ARM-S (Autonomous Robotic Manipulation) competition. The goal was to track multiple known objects in a scene in real time by running online iterative closest point (ICP) on a Kinect point cloud.
The method uses several tricks to make it very fast:
- The table is filtered out of the sensor data a priori by fitting a plane before the tracking phase begins. Table points are then culled extremely efficiently on the GPU.
- Object poses are initialized with coarse template matching.
- “Synthetic point clouds” are produced efficiently on the GPU by rendering the modeled objects at their tracked poses in the frame of the sensor. The resulting synthetic depth images are then stochastically sampled.
- A single octree is used to find the closest points in the “synthetic point cloud” to the sensor data. Points belonging to different objects are distinguished by indexing on color.
- Ten iterations of ICP are run between the synthetic point cloud and the sensor data before the next frame arrives from the sensor.
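The core of the pipeline above can be sketched in plain NumPy. This is an illustration, not the demo's actual code: the GPU octree and the stochastic depth-image sampling are replaced with a brute-force nearest-neighbour search, and all function names are my own.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through `points` (N x 3); returns unit normal n
    and offset d with n.p + d = 0. The demo fits the table plane once,
    before tracking begins."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, -n.dot(centroid)

def cull_table(cloud, n, d, thresh=0.01):
    """Drop points within `thresh` metres of the fitted table plane."""
    return cloud[np.abs(cloud @ n + d) > thresh]

def icp_step(model, scene):
    """One point-to-point ICP iteration: match each model point to its
    closest scene point, then solve the closed-form SVD alignment.
    Returns R, t mapping model onto scene."""
    # Brute-force nearest neighbours (the demo used a single octree instead).
    d2 = ((model[:, None, :] - scene[None, :, :]) ** 2).sum(-1)
    matched = scene[d2.argmin(axis=1)]
    mu_m, mu_q = model.mean(0), matched.mean(0)
    H = (model - mu_m).T @ (matched - mu_q)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_q - R @ mu_m
    return R, t

def track(model, scene, iters=10):
    """Run a fixed budget of ICP iterations per frame, as in the demo."""
    pose_R, pose_t = np.eye(3), np.zeros(3)
    current = model.copy()
    for _ in range(iters):
        R, t = icp_step(current, scene)
        current = current @ R.T + t
        pose_R, pose_t = R @ pose_R, R @ pose_t + t
    return pose_R, pose_t
```

As with the real system, this only converges when the pose is initialized close to the truth, which is why the coarse template-matching step matters.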
The result is an okay-ish tracking algorithm that works reasonably quickly. In the video, on the left you see a 3D view of the point cloud and tracked objects. On the right are the images used to create the synthetic point cloud, rendered from the perspective of the depth camera. The technique is unfortunately prone to falling into local minima, and can’t handle very fast motion. It would be best used as the input to a filtering algorithm such as a Kalman filter or a particle filter.
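To sketch that last idea, here is a minimal constant-velocity Kalman filter over the tracked position, with ICP pose estimates as the measurements. This is illustrative only; the demo did not include a filtering stage, and the noise parameters are made up.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal Kalman filter over 3-D position with a constant-velocity
    motion model. State is [x, y, z, vx, vy, vz]; the per-frame ICP
    position estimate is the measurement. Parameters q, r are assumed."""
    def __init__(self, dt, q=1e-3, r=1e-2):
        self.x = np.zeros(6)
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)                 # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q = q * np.eye(6)                          # process noise
        self.R = r * np.eye(3)                          # measurement noise

    def step(self, z):
        # Predict forward one frame.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the latest ICP position estimate z.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```

A particle filter would additionally let the tracker recover from the local minima mentioned above, at higher computational cost.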
In practice, we used an algorithm very similar to this to localize objects in the robot’s hand. The robot can efficiently track an object in its hand by using the kinematic model of its own arm both to cull sensor data unrelated to the object and to initialize the object’s pose.
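A toy version of that culling step might look like the following, assuming the arm's forward kinematics provide the hand pose in the camera frame as a 4x4 homogeneous matrix. The spherical region of interest and all names here are my assumptions, not the competition code.

```python
import numpy as np

def cull_to_hand(cloud, T_cam_hand, radius=0.15):
    """Keep only sensor points within `radius` metres of the hand frame.
    `cloud` is N x 3 in the camera frame; `T_cam_hand` is the 4x4 hand
    pose from forward kinematics. The held object's pose can then be
    initialized at the hand frame before running ICP."""
    hand_pos = T_cam_hand[:3, 3]
    keep = np.linalg.norm(cloud - hand_pos, axis=1) < radius
    return cloud[keep]
```

The same kinematic model can also cull the arm's own geometry out of the cloud, so the ICP correspondences are not polluted by points on the gripper.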