
Robot Manipulation for Unstructured Environments

Project Abstract: 
For robots to perform autonomous sequential manipulation tasks, perception in cluttered scenes remains a critical challenge. We have developed a probabilistic approach for robust sequential scene estimation and manipulation: Sequential Scene Understanding and Manipulation (SUM). SUM accounts for the uncertainty of discriminative object detection and recognition within a generative estimate of the most likely object poses, maintained over time, to robustly estimate scenes under heavy occlusion in unstructured environments. Our method uses candidates from a discriminative object detector and recognizer to guide the generative sampling of scene hypotheses, and each scene hypothesis is evaluated against the observations. SUM also maintains its belief over scene hypotheses across the robot's physical actions, improving estimation and providing resilience to noisy detections. We will conduct experiments in unstructured environments to show that our approach performs robust estimation and manipulation.
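The following is a minimal, self-contained Python sketch of the hypothesize-and-score idea under toy assumptions (2D poses, a Gaussian proposal around detection candidates, a squared-error score); the function names and simplifications are ours for illustration, not the project's implementation:

```python
import random

def sample_hypothesis(detections, proposal_noise=0.05):
    """Sample one scene hypothesis: a candidate pose per detected object,
    drawn near the discriminative detector's candidate location.
    Toy 2D poses stand in for SUM's six-DoF object poses."""
    return {label: (x + random.gauss(0.0, proposal_noise),
                    y + random.gauss(0.0, proposal_noise))
            for label, (x, y) in detections.items()}

def score(hypothesis, observation):
    """Observation-likelihood surrogate: hypotheses whose poses agree
    with the observation score higher (negative squared error here;
    SUM instead evaluates hypotheses against RGB-D observations)."""
    return -sum((hx - observation[label][0]) ** 2 +
                (hy - observation[label][1]) ** 2
                for label, (hx, hy) in hypothesis.items())

def estimate_scene(detections, observation, num_hypotheses=200):
    """Detector-guided hypothesize-and-score loop: sample scene
    hypotheses around detection candidates, keep the best-scoring one."""
    hypotheses = [sample_hypothesis(detections) for _ in range(num_hypotheses)]
    return max(hypotheses, key=lambda h: score(h, observation))

# Toy usage: two detected objects with noisy candidate locations.
detections = {"mug": (0.30, 0.10), "bowl": (0.55, 0.20)}
observation = {"mug": (0.32, 0.11), "bowl": (0.53, 0.22)}
print(estimate_scene(detections, observation))
```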
Years Active: 
2017
Methods: 
Given past RGB-D observations and robot manipulation actions, our aim is to estimate an unstructured scene as a collection of k objects, with object labels o, 2D image-space bounding boxes b_t, and six-DoF object poses q_t, where k is the number of objects recognized in the scene. Object labels are strings containing a semantic identifier assumed to be a human-intuitive reference. In our experiments, the manipulation actions u_t are pick-and-place actions, which invoke a motion planning process. However, our formulation is general, so u_t can also represent low-level motor commands such as joint torques, as in object tracking scenarios.

The state of an individual object i in the scene is represented as a tuple of its object pose, object label, and image-space bounding box. We assume that every object is independent of all other objects, which implies there will be only one object with a given label in the eventual inferred scene estimate. This independence, together with the statistical chain rule, lets us factor the scene estimation problem into object detection, object recognition, and belief over object pose. The object detection factor is the probability of object i being observed within its bounding box given the observation. The object recognition factor is the probability of that object having its label given the observation inside the bounding box. These distributions are produced by a pre-trained discriminative object detector and recognizer that evaluates all possibilities. The pose belief factor for a particular object is modeled over time by a recursive Bayesian filter. Both the factorization and the filter update are sketched below.

We have demonstrated the effectiveness and robustness of SUM with respect to both perception and manipulation errors in a cluttered tabletop scenario for an object sorting task with a Fetch mobile manipulator. We are now looking to extend these methods to unstructured environments beyond tabletops.
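Written out, the factorization described above takes roughly the following form. This is a hedged reconstruction from the prose, with z_{1:t} denoting the RGB-D observations and u_{1:t} the actions; the exact conditioning in the underlying formulation may differ in detail:

```latex
\[
P(q_t, o, b_t \mid z_{1:t}, u_{1:t})
  \approx \prod_{i=1}^{k}
    \underbrace{P(b_t^i \mid z_t)}_{\text{detection}}\,
    \underbrace{P(o^i \mid b_t^i, z_t)}_{\text{recognition}}\,
    \underbrace{P(q_t^i \mid o^i, b_t^i, z_{1:t}, u_{1:t})}_{\text{pose belief}}
\]
\[
P(q_t^i \mid z_{1:t}, u_{1:t})
  \propto P(z_t \mid q_t^i)
    \int P(q_t^i \mid q_{t-1}^i, u_t)\,
         P(q_{t-1}^i \mid z_{1:t-1}, u_{1:t-1})\, dq_{t-1}^i
\]
```

The second line is the standard recursive Bayesian filter update for the pose belief factor: the prior belief is propagated through the action model, then reweighted by the observation likelihood.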
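The pose belief update can be pictured as a particle filter. Below is a minimal, self-contained Python sketch of one predict-correct cycle under strong simplifying assumptions (a scalar pose with Gaussian motion and observation noise); it illustrates the recursive structure only and is not the project's implementation:

```python
import math
import random

def bayes_filter_step(particles, action, observation,
                      motion_noise=0.02, obs_noise=0.05):
    """One predict-correct cycle of a recursive Bayesian filter over an
    object's pose belief, as a toy particle filter. The pose is a scalar
    here purely for brevity; the belief in SUM is over six-DoF poses."""
    # Prediction: propagate each particle through a noisy action model
    # (the manipulation action u_t displaces the object's pose).
    predicted = [p + action + random.gauss(0.0, motion_noise) for p in particles]
    # Correction: weight particles by a Gaussian observation likelihood
    # around the detected pose z_t, then resample proportionally.
    weights = [math.exp(-(p - observation) ** 2 / (2.0 * obs_noise ** 2))
               for p in predicted]
    return random.choices(predicted, weights=weights, k=len(particles))

# Toy usage: track one object's pose across two actions and detections.
belief = [random.gauss(0.0, 0.1) for _ in range(500)]
for u_t, z_t in [(0.10, 0.11), (0.05, 0.16)]:
    belief = bayes_filter_step(belief, u_t, z_t)
print(sum(belief) / len(belief))  # posterior mean pose estimate
```

Maintaining the belief this way, rather than trusting each frame's detection in isolation, is what lets the estimate persist through occlusions and noisy detections as the robot acts on the scene.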