'Feature-Based Transfer Learning for Robotic Push Manipulation'
Our paper entitled 'Feature-Based Transfer Learning for Robotic Push Manipulation', published in the 2018 IEEE International Conference on Robotics and Automation (ICRA), has been selected for an extended version in the Special Issue on Advancement in Engineering and Computer Science organised by the Advances in Science, Technology and Engineering Systems Journal (ASTESJ).
Thanks to my students, Rhys Howard and Jochen Stuber, the extended paper also achieves a twofold improvement in prediction accuracy w.r.t. our original conference paper.
For a reader unfamiliar with AI & robotics topics, I'll do my best to explain the two main keywords in the title. Let's start by describing what push manipulation is and why we should spend any of our precious time reading about it. Push manipulation is a branch of research that aims to use pushes as actions to interact with other objects, as humans frequently do. "Use" in this context also means "understand". People in this area of research would like to build a model that can answer questions like the following: how does an object behave under a specific push? Answering this question requires understanding and modelling the physical interactions between two bodies, a pusher and an object to be pushed. Unfortunately, the answer depends on a (high) number of parameters that are typically unknown or impractical to estimate, such as friction coefficients, mass distributions, etc. So, while there are cases in which we can answer this question, i.e. physics engines do a good job for a bunch of objects and environments that someone has carefully chosen or designed, answering it for any given object is challenging.

Thus the second keyword. Transfer learning refers to a type of learning that is designed to achieve maximum generalisation: we want to learn skills in a way that allows them to be applied to novel contexts. To achieve this, some careful thinking is required when we attempt to identify a possible solution. Deep learning enthusiasts would suggest collecting an enormous amount of data. If predicting how an object behaves under a push is a function, a deep neural network (DNN) can learn any type of function. Provided enough data, a function for any object/environment can be approximated and learned. This sounds like a lot of work though. Nonetheless, humans do a good job of adapting pushing skills to novel objects.
When we imagine the outcome of a push, we do so thanks to internal models learned as the result of a lifetime of physical interactions, as opposed to an inherent understanding of physics (as physics engines have), but I strongly doubt we need such an enormous amount of data to do so. Our proposed approach confirms my intuition.

First of all, our approach relies on a point cloud object model (PCOM) of the object to be pushed. This means that we represent the object as a set of points scanned from an RGB-D camera, but our system has no idea of what type of object it is looking at; only the visible shape of the object is known. A point cloud looking like a cube could therefore represent a box full of loose fill or a block of granite, and at this point the system has no information to distinguish between these two cases. Our approach will deal with this problem later on. Nonetheless, from the geometrical information available, our model computes a set of local features. The overall goal is to estimate a reference frame for the object to be pushed, as it stands in its initial pose, and to compute the motion of that reference frame after a pushing action is applied, so as to estimate where the object will be after the action. We do so by learning a set of motion models in simulation, repeating the same action under different physical conditions. For each simulation, physical parameters like friction and mass are sampled from a priori distributions. At prediction time, our approach attempts to generate similar (geometrical) local contacts in order to make predictions on familiar ground and increase transferability to novel objects.
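To give a flavour of the training-time sampling described above, here is a minimal sketch in Python. It is not the paper's actual pipeline: `sample_physics`, `simulate_push`, and the prior ranges are all hypothetical placeholders standing in for a real physics engine and the paper's a priori distributions.

```python
import random

def sample_physics(rng):
    """Draw physical parameters from assumed a priori distributions."""
    return {
        "friction": rng.uniform(0.1, 1.0),  # assumed prior range
        "mass": rng.uniform(0.2, 5.0),      # kg, assumed prior range
    }

def simulate_push(params, rng):
    """Hypothetical stand-in for a physics engine: returns the object's
    reference-frame motion (dx, dy, dtheta) after one fixed push.
    Motion shrinks with mass and friction, a crude but plausible rule."""
    scale = 1.0 / (params["mass"] * params["friction"])
    return (0.05 * scale,
            0.01 * scale * rng.gauss(0.0, 0.2),
            0.02 * scale * rng.gauss(0.0, 0.2))

def learn_motion_models(n_pushes, seed=0):
    """Repeat the same action under sampled physical conditions and
    keep the resulting frame motions as a set of motion models."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_pushes):
        params = sample_physics(rng)
        motion = simulate_push(params, rng)
        models.append((params, motion))
    return models

models = learn_motion_models(1000)
```

The key idea is only in the loop: the same push is replayed many times while the unknown physics varies, so the learned motion models cover a spread of plausible outcomes rather than a single guess.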
A novel contribution of the extended paper is to improve the performance of the predictors even further in specific contexts. For example, suppose that we need to push a heavy object over a tiled floor: we would have a good idea of the mass of the object and of the friction it will experience when pushed. We should therefore take advantage of such information and train biased predictors for this specific environment/object pair (e.g. low friction/high mass). We do this by adjusting the distributions over the physical parameters at training time to learn a set of models for specific contexts.
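The biasing step amounts to narrowing the training priors. The sketch below illustrates this under assumed uniform priors; the parameter names and ranges are illustrative, not the paper's values.

```python
import random

# Broad priors used for the generic, context-free predictors (assumed ranges).
GENERIC_PRIORS = {
    "friction": (0.1, 1.0),
    "mass": (0.2, 5.0),  # kg
}

# Narrowed priors for a known context: heavy object on a slippery tiled floor.
LOW_FRICTION_HIGH_MASS = {
    "friction": (0.1, 0.3),  # biased toward the known floor
    "mass": (3.0, 5.0),      # biased toward the known object
}

def sample_context(priors, rng):
    """Sample one set of physical parameters from the given priors."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in priors.items()}

rng = random.Random(42)
# Training data for a biased predictor is drawn from the narrowed priors.
biased_samples = [sample_context(LOW_FRICTION_HIGH_MASS, rng)
                  for _ in range(100)]
```

Predictors trained on the narrowed distribution waste no capacity on physically implausible conditions for that environment/object pair, which is where the extra accuracy comes from.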
The effectiveness of our approach is demonstrated in a simulated environment in which a Pioneer 3-DX robot equipped with a bumper needs to push previously unseen objects. We train on two objects (a cube and a cylinder) for a total of 24,000 pushes in various conditions, and test on six new objects for a total of 14,400 predicted push outcomes.