Metrics and Benchmarks for Remote Shared Controllers in Industrial Applications
LINK TO PAPER: https://arxiv.org/abs/1906.08381
LINK TO POSTER: https://drive.google.com/open?id=1osPtTC9_zDxhm90epWTHXzcY2Ty7DIKq
Remote manipulation is emerging as one of the key robotics tasks needed in extreme environments. Several researchers have investigated how to add AI components to shared controllers to improve their reliability. Nonetheless, novel research approaches can be very slow to find uptake in real-world applications. We propose a set of benchmarks and metrics to evaluate how the AI components of remote shared control algorithms can improve the effectiveness of such frameworks in real industrial applications. We also present an empirical evaluation of a simple intelligent shared controller against a manually operated manipulator in a tele-operated grasping scenario.
The exposure of humans to hostile work environments can be reduced by means of shared control systems, in which an operator remotely controls a robotic platform to perform a task in such environments. The nuclear industry was among the first to introduce shared control systems but, over time, many other industries have adopted these technologies, including health care, mining, military, firefighting, undersea, construction, and space. Telepresence is achieved by sensing information about the task environment through digital sensors and feeding this information back to the human operator at a remote site. The level of automation of the system typically depends on the area of application. Some applications require only supervision from the operator, but the majority require direct manual manipulation via a master device. In fact, the current state-of-the-art controllers in extreme environments still rely heavily on the user-in-the-loop for cognitive reasoning, while the robot device merely reproduces the master's movements. However, direct control is typically non-intuitive for the human operator and increases their cognitive burden, which has led many researchers to investigate possible alternatives towards more efficient and intuitive frameworks.
The key insight is to add an active AI component that is context- and user-aware, so as to make better decisions on how to assist the operator. Context-awareness is typically provided by reconstructing and understanding the scene, in terms of the objects the robot has to manipulate. User-awareness is obtained by providing the operator's task as input to the AI component, enabling a more efficient interpretation of his/her inputs through the master device.
Despite the advancements in technologies and algorithms in autonomous robotics and shared controllers, many industries have not yet embraced these new approaches, preferring instead to maintain out-of-date but reliable systems. This is due to a very simple fact: the risks for the operators and the money to be invested are not worth the benefits that a novel approach may have on paper but that has never been properly tested on a uniform and standardised benchmark. We therefore argue that providing such a benchmark, one approved and standardised by a consortium of research institutions and industries, will encourage industrial partners to invest in the new technologies, leading to safer and more efficient environments.
The benchmarks and metrics we propose in this paper are designed to evaluate two main aspects of a shared control algorithm: i) the ability to extract contextual information from sensing the environment in order to support the operator, and ii) how (visual, haptic) feedback is used to influence the operator's response. The combination of these two components should improve task efficiency (the number of successful executions of the task), reduce task effort (how long it takes to execute the task), and reduce robot attention demand (the time the user spends interacting with the interface instead of focusing on the task at hand).
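The three metrics above can be computed directly from per-trial logs. The following is a minimal sketch of such a computation; the `Trial` record and its fields are illustrative assumptions, not a format prescribed by the benchmark.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    success: bool            # did the task complete successfully?
    duration_s: float        # wall-clock time for the attempt
    interface_time_s: float  # time spent operating the interface

def evaluate(trials):
    """Aggregate the three proposed metrics over a list of trials."""
    n = len(trials)
    efficiency = sum(t.success for t in trials) / n           # success rate
    effort = sum(t.duration_s for t in trials) / n            # mean task time
    attention = sum(t.interface_time_s / t.duration_s for t in trials) / n
    return {"efficiency": efficiency, "effort_s": effort,
            "attention_demand": attention}
```

A lower attention demand means the operator can keep their focus on the task itself rather than on the interface.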
The main task of a shared control interface is to sense appropriate information about the environment and provide it to the human operator, who then makes decisions based on the information received. Figure 2 (blue model) shows a simple four-stage model of information processing for the operator. The first stage refers to the acquisition and registration of multiple sources of information. The second stage involves conscious perception and retrieving processed information from memory. The third stage is where the decision on how to act is made; the decision is obviously influenced by the task at hand. The fourth and final stage is the implementation of the chosen response, typically as a movement on a master device. A non-intelligent shared control system would receive the response of the operator via the master device and mimic (the direction of) the movement on the slave robot, as shown in Fig. 2 (top grey region). The shared controller has no knowledge of the working environment and is not aware of the task at hand. Hence it is not capable of interpreting the operator's intentions, and the best it can do is reproduce the master's movements and provide some low-level haptic feedback, e.g. vibrations when hitting an external surface.
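Such a non-intelligent controller reduces to a pass-through mapping from master to slave motion. A minimal sketch, in which the scaling factor and the workspace clipping are illustrative assumptions rather than part of any specific system:

```python
import numpy as np

def passthrough_step(master_delta, scale=1.0, step_limit=0.5):
    """Mimic the master's motion on the slave end-effector.

    master_delta: 3-vector displacement read from the master device.
    Returns the commanded slave displacement. The only processing is
    a scale factor and a safety clip on the step size; there is no
    interpretation of the operator's intention.
    """
    cmd = scale * np.asarray(master_delta, dtype=float)
    norm = np.linalg.norm(cmd)
    if norm > step_limit:
        cmd *= step_limit / norm   # clip overly large steps
    return cmd
```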
To reproduce the master's movements, the simplest interface requires the operator to control one joint of the manipulator at a time. The operator first manually selects the appropriate joint to move and then increases or decreases the joint angle via the master device. However, this type of interface increases the operator's cognitive burden and is not very efficient, especially for complex robots with many degrees of freedom (DOF). Other interfaces control the manipulator in Cartesian space, where the master's movements are replicated by the robot's end-effector with respect to a chosen world coordinate frame. Although more efficient than controlling single joints, this process still relies heavily on visual servoing by the operator.
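The joint-by-joint interface described above can be sketched in a few lines; the function name and the limit representation are hypothetical:

```python
def joint_step(q, joint_index, delta, limits):
    """Single-joint interface: the operator changes one joint angle
    at a time, clamped to that joint's limits.

    q: current joint angles (list of floats).
    limits: per-joint (lo, hi) bounds.
    """
    q = list(q)                      # do not mutate the caller's state
    lo, hi = limits[joint_index]
    q[joint_index] = min(hi, max(lo, q[joint_index] + delta))
    return q
```

For an n-DOF arm, reaching a single Cartesian pose this way may require many alternating joint selections, which is exactly the cognitive overhead the text points to.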
Figure 2 (bottom row, green model) shows the corresponding four-stage model for an intelligent shared controller and how the operator and robot share information (black dotted arrows). The first stage involves capturing one or multiple views of the task space. Typically this is done by acquiring RGB-D images from pre-selected poses with the eye-in-hand camera. If multiple images are taken, the system registers them to create a single dense point cloud. The second stage pre-processes the point cloud to remove unnecessary surfaces, such as the tabletop, and outliers. A decision is then made given the available contextual information. Assuming that the AI system and the operator share the same task, such as picking and placing objects from the tabletop, the former can assist the decision-making process of the latter by offering available actions. In the pick-and-place example, the system could compute a set of candidate grasping trajectories and visualise them on the augmented feed of the external camera, as shown in Fig. 2 by the black dotted arrow from the shared controller's Decision Making module to the operator's Sensory Processing. This provides a way to influence the operator into selecting the preferred grasp. Once this process is complete, the operator and the shared controller work together to reach and execute a selected action, of which they are both aware. The system interprets the operator's commands to assist him/her in accomplishing the goal, e.g. moving along the trajectory.
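The second stage (tabletop and outlier removal) can be sketched with plain NumPy. This is a deliberately crude stand-in: the height threshold replaces a proper RANSAC plane fit, and the distance-based filter replaces a statistical outlier removal such as those found in point cloud libraries; thresholds are assumed values.

```python
import numpy as np

def preprocess_cloud(points, table_z=0.01, k_std=2.0):
    """Crude tabletop/outlier removal on an (N, 3) point cloud.

    table_z: points with z at or below this height (in a table-aligned
             frame) are treated as the tabletop and dropped.
    k_std:   points farther than k_std standard deviations from the
             mean centroid distance are treated as outliers.
    """
    pts = points[points[:, 2] > table_z]            # drop the tabletop
    d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return pts[d < d.mean() + k_std * d.std()]      # drop far outliers
```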
Finally, the fourth stage of the shared control involves visual or haptic feedback that can be used to communicate with the human operator and affect his/her response, e.g. creating a force field on the master haptic device if the robot is moving away from an optimal alignment with the object, jeopardising grasping success. An example of an intelligent shared controller is presented in Fig. 4.
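One common way to realise such a force field is a virtual spring pulling the master device back towards the selected trajectory. The sketch below assumes the trajectory is a set of waypoints and uses a made-up stiffness and deadband; it is illustrative, not the controller evaluated in the paper.

```python
import numpy as np

def guidance_force(ee_pos, trajectory, stiffness=50.0, deadband=0.005):
    """Spring-like force pulling the master towards the nearest
    waypoint of the selected grasping trajectory.

    ee_pos:     current end-effector position (3-vector).
    trajectory: (N, 3) array of trajectory waypoints.
    Returns a 3-vector force; zero inside a small deadband so the
    operator feels nothing while on track.
    """
    traj = np.asarray(trajectory, dtype=float)
    d = np.linalg.norm(traj - ee_pos, axis=1)
    err = traj[d.argmin()] - ee_pos        # vector back to the path
    if np.linalg.norm(err) < deadband:
        return np.zeros(3)
    return stiffness * err                 # F = k * displacement
```

Rendered on the master device, this force gently resists motions that drift away from the planned grasp, without taking control away from the operator.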