Constraint Estimation and Derivative-Free Recovery for Robot Learning from Demonstrations

We propose a two-phase method for learning safe control in fully controllable robotic manipulation tasks. In the first phase, the state-space support of supervisor demonstrations is estimated to infer implicit constraints. In the second phase, we present a switching policy to prevent the robot from leaving safe states. The policy switches between the robot’s learned policy and a novel failure-avoidance policy depending on the distance to the boundary of the support. We prove that inferred constraints are guaranteed to be enforced if the support is well-estimated. A simulated pushing task suggests that support estimation and failure avoidance control can reduce failures by 87% while sacrificing only 40% of performance. On a line tracking task using a da Vinci surgical robot, failure avoidance control reduced failures by 84%.

Jonathan Lee, Michael Laskey, Roy Fox, and Ken Goldberg, CASE 2018
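
The switching rule described in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the nearest-demonstration distance used as a support estimate, and the names `support_radius`, `margin`, `learned_policy`, and `recovery_policy`, are all assumptions chosen for illustration.

```python
import math

def distance_to_demos(state, demo_states):
    """Distance from `state` to the nearest demonstrated state.

    Assumption: nearest-neighbor distance is a crude stand-in for a
    proper support estimator.
    """
    return min(math.dist(state, d) for d in demo_states)

def switching_policy(state, demo_states, learned_policy, recovery_policy,
                     support_radius, margin):
    """Use the learned policy well inside the estimated support,
    and the failure-avoidance policy near its boundary."""
    # Slack: how far the state lies inside the estimated support boundary.
    slack = support_radius - distance_to_demos(state, demo_states)
    if slack > margin:
        return learned_policy(state)
    return recovery_policy(state)
```

Here `margin` sets how close to the estimated boundary the robot may get before the recovery policy takes over; a larger margin trades task performance for caution.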

Parametrized Hierarchical Procedures for Neural Programming

The main challenges that set algorithmic domains apart from other imitation learning domains are the need for high accuracy, the involvement of specific structures of data, and the extremely limited observability. To address these challenges, we propose to model programs as Parametrized Hierarchical Procedures (PHPs). A PHP is a sequence of conditional operations that uses a program counter, along with the observation, to select among taking an elementary action, invoking another PHP as a sub-procedure, and returning to the caller. We develop an algorithm for training PHPs from a mixture of annotated and unannotated demonstrations, and apply it to efficient level-wise training of multi-level PHPs. We show in two benchmarks, NanoCraft and long-hand addition, that PHPs can learn neural programs more accurately from smaller amounts of strong and weak supervision.

Roy Fox, Richard Shin, Sanjay Krishnan, Ken Goldberg, Dawn Song, and Ion Stoica, ICLR 2018
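
The control flow of a PHP, as described in the abstract above, can be sketched as a small interpreter with an explicit call stack. This is only an illustration under stated assumptions: the procedures here are hand-written Python functions rather than learned models, and the names `run_php`, `step_until_two`, and `main`, as well as the tuple encoding of operations, are hypothetical.

```python
def run_php(procedures, root, observe, act, max_steps=1000):
    """Interpret hierarchical procedures with an explicit call stack.

    procedures[name](counter, observation) returns one of:
      ("act", action), ("call", sub_name), ("return", None).
    """
    stack = [[root, 0]]  # frames of [procedure name, program counter]
    for _ in range(max_steps):
        if not stack:
            break
        name, counter = stack[-1]
        kind, arg = procedures[name](counter, observe())
        if kind == "return":
            stack.pop()          # hand control back to the caller
            continue
        stack[-1][1] += 1        # advance this frame's program counter
        if kind == "act":
            act(arg)             # elementary action
        elif kind == "call":
            stack.append([arg, 0])  # invoke a sub-procedure

# Toy procedures: one selects by observation, the other by program counter.
def step_until_two(counter, observation):
    return ("act", "step") if observation < 2 else ("return", None)

def main(counter, observation):
    if counter == 0:
        return ("call", "step_until_two")
    if counter == 1:
        return ("act", "done")
    return ("return", None)

trace = []
run_php({"main": main, "step_until_two": step_until_two}, "main",
        observe=lambda: len(trace), act=trace.append)
print(trace)  # ['step', 'step', 'done']
```

In this toy setup the observation is just the number of actions taken so far; in the setting the abstract describes, the selection would come from a trained model.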

DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations

Discovery of Deep Continuous Options (DDCO) learns low-level continuous control skills, parametrized by deep neural networks, from demonstrations. A hybrid categorical–continuous distribution model parametrizes high-level policies that can invoke discrete options as well as continuous control actions, and a cross-validation method tunes the number of options to be discovered. We evaluate DDCO in simulation of a 3-link robot, and in two physical experiments on the da Vinci surgical robot.

Sanjay Krishnan*, Roy Fox*, Ion Stoica, and Ken Goldberg, CoRL 2017
* Equal contribution

DART: Noise Injection for Robust Imitation Learning

DART injects noise during supervisor demonstrations, with a noise level optimized to approximate the error of the robot’s trained policy. This forces the supervisor to demonstrate how to recover from errors, while incurring less supervisor burden, computational cost, and training risk than on-policy methods. DART learns better policies faster and from fewer demonstrations.

Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, and Ken Goldberg, CoRL 2017
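
The noise-injection idea in the abstract above can be sketched for a one-dimensional action as follows. This is a simplified illustration, not the paper's algorithm: it assumes scalar Gaussian noise, records the supervisor's intended (noise-free) action as the training label, and sets the noise level to the root-mean-square gap between a robot policy and the supervisor labels. The names `collect_noisy_demo`, `estimate_noise_level`, and `step` are hypothetical.

```python
import random

def collect_noisy_demo(supervisor, step, state, sigma, horizon, rng):
    """Roll out the supervisor with Gaussian noise added to executed actions.

    Each recorded pair is (state, intended action): the label stays noise-free,
    while the noise pushes the rollout into states that need recovery.
    """
    data = []
    for _ in range(horizon):
        intended = supervisor(state)
        data.append((state, intended))
        executed = intended + rng.gauss(0.0, sigma)
        state = step(state, executed)
    return data

def estimate_noise_level(data, robot_policy):
    """Assumed rule: match sigma to the RMS error of the robot's policy
    relative to the supervisor labels on the demonstrated states."""
    errors = [(robot_policy(s) - a) ** 2 for s, a in data]
    return (sum(errors) / len(errors)) ** 0.5
```

A typical loop under these assumptions would alternate between collecting data with the current `sigma`, fitting the robot policy, and re-estimating `sigma` from the fitted policy's error.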

An Algorithm and User Study for Teaching Bilateral Manipulation via Iterated Best Response Demonstrations

Bilateral Iterated Best Response (BIBR) reduces supervisor burden when training multiple manipulators, by rolling out an estimated robot policy for one arm while the human demonstrates for the other, iteratively updating the estimated policy. BIBR learns policies with increased success rate, and shorter and smoother trajectories.

Carolyn Chen, Sanjay Krishnan, Michael Laskey, Roy Fox, and Ken Goldberg, CASE 2017

Statistical Data Cleaning for Deep Learning of Automation Tasks from Demonstrations

We explore how characterizing supervisor inconsistency and correcting for this noise can improve task performance with a limited budget of data. In a planar part extraction task where human operators provide demonstrations by teleoperating a 2DOF robot, CNN models perform better when trained after error corrections.

Caleb Chuck, Michael Laskey, Sanjay Krishnan, Ruta Joshi, Roy Fox, and Ken Goldberg, CASE 2017


Multi-Level Discovery of Deep Options

Existing techniques for automated option discovery do not scale to multi-level hierarchies and to expressive representations such as deep networks. We present Discovery of Deep Options (DDO), a policy-gradient algorithm that discovers parametrized options from a set of demonstration trajectories, and can be used recursively to discover additional levels of the hierarchy. We show that DDO is effective in adding options that accelerate learning in 4 out of 5 Atari RAM environments chosen in our experiments. We also show that DDO can discover structure in robot-assisted surgical videos and kinematics that matches expert annotation with 72% accuracy.

Roy Fox*, Sanjay Krishnan*, Ion Stoica, and Ken Goldberg, arXiv, 2017
* Equal contribution