Discovery of Deep Continuous Options (DDCO) learns low-level continuous control skills, parametrized by deep neural networks, from demonstrations. A hybrid categorical–continuous distribution model parametrizes high-level policies that can invoke discrete options as well as continuous control actions, and a cross-validation method tunes the number of options to be discovered. We evaluate DDCO in a simulation of a 3-link robot and in two physical experiments on the da Vinci surgical robot.
Sanjay Krishnan*, Roy Fox*, Ion Stoica, and Ken Goldberg, CoRL 2017
* Equal contribution
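As a rough illustration of the hybrid categorical–continuous idea in the DDCO summary above, the sketch below lets a high-level policy either invoke one of K discrete options or emit a continuous action. It is a scalar toy under stated assumptions, not the paper's implementation: the names `sample_hybrid` and `log_prob`, the extra "primitive action" slot appended to the logits, and the fixed Gaussian parameters `mu` and `sigma` are all hypothetical, whereas the paper parametrizes its policies with deep networks.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_hybrid(logits, mu, sigma, rng=random):
    """Sample from a hybrid distribution (illustrative assumption).

    `logits` has K + 1 entries: K discrete options plus a final slot
    meaning "emit a continuous action drawn from N(mu, sigma^2)".
    """
    probs = softmax(logits)
    k = rng.choices(range(len(probs)), weights=probs)[0]
    if k < len(probs) - 1:
        return ("option", k)
    return ("action", rng.gauss(mu, sigma))


def log_prob(choice, logits, mu, sigma):
    """Log-likelihood of a sampled choice under the same hybrid model."""
    probs = softmax(logits)
    kind, value = choice
    if kind == "option":
        return math.log(probs[value])
    # Categorical mass of the continuous slot plus the Gaussian log-density.
    gauss = -0.5 * ((value - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return math.log(probs[-1]) + gauss
```

In a sketch like this, the log-likelihood is what a demonstration-based learner would maximize; the discrete and continuous parts simply add in log space.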
DART injects noise during supervisor demonstrations, with the noise level optimized to approximate the error of the robot’s trained policy. This forces the supervisor to demonstrate how to recover from errors, while incurring less supervisor burden, computational cost, and training risk than on-policy methods. DART learns better policies faster and from fewer demonstrations.
Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, and Ken Goldberg, CoRL 2017
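The noise-injection idea in the DART summary above can be sketched on a one-dimensional toy system. Everything here is an illustrative assumption rather than the authors' code: the proportional `supervisor`, the dynamics `x + u + noise`, the linear learner, and the helper names are hypothetical. The sketch shows two points from the summary: the executed action is perturbed while the recorded label stays noise-free, and the noise level can be set from the learner's error on the demonstrated states.

```python
import random


def supervisor(x):
    # Hypothetical supervisor: proportional controller driving the state to 0.
    return -0.5 * x


def collect_demo(sigma, horizon=20, x0=1.0, rng=random):
    """Roll out the supervisor with Gaussian noise added to the executed action.

    Each recorded label is the supervisor's intended (noise-free) action.
    """
    x, data = x0, []
    for _ in range(horizon):
        u = supervisor(x)
        data.append((x, u))                # label = clean supervisor action
        x = x + u + rng.gauss(0.0, sigma)  # executed action is perturbed
    return data


def fit_linear_policy(data):
    # Least-squares fit of u = w * x (no intercept).
    num = sum(x * u for x, u in data)
    den = sum(x * x for x, _ in data)
    return num / den


def estimate_sigma(data, w):
    # Root-mean-square gap between the learner (w * x) and the supervisor
    # on the demonstrated states; used as the next noise level.
    sq = [(w * x - u) ** 2 for x, u in data]
    return (sum(sq) / len(sq)) ** 0.5
```

A possible loop would alternate `collect_demo`, `fit_linear_policy`, and `estimate_sigma`, so that the injected noise tracks the learner's current error. In this toy the learner can represent the supervisor exactly, so the estimated noise level shrinks to zero.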
We explore how characterizing supervisor inconsistency and correcting for this noise can improve task performance with a limited budget of data. In a planar part extraction task where human operators provide demonstrations by teleoperating a 2-DOF robot, CNN models perform better when trained after error corrections.
Caleb Chuck, Michael Laskey, Sanjay Krishnan, Ruta Joshi, Roy Fox, and Ken Goldberg, CASE 2017