Conference Proceedings

An Algorithm and User Study for Teaching Bilateral Manipulation via Iterated Best Response Demonstrations

Multilateral demonstrations can be difficult for human supervisors to provide because they require divided attention. We propose Bilateral Iterated Best Response (BIBR), a new algorithm that reduces supervisor burden by iteratively demonstrating each manipulator unilaterally while rolling out an estimated robot policy for the other manipulator. We present a web-based user study of a two-agent gridworld domain. We confirm prior findings that bilateral demonstrations are noisier and longer when the task is asymmetric, and show that BIBR improves the success rate in the asymmetric task while learning policies with shorter and smoother trajectories.

Carolyn Chen, Sanjay Krishnan, Michael Laskey, Roy Fox, and Ken Goldberg, CASE, 2017

Statistical Data Cleaning for Deep Learning of Automation Tasks from Demonstrations

Human demonstrators of deep learning tasks are prone to inconsistencies and errors that can delay or degrade learning. We improve task performance by characterizing supervisor inconsistency and correcting for it using data cleaning techniques. In human demonstrations of a planar part extraction task on a 2DOF robot, trained CNN models show an improvement of 11.2% in mean absolute success rate after data cleaning.

Caleb Chuck, Michael Laskey, Sanjay Krishnan, Ruta Joshi, Roy Fox, and Ken Goldberg, CASE, 2017

Preprints

Multi-Level Discovery of Deep Options

Discovery of Deep Options (DDO) learns parametrized options from demonstrations, and can be applied recursively to discover additional levels of the hierarchy. It scales to multi-level hierarchies by decoupling low-level option discovery from high-level meta-control policy learning, facilitated by under-parametrization of the high level. We demonstrate that augmenting the action space of Deep Q-Networks with the discovered options accelerates learning by guiding exploration in tasks where random actions are unlikely to reach valuable states.

*Roy Fox, *Sanjay Krishnan, Ion Stoica, and Ken Goldberg, 2017
* Equal contribution

Iterative Noise Injection for Scalable Imitation Learning

Imitation learning algorithms suffer from covariate shift when the agent visits different states than the supervisor. Active approaches such as DAgger collect data from the current agent distribution, which can also change in each iteration with very large batch sizes. We propose injecting artificial noise into the supervisor’s policy, prove an improved bound on the loss due to the covariate shift, and introduce an algorithm to estimate the level of ε-greedy noise to inject. Our algorithm, Dart, achieves better performance than DAgger in a driving simulator.

Michael Laskey, Jonathan Lee, Wesley Hsieh, Richard Liaw, Jeffrey Mahler, Roy Fox, and Ken Goldberg, 2017
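
The noise-injection idea in the abstract above can be illustrated with a minimal sketch. It assumes a toy one-dimensional chain environment and a hand-written supervisor; the names `collect_noisy_demo`, `chain_step`, and `chain_supervisor` are hypothetical and do not come from the paper. With probability ε a random action is executed instead of the supervisor's, while the supervisor's intended action is still recorded as the training label. The sketch does not cover the paper's procedure for estimating the noise level.

```python
import random

ACTIONS = [-1, 0, 1]  # assumed toy action set: step left, stay, step right


def chain_supervisor(state):
    """Hypothetical supervisor: always move toward state 0."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def chain_step(state, action):
    """Hypothetical deterministic dynamics on an integer chain."""
    return state + action


def collect_noisy_demo(step, supervisor, actions, start, epsilon, horizon, rng):
    """Roll out the supervisor with epsilon-greedy noise injected.

    The executed action is random with probability epsilon, but the
    recorded label is always the supervisor's intended action, so the
    dataset contains corrective labels for off-nominal states.
    """
    data = []
    state = start
    for _ in range(horizon):
        intended = supervisor(state)
        data.append((state, intended))
        if rng.random() < epsilon:
            executed = rng.choice(actions)
        else:
            executed = intended
        state = step(state, executed)
    return data


rng = random.Random(0)
clean = collect_noisy_demo(chain_step, chain_supervisor, ACTIONS, 3, 0.0, 5, rng)
noisy = collect_noisy_demo(chain_step, chain_supervisor, ACTIONS, 3, 0.5, 20, rng)
```

Here `clean` follows the supervisor's nominal path exactly, while `noisy` may wander to states the supervisor would not visit on its own, each still paired with the supervisor's label for that state.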