Conference Proceedings

Statistical Data Cleaning for Deep Learning of Automation Tasks from Demonstrations

Human demonstrators are prone to inconsistencies and errors that can delay or degrade deep learning of automation tasks. We improve task performance by characterizing supervisor inconsistency and correcting for it with data cleaning techniques. In human demonstrations of a planar part extraction task on a 2DOF robot, trained CNN models show an improvement of 11.2% in mean absolute success rate after data cleaning.

Caleb Chuck, Michael Laskey, Sanjay Krishnan, Ruta Joshi, Roy Fox, and Ken Goldberg, CASE, 2017


Multi-Level Discovery of Deep Options

Discovery of Deep Options (DDO) discovers parametrized options from demonstrations, and can recursively discover additional levels of the hierarchy. It scales to multi-level hierarchies by decoupling low-level option discovery from high-level meta-control policy learning, facilitated by under-parametrization of the high level. We demonstrate that augmenting the action space of Deep Q-Networks with the discovered options accelerates learning by guiding exploration in tasks where random actions are unlikely to reach valuable states.

*Roy Fox, *Sanjay Krishnan, Ion Stoica, and Ken Goldberg, 2017
* Equal contribution
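
As a rough illustration of what augmenting an action space with options can look like, the sketch below treats an option as a low-level policy paired with a termination probability and places it alongside the primitive actions. Everything in it is an assumption made for illustration and is not taken from the paper: the `Option` class, the one-dimensional toy environment in `step`, and the hand-written `go_to_ten` option (DDO discovers options from demonstrations, which this sketch does not attempt).

```python
import random

# Hypothetical primitive actions of a toy 1-D environment.
PRIMITIVES = ["left", "right"]


class Option:
    """A temporally extended action: a policy plus a termination probability."""

    def __init__(self, name, policy, termination):
        self.name = name
        self.policy = policy            # maps state -> primitive action
        self.termination = termination  # maps state -> probability of stopping


def step(state, action):
    """Toy dynamics (assumed): move one cell right or left."""
    return state + (1 if action == "right" else -1)


def execute(state, choice, rng, max_steps=100):
    """Apply a primitive for one step, or run an option until it terminates."""
    if isinstance(choice, Option):
        for _ in range(max_steps):
            state = step(state, choice.policy(state))
            if rng.random() < choice.termination(state):
                break
        return state
    return step(state, choice)


# A hand-written stand-in for a discovered option: walk to cell 10, then stop.
go_to_ten = Option(
    "go_to_ten",
    policy=lambda s: "right" if s < 10 else "left",
    termination=lambda s: 1.0 if s == 10 else 0.0,
)

# The high-level learner chooses among primitives and options alike.
AUGMENTED_ACTIONS = PRIMITIVES + [go_to_ten]
```

In this toy setting, a learner that picks `go_to_ten` once reaches a state that uniformly random primitive actions would rarely reach, which is the kind of guided exploration the abstract describes.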

Iterative Noise Injection for Scalable Imitation Learning

Imitation learning algorithms suffer from covariate shift when the agent visits states different from those visited by the supervisor. Active approaches such as DAgger collect data from the current agent distribution, which can also change in each iteration with very large batch sizes. We propose injecting artificial noise into the supervisor’s policy, prove an improved bound on the loss due to covariate shift, and introduce an algorithm to estimate the level of ε-greedy noise to inject. Our algorithm, Dart, achieves better performance than DAgger in a driving simulator.

Michael Laskey, Jonathan Lee, Wesley Hsieh, Richard Liaw, Jeffrey Mahler, Roy Fox, and Ken Goldberg, 2017
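
A minimal sketch of ε-greedy noise injection during demonstration collection follows. With probability ε a uniformly random action is executed in place of the supervisor's choice, so the rollout drifts into states the supervisor alone would not visit. The toy environment, the `supervisor` function, and the fixed `epsilon` argument are hypothetical; in particular, the sketch assumes the recorded label is the supervisor's intended action rather than the executed one, and it does not estimate the noise level as the paper's algorithm does.

```python
import random

# Hypothetical discrete action set for a 1-D toy task.
ACTIONS = [-1, 0, 1]


def supervisor(state):
    """Hypothetical supervisor: always move toward state 0."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def rollout_with_noise(start, horizon, epsilon, seed=0):
    """Collect (state, supervisor_label) pairs under an epsilon-greedy supervisor."""
    rng = random.Random(seed)
    state = start
    data = []
    for _ in range(horizon):
        label = supervisor(state)
        # With probability epsilon, execute a random action instead of the label.
        executed = rng.choice(ACTIONS) if rng.random() < epsilon else label
        # Assumption: the stored label is the supervisor's intended action.
        data.append((state, label))
        # Toy dynamics (assumed): the executed action shifts the state.
        state = state + executed
    return data
```

Setting `epsilon=0.0` recovers plain supervisor rollouts, while larger values widen the set of states for which corrective labels are collected.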