Conference Proceedings

DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations

Discovery of Deep Continuous Options (DDCO) learns low-level continuous control skills, parametrized by deep neural networks, from demonstrations. A hybrid categorical–continuous distribution model parametrizes high-level policies that can invoke discrete options as well as continuous control actions, and a cross-validation method tunes the number of options to be discovered. We evaluate DDCO in a simulation of a 3-link robot and in two physical experiments on the da Vinci surgical robot.

Sanjay Krishnan*, Roy Fox*, Ion Stoica, and Ken Goldberg, CoRL 2017
* Equal contribution
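
One way to picture the hybrid categorical–continuous high-level policy is a single network head that chooses among K discrete options or one extra "direct control" branch that emits a continuous action. The PyTorch sketch below is only an illustration under assumed names and sizes (HybridPolicyHead, the trunk width, a Gaussian control branch); it is not the architecture used in the paper.

import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridPolicyHead(nn.Module):
    """Illustrative hybrid categorical-continuous policy head.

    The high-level policy either invokes one of `num_options` discrete
    options or emits a continuous control action directly. All names
    and sizes here are assumptions for illustration only.
    """

    def __init__(self, state_dim, num_options, action_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Logits over the discrete choices: K options + 1 "direct action" branch.
        self.choice_logits = nn.Linear(hidden, num_options + 1)
        # Gaussian parameters for the continuous control branch.
        self.action_mean = nn.Linear(hidden, action_dim)
        self.action_log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.trunk(state)
        choice_dist = Categorical(logits=self.choice_logits(h))
        action_dist = Normal(self.action_mean(h), self.action_log_std.exp())
        return choice_dist, action_dist

# Sampling: pick a branch, then either run the chosen option or
# execute the sampled continuous action for one step.
policy = HybridPolicyHead(state_dim=6, num_options=4, action_dim=2)
choice_dist, action_dist = policy(torch.zeros(1, 6))
branch = choice_dist.sample()   # 0..3 -> option index, 4 -> direct action
action = action_dist.sample()   # used only when branch == num_options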

Principled Option Learning in Markov Decision Processes

We characterize a good set of prior options as the centroids of clusters of control options that are optimized for a set of subtasks. We formulate this insight as an optimization problem and derive an algorithm that alternates between planning given the set of prior options and clustering the set of control options. We illustrate this approach in a simple two-room simulation.

Roy Fox*, Michal Moshkovitz*, and Naftali Tishby, EWRL 2016
* Equal contribution
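
The alternation between planning and clustering can be pictured as a k-means-style loop over policies. The NumPy sketch below illustrates only the clustering half under assumed choices (KL divergence as the distance, the arithmetic mean of per-state action distributions as the centroid); it is not the paper's information-theoretic formulation, and the planning step that re-optimizes each subtask's policy given its prior is omitted.

import numpy as np

def kl(p, q, eps=1e-12):
    """Summed elementwise KL divergence between per-state action distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)))

def cluster_prior_options(task_policies, num_priors, iters=20, seed=0):
    """k-means-style alternation (illustrative only): assign each subtask's
    control option to the nearest prior option by KL, then recompute each
    prior as the mean distribution of its cluster."""
    rng = np.random.default_rng(seed)
    init = rng.choice(len(task_policies), num_priors, replace=False)
    priors = [task_policies[i] for i in init]
    for _ in range(iters):
        # Assignment step: nearest prior option under KL divergence.
        assign = [int(np.argmin([kl(pi, mu) for mu in priors])) for pi in task_policies]
        # Centroid step: average the per-state action distributions in each cluster.
        for k in range(num_priors):
            members = [task_policies[i] for i, a in enumerate(assign) if a == k]
            if members:
                priors[k] = np.mean(members, axis=0)
    return priors, assign

# Toy usage: 6 subtask policies over 10 states and 4 actions, clustered into 2 priors.
policies = [np.random.default_rng(i).dirichlet(np.ones(4), size=10) for i in range(6)]
priors, assignment = cluster_prior_options(policies, num_priors=2)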