Roy Fox*, Sanjay Krishnan*, Ion Stoica, and Ken Goldberg
Augmenting an agent’s control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces. While recent work has proposed several techniques for automated option discovery, they do not scale to multi-level hierarchies or to expressive representations such as deep networks. We present Discovery of Deep Options (DDO), a policy-gradient algorithm that discovers parametrized options from a set of demonstration trajectories and can be applied recursively to discover additional levels of the hierarchy. Our approach scales to multi-level hierarchies because it decouples low-level option discovery from high-level meta-control policy learning, a decoupling made possible by under-parametrizing the high level. We demonstrate that augmenting the action space of Deep Q-Network agents with the discovered options can accelerate learning by guiding exploration in tasks where random actions are unlikely to reach valuable states. We show that the options added by DDO accelerate learning in 4 of the 5 Atari RAM environments chosen for our experiments. We also show that DDO can discover structure in robot-assisted surgical videos and kinematics that matches expert annotation with 72% accuracy.