Multi-Level Discovery of Deep Options

Discovery of Deep Options (DDO) discovers parametrized options from demonstrations, and can recursively discover additional levels of the hierarchy. It scales to multi-level hierarchies by decoupling low-level option discovery from high-level meta-control policy learning, facilitated by under-parametrization of the high level. We demonstrate that augmenting the action space of Deep Q-Networks with the discovered options accelerates learning by guiding exploration in tasks where random actions are unlikely to reach valuable states.

*Roy Fox, *Sanjay Krishnan, Ion Stoica, and Ken Goldberg, 2017
* Equal contribution
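
To make the idea of augmenting an action space with options concrete, here is a minimal sketch. It is not the authors' implementation. Every name in it (`Option`, `step_augmented`, `chain_step`) is hypothetical, the option policy is hand-written rather than discovered from demonstrations, and the termination condition is deterministic for simplicity. The agent selects from the primitive actions plus one index per option; choosing an option runs its policy until it terminates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int  # assumption: states are integers in a toy chain environment


@dataclass
class Option:
    """Hypothetical option: a low-level policy plus a termination test."""
    policy: Callable[[State], int]       # state -> primitive action
    terminates: Callable[[State], bool]  # state -> should the option stop?


def chain_step(state: State, action: int) -> Tuple[State, float]:
    """Toy 10-state chain: action 1 moves right, anything else moves left.
    Reward 1.0 is given whenever the resulting state is 9."""
    next_state = max(0, min(9, state + (1 if action == 1 else -1)))
    return next_state, (1.0 if next_state == 9 else 0.0)


def step_augmented(
    state: State,
    action: int,
    n_primitive: int,
    options: List[Option],
    env_step: Callable[[State, int], Tuple[State, float]],
    max_steps: int = 100,
) -> Tuple[State, float, int]:
    """Execute one action from the augmented action space.

    Indices below n_primitive are primitive actions; the rest select options.
    Returns (next_state, summed undiscounted reward, primitive steps taken).
    """
    if action < n_primitive:
        next_state, reward = env_step(state, action)
        return next_state, reward, 1

    option = options[action - n_primitive]
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        state, reward = env_step(state, option.policy(state))
        total_reward += reward
        steps += 1
        if option.terminates(state):
            break
    return state, total_reward, steps


# A hand-written stand-in for a discovered option: keep moving right until state 9.
go_right = Option(policy=lambda s: 1, terminates=lambda s: s == 9)

primitive_result = step_augmented(0, 1, 2, [go_right], chain_step)
option_result = step_augmented(0, 2, 2, [go_right], chain_step)
print(primitive_result)  # (1, 0.0, 1)
print(option_result)     # (9, 1.0, 9)
```

In this toy chain, a single option choice reaches the rewarding state that a random sequence of primitive actions would rarely hit, which is the exploration benefit the abstract describes. A Q-network over this space would simply have `n_primitive + len(options)` outputs.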