Conference Proceedings

Principled Option Learning in Markov Decision Processes

We characterize a good set of prior options as the centroids of clusters of control options, where each control option is optimized for one of a set of subtasks. We formulate this insight as an optimization problem and derive an algorithm that alternates between planning given the set of prior options and clustering the set of control options. We illustrate this approach in a simple two-room simulation.

*Roy Fox, *Michal Moshkovitz, and Naftali Tishby, EWRL, 2016
* Equal contribution
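
The alternation described in the summary above (plan given the prior options, then cluster the control options) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' algorithm: it assumes a tabular MDP, a KL-regularized (soft) planner, hard cluster assignments, and uniform weighting over states. The names `chain_mdp`, `soft_plan`, `kl`, and `learn_prior_options`, the parameters `beta` and `gamma`, and the corridor environment (a stand-in for the two-room simulation) are all hypothetical.

```python
import numpy as np

def chain_mdp(S=6):
    """Hypothetical toy environment: a corridor with actions left (0) and right (1)."""
    P = np.zeros((S, 2, S))  # P[s, a, s'] = transition probability
    for s in range(S):
        P[s, 0, max(s - 1, 0)] = 1.0
        P[s, 1, min(s + 1, S - 1)] = 1.0
    return P

def soft_plan(P, r, prior, beta=2.0, gamma=0.9, iters=200):
    """Assumed planner: policy pi(a|s) proportional to prior(a|s) * exp(beta * Q(s, a))."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)          # shape (S, A)
        m = Q.max(axis=1, keepdims=True)          # subtract max for numerical stability
        w = prior * np.exp(beta * (Q - m))
        V = (m + np.log(w.sum(axis=1, keepdims=True)) / beta)[:, 0]
    Q = r[:, None] + gamma * (P @ V)
    w = prior * np.exp(beta * (Q - Q.max(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True)

def kl(pi, prior, eps=1e-300):
    """KL divergence between policies, summed over states (uniform weighting is an assumption)."""
    return float(np.sum(pi * (np.log(pi + eps) - np.log(prior + eps))))

def learn_prior_options(P, rewards, K, rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    S, A = P.shape[0], P.shape[1]
    priors = rng.dirichlet(np.ones(A), size=(K, S))   # K random prior options, shape (K, S, A)
    assign = rng.integers(K, size=len(rewards))       # initial subtask-to-prior assignment
    for _ in range(rounds):
        # Planning step: one control option per subtask, regularized toward its prior.
        controls = np.stack([soft_plan(P, r, priors[k])
                             for r, k in zip(rewards, assign)])
        # Clustering step: reassign each control option to its closest prior ...
        assign = np.array([np.argmin([kl(pi, q) for q in priors])
                           for pi in controls])
        # ... and move each prior to the centroid (mean) of its cluster.
        for k in range(K):
            members = controls[assign == k]
            if len(members):
                priors[k] = members.mean(axis=0)
    return priors, controls, assign
```

For example, calling `learn_prior_options(chain_mdp(6), [np.eye(6)[g] for g in (0, 1, 4, 5)], K=2)` runs the loop on four goal-reaching subtasks with two prior options. The arithmetic mean is used as the centroid because it minimizes the summed KL divergence from the cluster members to the prior under the uniform state weighting assumed here.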