The main challenges that set algorithmic domains apart from other imitation learning domains are the need for high accuracy, the involvement of specific structures of data, and the extremely limited observability. To address these challenges, we propose to model programs as Parametrized Hierarchical Procedures (PHPs). A PHP is a sequence of conditional operations, that uses a program counter, along with the observation, to select between taking an elementary action, invoking another PHP as a sub-procedure, and returning to the caller. We develop an algorithm for training PHPs from a mixture of annotated and unannotated demonstrations, and apply it to efficient level-wise training of multi-level PHPs. We show in two benchmarks, NanoCraft and long-hand addition, that PHPs can learn neural programs more accurately from smaller amounts of strong and weak supervision.
Conferences, Workshops, and Symposia
Discovery of Deep Continuous Options (DDCO) learns from demonstrations low-level continuous control skills parametrized by deep neural networks. A hybrid categorical–continuous distribution model parametrizes high-level policies that can invoke discrete options as well continuous control actions, and a cross-validation method tunes the number of options to be discovered. We evaluate DDCO in simulation of a 3-link robot, and in two physical experiments on the da Vinci surgical robot.
* Equal contribution
Existing techniques for automated option discovery do not scale to multi-level hierarchies and to expressive representations such as deep networks. We present Discovery of Deep Options (DDO), a policy-gradient algorithm that discovers parametrized options from a set of demonstration trajectories, and can be used recursively to discover additional levels of the hierarchy. We show that DDO is effective in adding options that accelerate learning in 4 out of 5 Atari RAM environments chosen in our experiments. We also show that DDO can discover structure in robot-assisted surgical videos and kinematics that match expert annotation with 72% accuracy.
* Equal contribution