Carolyn Chen, Sanjay Krishnan, Michael Laskey, Roy Fox, and Ken Goldberg
Human demonstrations can be valuable for teaching robots to perform manipulation and coordination tasks. However, it can be difficult for human supervisors to provide demonstrations for multilateral (multi-arm) tasks, which require divided attention. In this paper, we propose a new algorithm called Bilateral Iterated Best Response (BIBR), which builds on the game-theoretic concept of Iterated Best Response. This algorithm allows a supervisor to train each manipulator iteratively, thereby reducing supervisor burden and improving the quality of demonstrations. We present a web-based user study in which 51 participants control two agents in a GridWorld environment through a keyboard interface. The study confirms prior findings that, when the task is asymmetric, bilateral demonstrations are noisier and longer than demonstrations provided separately for either manipulator. Because unilateral demonstrations lack coordination, we propose learning coordinated bilateral policies from unilateral demonstrations: an estimated robot policy is rolled out for one arm while the human demonstrates for the other, and the estimated policy is updated iteratively. Compared to a bilateral demonstration baseline, BIBR improves the success rate of the learned policy from 29.17% to 55.55% on the asymmetric task in the first full round of demonstrations. Furthermore, the learned policies produce trajectories that are shorter, with 8.63% fewer steps, and smoother, with 44.15% fewer changes in direction.
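The alternating scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `fit_policy`, `iterated_unilateral_training`, and `collect_demo` are hypothetical, the majority-vote learner is an assumed stand-in for whatever policy learner is actually used, and the callback `collect_demo` stands in for the human supervisor demonstrating one arm while the other arm follows its current estimated policy.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = int
Policy = Callable[[State], Action]
Demo = List[Tuple[State, Action]]  # one demonstration: (state, action) pairs


def fit_policy(demos: List[Demo], default: Action = 0) -> Policy:
    """Estimate a policy by per-state majority vote (assumed stand-in learner)."""
    votes: Dict[State, Counter] = defaultdict(Counter)
    for demo in demos:
        for state, action in demo:
            votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # States never seen in a demonstration fall back to the default action.
    return lambda s: table.get(s, default)


def iterated_unilateral_training(
    collect_demo: Callable[[int, Policy], Demo],
    rounds: int,
    default: Action = 0,
) -> List[Policy]:
    """Alternate between arms: demonstrate one arm while the other arm's
    current estimated policy is rolled out, then refit the demonstrated arm.

    collect_demo(arm, partner_policy) is a hypothetical callback returning
    one unilateral demonstration for `arm`.
    """
    policies: List[Policy] = [lambda s: default, lambda s: default]
    for _ in range(rounds):
        for arm in (0, 1):
            partner_policy = policies[1 - arm]
            demo = collect_demo(arm, partner_policy)
            policies[arm] = fit_policy([demo], default)
    return policies
```

In this sketch, one pass over both arms corresponds to one full round of demonstrations. Refitting each arm only on its latest demonstration is a simplifying assumption; aggregating demonstrations across rounds would be an equally plausible choice.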