Multilateral demonstrations can be difficult for human supervisors to provide because they require divided attention. We propose Bilateral Iterated Best Response (BIBR), a new algorithm that reduces supervisor burden by iteratively demonstrating each manipulator unilaterally while rolling out an estimated robot policy for the other manipulator. We present a web-based user study of a two-agent gridworld domain. We confirm the finding of prior work that bilateral demonstrations are noisier and longer when the task is asymmetric, and show that BIBR improves success rate in the asymmetric task, while learning policies that have shorter and smoother trajectories.
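The alternation described above can be sketched as a short loop. This is a minimal illustration rather than the authors' implementation: the names `bibr`, `fit_lookup_policy`, and `toy_supervisor` are hypothetical, the learner is a simple lookup table standing in for whatever policy estimator the study used, and the toy task (two actions that must sum to the state) is invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Policy = Callable[[int], int]   # maps a state to an action
Demo = List[Tuple[int, int]]    # (state, action) pairs


def fit_lookup_policy(demos: Demo, default: int = 0) -> Policy:
    """Hypothetical learner: replay the demonstrated action for each state."""
    table = dict(demos)
    return lambda state: table.get(state, default)


def bibr(
    demonstrate: Callable[[int, int, Policy], int],
    states: Sequence[int],
    rounds: int = 3,
) -> List[Policy]:
    """Alternate unilateral demonstrations between the two sides.

    In each round the supervisor controls one side only, while the other
    side is driven by its current estimated policy.
    """
    policies: List[Policy] = [lambda s: 0, lambda s: 0]  # assumed initial policies
    for _ in range(rounds):
        for side in (0, 1):
            other = policies[1 - side]
            # The supervisor demonstrates `side` against the rolled-out `other`.
            demos = [(s, demonstrate(side, s, other)) for s in states]
            policies[side] = fit_lookup_policy(demos)
    return policies


def toy_supervisor(side: int, state: int, other: Policy) -> int:
    """Invented task: the two actions should sum to the state."""
    return state - other(state)
```

Calling `bibr(toy_supervisor, states=range(5))` returns two policies whose actions jointly solve the toy task, with the supervisor only ever attending to one side at a time.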
We consider the problem of controlling a linear system with Gaussian noise and quadratic cost (LQG), using a memoryless controller whose sensor-to-actuator channel has limited capacity. We formulate this setting as a sequential rate-distortion (SRD) problem, where we minimize the rate of information required for the controller’s operation, under a constraint on its external cost. We present the optimality principle, and study the interesting and useful phenomenology of the optimal controller, such as the principled reduction of its order.
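For orientation, the unconstrained baseline of this setting can be sketched in a few lines. The example below is not the capacity-limited controller studied in the paper: it assumes a scalar system x' = a x + b u + noise with full state observation, stage cost q x^2 + r u^2, and an infinite horizon, and it iterates the standard discrete-time Riccati recursion to a fixed point. The function name `scalar_lqr` and all parameter names are illustrative.

```python
def scalar_lqr(a: float, b: float, q: float, r: float,
               tol: float = 1e-12, max_iter: int = 10_000):
    """Iterate the scalar discrete-time Riccati recursion to a fixed point.

    Returns (p, k): the cost-to-go coefficient and the feedback gain,
    so that the control law is u = -k * x.
    """
    p = q  # start from the one-step cost
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return p, k
```

With a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, so `scalar_lqr(1.0, 1.0, 1.0, 1.0)` converges to the golden ratio for p. Additive Gaussian noise changes the achieved cost but not this gain, which is why the noiseless recursion suffices for the baseline.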
We consider the case where the controller is retentive (memory-utilizing). We can view the memory reader as one more sensor, and the memory writer as one more actuator. We can then formulate the problem of control under communication limitations, again as a sequential rate-distortion (SRD) problem of minimizing the rate of information required for the controller’s operation, under a constraint on its external cost. We show that this problem can be reduced to the memoryless case, studied in Part I. We then further investigate the form of the resulting optimal solution, and demonstrate its interesting phenomenology.
We model the process by which the brain and the computer in a brain-computer interface (BCI) co-adapt to one another. We show in this simplified Linear-Quadratic-Gaussian (LQG) model how the brain’s neural encoding can adapt to the task of controlling the computer, at the same time that the computer’s adaptive decoder can adapt to the task of estimating the intention signal, leading to improvement in the system’s performance. We then propose an encoder-aware decoder adaptation scheme, which allows the computer to drive improvement forward faster by anticipating the brain’s adaptation.
* Equal contribution
We formulate the problem of optimizing an agent under both extrinsic and intrinsic constraints on its operation in a dynamical system and develop the main tools for solving it. We identify the challenging convergence properties of the optimization algorithm, such as the bifurcation structure of the update operator near phase transitions. We study the special case of linear-Gaussian dynamics and quadratic cost (LQG), where the optimal solution has a particularly simple and solvable form. We also explore the learning task, where the model of the world dynamics is unknown and sample-based updates are used instead.