We propose a two-phase method for learning safe control in fully controllable robotic manipulation tasks. In the first phase, the state-space support of supervisor demonstrations is estimated to infer implicit constraints. In the second phase, we present a switching policy to prevent the robot from leaving safe states. The policy switches between the robot’s learned policy and a novel failure-avoidance policy depending on the distance to the boundary of the support. We prove that inferred constraints are guaranteed to be enforced if the support is well-estimated. A simulated pushing task suggests that support estimation and failure avoidance control can reduce failures by 87% while sacrificing only 40% of performance. On a line tracking task using a da Vinci surgical robot, failure avoidance control reduced failures by 84%.
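The switching rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the support estimate here is a hypothetical union of balls of radius `radius` around demonstration states, and the names `distance_to_support_boundary`, `switching_policy`, `margin`, and the toy `avoid` fallback are all invented for the example.

```python
import numpy as np

def distance_to_support_boundary(state, demos, radius):
    """Hypothetical support estimate: a state counts as supported if it lies
    within `radius` of some demonstration state. Returns radius minus the
    nearest-demo distance, so positive values are inside the estimate."""
    nearest = np.min(np.linalg.norm(demos - state, axis=1))
    return radius - nearest

def switching_policy(state, demos, learned_policy, avoidance_policy,
                     radius=1.0, margin=0.25):
    """Use the learned policy well inside the estimated support and the
    fallback policy once the state is within `margin` of its boundary."""
    if distance_to_support_boundary(state, demos, radius) > margin:
        return learned_policy(state)
    return avoidance_policy(state)

# Toy demonstration states along a line (assumed data, for illustration only).
demos = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

def learned(state):
    # Stand-in for the robot's learned policy: always push in +x.
    return np.array([1.0, 0.0])

def avoid(state):
    # Stand-in fallback: step back toward the nearest demonstration state.
    nearest = demos[np.argmin(np.linalg.norm(demos - state, axis=1))]
    return nearest - state

inside_action = switching_policy(np.array([1.0, 0.1]), demos, learned, avoid)
edge_action = switching_policy(np.array([1.0, 0.9]), demos, learned, avoid)
```

In this toy setup the first query is deep inside the estimated support and keeps the learned action, while the second is close to the boundary and triggers the fallback.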

# Publications tagged "Continuous"

### Conferences

## Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure

We propose and evaluate a two-phase calibration process for cable-driven robots. In Phase I, the robot performs a set of open-loop trajectories and learns a nonlinear transformation from camera pixels to joint values. In Phase II, the robot uses Phase I to move systematically to target points in a printed array, where a human operator manually adjusts the end-effector position. Phase I can reduce average error from 4.55 mm to 2.14 mm, and together with Phase II to 1.08 mm. We apply this combination to clear away raisins and pumpkin seeds with a success rate of up to 99.2%, exceeding prior results and running more than twice as fast.
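As a rough illustration of the Phase I idea (fitting a nonlinear map from pixels to joint values), the sketch below fits a quadratic polynomial by least squares. The quadratic model is a stand-in chosen for brevity, not the model used in the paper, and the synthetic data and the names `quadratic_features`, `fit_pixel_to_joint`, and `predict_joints` are assumptions made for the example.

```python
import numpy as np

def quadratic_features(pixels):
    """Expand (u, v) pixel coordinates into quadratic polynomial features."""
    u, v = pixels[:, 0], pixels[:, 1]
    return np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])

def fit_pixel_to_joint(pixels, joints):
    """Least-squares fit of joint values as a quadratic function of pixels."""
    weights, *_ = np.linalg.lstsq(quadratic_features(pixels), joints, rcond=None)
    return weights

def predict_joints(weights, pixels):
    """Apply the fitted map to new pixel coordinates."""
    return quadratic_features(pixels) @ weights

# Synthetic calibration data (assumed): normalized pixels and two joint values.
rng = np.random.default_rng(0)
pixels = rng.uniform(0.0, 1.0, size=(200, 2))
true_joints = np.column_stack([
    0.5 + 0.2 * pixels[:, 0] ** 2,
    0.1 * pixels[:, 0] * pixels[:, 1] - 0.3 * pixels[:, 1],
])

weights = fit_pixel_to_joint(pixels, true_joints)
max_error = np.max(np.abs(predict_joints(weights, pixels) - true_joints))
```

On real hardware the residual after such a fit would not vanish; the abstract's Phase II, with manual corrections at target points, addresses the error that remains.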

## Robustly Adjusting Indoor Drip Irrigation Emitters with the Toyota HSR Robot

We explore how the Toyota HSR mobile manipulator robot can autonomously adjust the screw-cap on drip emitters for precision irrigation. As the built-in sensors do not provide sufficient accuracy for gripper alignment, we designed a lightweight, modular Emitter Localization Device (ELD) with cameras and LEDs that can be non-invasively mounted on the gripper. This paper presents details on the design, algorithms, and experiments in bringing the gripper into alignment with a sequence of 9 emitters using a two-phase procedure: 1) aligning the robot base using the built-in hand camera, and 2) aligning the gripper axis with the emitter axis using the ELD. Experiments suggest that each emitter can be reliably adjusted in under 20 seconds.

## DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations

Discovery of Deep Continuous Options (DDCO) learns low-level continuous control skills, parametrized by deep neural networks, from demonstrations. A hybrid categorical–continuous distribution model parametrizes high-level policies that can invoke discrete options as well as continuous control actions, and a cross-validation method tunes the number of options to be discovered. We evaluate DDCO in a simulation of a 3-link robot and in two physical experiments on the da Vinci surgical robot.
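One way to picture a hybrid categorical–continuous policy head is a categorical choice over the discrete options plus one extra slot meaning "emit a continuous action", which is then drawn from a Gaussian. The sketch below follows that reading; it is an assumption about the general shape of such a model, not DDCO's actual parametrization, and `sample_hybrid_action` and its arguments are hypothetical names.

```python
import numpy as np

def sample_hybrid_action(logits, mean, std, rng):
    """Sample from a hybrid distribution: the first len(logits) - 1 entries
    select a discrete option, the last entry selects a Gaussian continuous
    action with the given mean and standard deviation."""
    probs = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs /= probs.sum()
    choice = rng.choice(len(probs), p=probs)
    if choice < len(probs) - 1:
        return ("option", int(choice))
    return ("continuous", rng.normal(mean, std))

rng = np.random.default_rng(0)
mean = np.array([0.1, -0.2])

# All probability mass on option 0.
kind_a, value_a = sample_hybrid_action(
    np.array([0.0, -np.inf, -np.inf]), mean, 0.05, rng)

# All probability mass on the continuous slot.
kind_b, value_b = sample_hybrid_action(
    np.array([-np.inf, -np.inf, 0.0]), mean, 0.05, rng)
```

In a learned model, the logits, mean, and standard deviation would be outputs of a network conditioned on the current state rather than fixed arrays.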

* Equal contribution

## Statistical Data Cleaning for Deep Learning of Automation Tasks from Demonstrations

We explore how characterizing supervisor inconsistency and correcting for this noise can improve task performance with a limited budget of data. In a planar part extraction task where human operators provide demonstrations by teleoperating a 2DOF robot, CNN models perform better when trained after error corrections.

## Minimum-Information LQG Control — Part I: Memoryless Controllers

We consider the problem of controlling a linear system with Gaussian noise and quadratic cost (LQG), using a memoryless controller whose sensor-to-actuator channel has limited capacity. We formulate this setting as a sequential rate-distortion (SRD) problem, where we minimize the rate of information required for the controller’s operation, under a constraint on its external cost. We present the optimality principle, and study the interesting and useful phenomenology of the optimal controller, such as the principled reduction of its order.
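For orientation, a problem of this type can be written schematically as follows. The notation is an assumption introduced here, not taken from the paper: state $x_t$, observation $y_t$, control $u_t$, system matrices $A$, $B$, $C$, cost weights $Q$, $R$, cost budget $c$, and Gaussian noise terms $w_t$, $v_t$.

$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t, \\
\min_{\pi(u_t \mid y_t)} \; & I(y_t; u_t) \quad \text{subject to} \quad \mathbb{E}\!\left[x_t^\top Q\, x_t + u_t^\top R\, u_t\right] \le c .
\end{aligned}
$$

Here the mutual information $I(y_t; u_t)$ plays the role of the rate and the expected quadratic cost plays the role of the distortion, which is the sense in which the setting resembles a rate-distortion problem.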

## Minimum-Information LQG Control — Part II: Retentive Controllers

We consider the case where the controller is retentive (memory-utilizing). We can view the memory reader as one more sensor, and the memory writer as one more actuator. We can then formulate the problem of control under communication limitations, again as a sequential rate-distortion (SRD) problem of minimizing the rate of information required for the controller’s operation, under a constraint on its external cost. We show that this problem can be reduced to the memoryless case, studied in Part I. We then further investigate the form of the resulting optimal solution, and demonstrate its interesting phenomenology.

## A Multi-Agent Control Framework for Co-Adaptation in Brain-Computer Interfaces

We model the process by which the brain and the computer in a brain-computer interface (BCI) co-adapt to one another. We show in this simplified Linear-Quadratic-Gaussian (LQG) model how the brain’s neural encoding can adapt to the task of controlling the computer, at the same time that the computer’s adaptive decoder can adapt to the task of estimating the intention signal, leading to improvement in the system’s performance. We then propose an encoder-aware decoder adaptation scheme, which allows the computer to drive improvement forward faster by anticipating the brain’s adaptation.

* Equal contribution

### Theses

## Information-Theoretic Methods for Planning and Learning in Partially Observable Markov Decision Processes

We formulate the problem of optimizing an agent under both extrinsic and intrinsic constraints on its operation in a dynamical system and develop the main tools for solving it. We identify the challenging convergence properties of the optimization algorithm, such as the bifurcation structure of the update operator near phase transitions. We study the special case of linear-Gaussian dynamics and quadratic cost (LQG), where the optimal solution has a particularly simple and solvable form. We also explore the learning task, where the model of the world dynamics is unknown and sample-based updates are used instead.