Jekyll2019-07-19T08:46:01+00:00https://roydfox.com/feed/publications.xmlRoy Fox | PublicationsRoy Foxroy.d.fox@gmail.comMulti-Task Learning via Task Multi-Clustering2019-06-15T00:00:00+00:002019-06-15T00:00:00+00:00https://roydfox.com/Yan2019Multi<p>Multi-task learning has the potential to facilitate learning of shared representations between tasks, leading to better task performance. Some sets of tasks are related and can share many features that are useful latent representations for these tasks. Other sets of tasks are less related, possibly sharing some features, but also competing for the representational resources of shared parameters. We propose to discover how to share parameters between related tasks and split parameters between conflicting tasks by learning a multi-clustering of the tasks. We present a mixture-of-experts model, where each cluster is an expert that extracts a feature vector from the input, and each task belongs to a set of clusters whose experts it can mix. In experiments on the CIFAR-100 MTL domain, multi-clustering outperforms a model that mixes all experts, in both accuracy and computation time. The results suggest that the performance of our method is robust to regularization that increases the model’s sparsity when sufficient data is available, and can benefit from sparser models as data becomes scarcer.</p>Roy Foxroy.d.fox@gmail.comGeneralizing Robot Imitation Learning with Invariant Hidden Semi-Markov Models2018-12-09T00:00:00+00:002018-12-09T00:00:00+00:00https://roydfox.com/Tanwani2018Generalizing<p>Generalizing manipulation skills to new situations requires extracting invariant patterns from demonstrations. For example, the robot needs to understand the demonstrations at a higher level while being invariant to the appearance of the objects, to geometric aspects of the objects such as their position, size, and orientation, and to the viewpoint of the observer in the demonstrations. In this paper, we propose an algorithm that learns a joint probability density function of the demonstrations with invariant formulations of hidden semi-Markov models to extract invariant segments (also termed sub-goals or options), and smoothly follows the generated sequence of states with a linear quadratic tracking controller. The algorithm takes as input the demonstrations with respect to different coordinate systems describing virtual landmarks or objects of interest with a task-parameterized formulation, and adapts the segments to environmental changes in a systematic manner. We present variants of this algorithm in latent space with low-rank covariance decompositions, semi-tied covariances, and non-parametric online estimation of model parameters under small-variance asymptotics, yielding considerably lower sample and model complexity for acquiring new manipulation skills.
The algorithm allows a Baxter robot to learn a pick-and-place task, while avoiding a movable obstacle, from only 4 kinesthetic demonstrations.</p>Roy Foxroy.d.fox@gmail.comHierarchical Imitation Learning via Variational Inference of Control Programs2018-12-08T03:00:00+00:002018-12-08T03:00:00+00:00https://roydfox.com/Fox2018Hierarchical<p>Autonomous controllers can be trained by imitation learning from demonstrations of the intended control.
Hierarchical imitation learning in the parametrized hierarchical procedures (PHP) framework can reduce the required number of demonstrations by allowing each procedure to specialize in a specific behavior and abstract away from transient state features. We propose a variational inference method for discovering the latent hierarchical structure in observation–action traces of teacher demonstrations. We train an inference model to approximate the posterior distribution of the latent call-stack of hierarchical procedures, and sample from it to guide the training of the hierarchical controller. Our method requires 40 demonstrations, fewer than half as many as end-to-end RNN training, to achieve an 88% success rate in training the BubbleSort algorithm.</p>Roy Foxroy.d.fox@gmail.comAn Empirical Exploration of Gradient Correlations in Deep Learning2018-12-08T02:00:00+00:002018-12-08T02:00:00+00:00https://roydfox.com/Rothchild2018Empirical<p>We introduce the mean and RMS dot product between normalized gradient vectors as tools for investigating the structure of loss functions and the trajectories followed by optimizers.
We show that these quantities are sensitive to well-understood properties of the optimization algorithm, and we argue that investigating them in detail can provide insight into properties that are less well understood. Using these tools, we observe that variance in the gradients of the loss function can mostly be explained by a small number of dimensions, and we compare results when training networks within the subspace spanned by the first few gradients to those obtained by training within a randomly chosen subspace.</p>Roy Foxroy.d.fox@gmail.comNeural Inference of API Functions from Input–Output Examples2018-12-08T01:00:00+00:002018-12-08T01:00:00+00:00https://roydfox.com/Bavishi2018Neural<p>Because of the prevalence of APIs in modern software development, an automated interactive code discovery system that helps developers use these APIs would be extremely valuable. Program synthesis is a promising method for building such a system, but existing approaches focus on programs in domain-specific languages with far fewer functions than typically provided by an API. In this paper we focus on 112 functions from the Python pandas library for DataFrame manipulation, an order of magnitude more than considered in prior approaches.
To assess the viability of program synthesis in this domain, our first goal is a system that reliably synthesizes programs with a single library function. We introduce an encoding of structured input–output examples as graphs that can be fed to existing graph-based neural networks to infer the library function. We evaluate the effectiveness of this approach on synthesized and real-world I/O examples, finding programs matching the I/O examples for 97% of both our validation set and our cleaned test set.</p>Roy Foxroy.d.fox@gmail.comConstraint Estimation and Derivative-Free Recovery for Robot Learning from Demonstrations2018-08-22T00:00:00+00:002018-08-22T00:00:00+00:00https://roydfox.com/Lee2018Constraint<p>Learning from human demonstrations can facilitate automation but is risky, because executing the learned policy might lead to collisions and other failures.
Adding explicit constraints to avoid unsafe states is generally not possible when the state representations are complex. Furthermore, enforcing these constraints during execution of the learned policy can be challenging in environments where the dynamics are difficult to model, such as the push mechanics in grasping. In this paper, we propose Derivative-Free Recovery (DFR), a two-phase method for generating robust policies from demonstrations in robotic manipulation tasks where the system comes to rest at each time step. In the first phase, we use support estimation of supervisor demonstrations and treat the support as implicit constraints on states. We also propose a time-varying modification for sequential tasks. In the second phase, we use this support estimate to derive a switching policy that employs the learned policy in the interior of the support and switches to a recovery policy that steers the robot away from the boundary of the support if it drifts too close. We present additional conditions, which linearly bound the difference in state at each time step by the magnitude of control, allowing us to prove that the robot will not violate the constraints when using the recovery policy. A simulated pushing task in MuJoCo suggests that DFR can reduce collisions by 83%. On a physical line-tracking task using a da Vinci Surgical Robot and a moving Stewart platform, DFR reduced collisions by 84%.</p>Roy Foxroy.d.fox@gmail.comImitation Learning of Hierarchical Programs via Variational Inference2018-07-15T02:00:00+00:002018-07-15T02:00:00+00:00https://roydfox.com/Fox2018Imitation<p>The design of controllers that operate in dynamical systems to perform specified tasks has traditionally been manual. Machine learning algorithms enable data-driven generation of controllers, also called policies or programs, and differ in how a user may convey what task the controller should perform. In Imitation Learning (IL), the user demonstrates a supervisor control signal in a set of execution traces, and the objective is to train from this data a controller that performs the computation correctly on unseen inputs.</p>
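The IL setup just described amounts to supervised learning over observation–action pairs. A minimal sketch, under the simplifying assumption of discrete observations and actions and a toy majority-vote learner (all names and the toy expert traces here are hypothetical, not the paper's method):

```python
# Minimal behavioral-cloning sketch: fit a controller to supervisor traces
# by supervised learning over (observation, action) pairs.
from collections import Counter

def train_bc(traces):
    """traces: list of (observation, action) pairs from the supervisor.
    Returns a policy mapping each seen observation to its majority action."""
    votes = {}
    for obs, act in traces:
        votes.setdefault(obs, Counter())[act] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}

def policy(model, obs, default=0):
    # Act greedily on seen observations; fall back to a default otherwise.
    return model.get(obs, default)

# Toy supervisor traces: the expert takes action 1 in state "near", 0 in "far".
traces = [("near", 1), ("near", 1), ("far", 0), ("near", 0), ("far", 0)]
model = train_bc(traces)
```

In practice the lookup table would be replaced by a parametric model (e.g. a neural network) so the controller can generalize to unseen observations, which is exactly the generalization objective the paragraph above states.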
<p>This paper takes a hierarchical imitation learning approach to program synthesis. We model the controller as a set of Parametrized Hierarchical Procedures (PHPs) (Fox et al., 2018), each of which can invoke a sub-procedure, take a control action, or terminate and return to its caller. The PHP model maintains a call-stack similar to that of Neural Programmer-Interpreters (NPI) (Reed & De Freitas, 2015; Li et al., 2016), but makes discrete and interpretable procedure calls, rather than smoothing over continuous values.</p>
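The call-stack semantics described above can be sketched as a small interpreter: at each step the top-of-stack procedure, given the current observation, either calls a sub-procedure, emits a control action, or terminates and returns to its caller. The toy procedure definitions below are hypothetical illustrations, not the PHP model itself:

```python
# Sketch of discrete call-stack execution for hierarchical procedures.
CALL, ACT, DONE = "call", "act", "done"

def run(procedures, root, observations):
    """Execute the hierarchy on a stream of observations; return emitted actions."""
    stack, actions, obs_iter = [root], [], iter(observations)
    obs = next(obs_iter, None)
    while stack and obs is not None:
        kind, arg = procedures[stack[-1]](obs)   # top-of-stack procedure decides
        if kind == CALL:
            stack.append(arg)                    # descend into a sub-procedure
        elif kind == ACT:
            actions.append(arg)                  # emit action, consume observation
            obs = next(obs_iter, None)
        else:
            stack.pop()                          # terminate, return to caller
    return actions

# Toy hierarchy: "root" delegates to "move", which steps until the goal is seen.
procedures = {
    "root": lambda o: (DONE, None) if o == "goal" else (CALL, "move"),
    "move": lambda o: (DONE, None) if o == "goal" else (ACT, "step"),
}
```

The stack contents at each time step are exactly the latent structure that the inference model in this paper is trained to recover from observation–action traces.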
<p>We consider a dataset of weakly supervised demonstrations, in which the user provides observation–action trajectories executed by an expert controller. Inferring hierarchical structure from weak supervision in which the structure is never observed is challenging, particularly with deep multi-level hierarchies. To facilitate discovery of the structure, we augment the dataset with strongly supervised demonstrations of not only the control actions to take, but also the internal structure of the program flow that led to these actions. Training from a mixture of strongly and weakly supervised trajectories can discover highly informative structures from a few strongly supervised trajectories, and leverage these structures in learning models that generalize well from a larger number of weakly supervised trajectories.</p>
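One simple way to realize the mixture described above is an objective where every step contributes an action-prediction loss, and steps with structure annotations additionally contribute a loss on the annotated program flow. This is a hypothetical sketch of such a combined objective (the loss terms and the weight `lam` are illustrative assumptions, not the paper's exact training objective):

```python
# Mixed objective over strongly and weakly supervised steps: all steps are
# scored on action prediction; strongly supervised steps also on the
# annotated latent structure (here a "stack" label).
import math

def nll(probs, label):
    # Negative log-likelihood of one discrete label under predicted probs.
    return -math.log(probs[label])

def mixed_loss(steps, lam=1.0):
    """steps: dicts with predicted 'action_probs' and true 'action'; strongly
    supervised steps additionally carry 'stack_probs' and a true 'stack'."""
    total = 0.0
    for step in steps:
        total += nll(step["action_probs"], step["action"])
        if "stack" in step:                      # strong supervision present
            total += lam * nll(step["stack_probs"], step["stack"])
    return total

steps = [
    {"action_probs": {"a": 0.5, "b": 0.5}, "action": "a"},             # weak
    {"action_probs": {"a": 1.0}, "action": "a",
     "stack_probs": {"root": 0.5, "move": 0.5}, "stack": "move"},      # strong
]
```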
<p>We propose to use autoencoders to train hierarchical procedures from weakly supervised data, a method that has shown success in unsupervised learning (Baldi, 2012; Kingma & Welling, 2013; Rezende et al., 2014; Zhang et al., 2017). We present a novel method of stochastic variational inference (SVI), in which a hierarchical inference model is trained to approximate the posterior distribution of the latent hierarchical structure given the observable traces, and is used to impute the call-stack of hierarchical procedures and guide the training of the generative model, i.e. the procedures themselves.</p>
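The SVI objective above is the evidence lower bound: sample a latent structure z from the inference model q(z|x) and average log p(x, z) − log q(z|x). A minimal Monte Carlo sketch over a discrete latent, with toy hypothetical distributions standing in for the inference and generative models:

```python
# Monte Carlo estimate of the ELBO for a discrete latent structure z.
import math, random

def elbo_estimate(x, q, joint, n_samples=1000, seed=0):
    """q: maps x to a dict {z: prob}; joint: maps (x, z) to p(x, z)."""
    rng = random.Random(seed)
    zs, ps = zip(*q(x).items())
    total = 0.0
    for _ in range(n_samples):
        z = rng.choices(zs, weights=ps)[0]        # z ~ q(z|x)
        total += math.log(joint(x, z)) - math.log(q(x)[z])
    return total / n_samples

# Toy model with two latent call-stacks; q here equals the true posterior,
# so the ELBO is tight and equals log p(x) = log 0.5.
q = lambda x: {"stack_a": 0.5, "stack_b": 0.5}
joint = lambda x, z: 0.25                         # p(x, z) = 0.25 for each z
```

When q is a trained neural inference network over call-stacks, maximizing this estimate with respect to both models is the training procedure the paragraph describes; discrete latents typically require score-function or relaxation-based gradients.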
<p>The contributions of this paper are: (1) extending the PHP model with procedures that take arguments; and (2) a hierarchical variational inference method for training PHPs from weak supervision. Our method generalizes Stochastic Recurrent Neural Networks (SRNNs) (Fraccaro et al., 2016) to hierarchical controllers. Compared to level-wise hierarchical training via the Expectation–Gradient (EG) method (Fox et al., 2018), our SVI approach applies to deeper hierarchies and to procedures that take arguments.</p>Roy Foxroy.d.fox@gmail.comTask-Relevant Embeddings for Robust Perception in Reinforcement Learning2018-07-15T01:00:00+00:002018-07-15T01:00:00+00:00https://roydfox.com/Liang2018Task<p>Reinforcement learning is unreasonably sample inefficient in many real-world visual domains, which require relatively simple control behaviors but pose challenging perception problems. We show that even simple visual noise added to common reinforcement learning benchmark environments can significantly degrade learning efficiency and break common approaches such as the use of autoencoders.
We propose new methods for learning task-relevant state representations, and show that they can discover image embeddings that are significantly more effective when robust perception is required.</p>Roy Foxroy.d.fox@gmail.comRLlib: Abstractions for Distributed Reinforcement Learning2018-07-13T00:00:00+00:002018-07-13T00:00:00+00:00https://roydfox.com/Liang2018RLlib<p>Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available as part of the open-source Ray project.</p>Roy Foxroy.d.fox@gmail.comRobot Learning with Invariant Hidden Semi-Markov Models2018-06-30T00:00:00+00:002018-06-30T00:00:00+00:00https://roydfox.com/Tanwani2018Robot<p>Generalizing manipulation skills to new situations requires extracting invariant patterns from demonstrations.
For example, the robot needs to understand the demonstrations at a higher level while being invariant to the appearance of the objects, to geometric aspects of the objects such as their position, size, and orientation, and to the viewpoint of the observer in the demonstrations. In this paper, we learn a joint probability density function of the demonstrations with invariant formulations of the hidden semi-Markov model (HSMM), and smoothly follow the generated sequence of states with a linear quadratic tracking controller. We present parsimonious and Bayesian non-parametric online learning formulations of the HSMM to exploit the invariant segments (also termed sub-goals, options, or actions) in the demonstrations, and adapt the movement to external environmental conditions, such as the size, position, and orientation of objects in the environment, using a task-parameterized formulation. We show an application of robot learning from demonstrations in picking and placing an object while avoiding a moving obstacle.</p>Roy Foxroy.d.fox@gmail.com
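The linear quadratic tracking step that several of the abstracts above rely on can be illustrated on a scalar system x' = a·x + b·u with stage cost q·(x − r_t)² + ru·u². This is a minimal sketch under those toy assumptions (the dynamics, weights, and constant reference are hypothetical, not the papers' robot models):

```python
# Finite-horizon scalar linear quadratic tracking (LQT): backward recursion
# for the value function V_t(x) = p_t*x^2 - 2*s_t*x + const, yielding a
# time-varying controller u_t = -K_t*x_t + k_t that follows a reference.

def lqt_gains(ref, a=1.0, b=1.0, q=1.0, ru=0.1):
    """Backward Riccati-style recursion; returns per-step gains (K, k)."""
    T = len(ref) - 1
    p, s = q, q * ref[T]                 # terminal cost q*(x_T - r_T)^2
    gains = []
    for t in reversed(range(T)):
        d = ru + p * b * b
        K = p * a * b / d                # feedback gain
        k = b * s / d                    # feedforward (tracking) term
        p = q + ru * K * K + p * (a - b * K) ** 2
        s = q * ref[t] + (a - b * K) * s
        gains.append((K, k))
    return list(reversed(gains))

def rollout(x0, ref, a=1.0, b=1.0, q=1.0, ru=0.1):
    # Simulate the closed loop from x0 under the computed gains.
    xs = [x0]
    for K, k in lqt_gains(ref, a, b, q, ru):
        u = -K * xs[-1] + k
        xs.append(a * xs[-1] + b * u)
    return xs

ref = [1.0] * 6                          # toy reference: hold the state at 1.0
xs = rollout(0.0, ref)
```

In the HSMM-based methods above, the reference sequence is not a constant but the generated sequence of segment states, and the same quadratic-cost machinery applies per dimension.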