Selective Dyna-style planning under limited model capacity
In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this paper, we ...
A distributional view on multi-objective policy optimization
- Abbas Abdolmaleki,
- Sandy H. Huang,
- Leonard Hasenclever,
- Michael Neunert,
- H. Francis Song,
- Martina Zambelli,
- Murilo F. Martins,
- Nicolas Heess,
- Raia Hadsell,
- Martin Riedmiller
Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their ...
Efficient optimistic exploration in linear-quadratic regulators via Lagrangian relaxation
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ and ...
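For context, the generic average-cost LQR setting behind this line of work (notation here is illustrative; the paper's exact assumptions may differ):

```latex
% Generic LQR with unknown dynamics (A, B), noise w_t, and cost matrices Q, R
x_{t+1} = A x_t + B u_t + w_t, \qquad
J(\pi) = \limsup_{T \to \infty} \frac{1}{T}\,
\mathbb{E}\!\left[\sum_{t=0}^{T-1} x_t^\top Q x_t + u_t^\top R u_t\right].
```

Optimism-in-the-face-of-uncertainty (OFU) methods choose controls by optimizing over a confidence region of plausible (A, B).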
Super-efficiency of automatic differentiation for functions defined as a minimum
In min-min optimization or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm. There are two usual ways of ...
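Two common estimators in this situation are (i) differentiating through the iterations of the inner solver and (ii) applying the analytic (envelope-theorem) formula at the approximate minimizer. A minimal JAX sketch of both, on a toy problem that is not taken from the paper:

```python
# Toy illustration: f(x) = min_y g(x, y), inner minimum approximated by gradient descent.
import jax
import jax.numpy as jnp

def g(x, y):
    # Inner objective with closed-form minimiser y*(x) = sin(x) / 2.
    return 0.5 * (y - jnp.sin(x)) ** 2 + 0.5 * y ** 2

def inner_solve(x, steps=50, lr=0.1):
    # Approximate argmin_y g(x, y) by unrolled gradient descent.
    y = jnp.zeros(())
    for _ in range(steps):
        y = y - lr * jax.grad(g, argnums=1)(x, y)
    return y

def f_unrolled(x):
    # Estimator (i): differentiate through every inner iteration.
    return g(x, inner_solve(x))

def f_envelope(x):
    # Estimator (ii): treat the approximate minimiser as a constant
    # (envelope-theorem style), keeping only the direct dependence on x.
    return g(x, jax.lax.stop_gradient(inner_solve(x)))

x0 = 1.3
print(jax.grad(f_unrolled)(x0))        # autodiff through the solver
print(jax.grad(f_envelope)(x0))        # analytic / envelope estimator
print(0.5 * jnp.sin(x0) * jnp.cos(x0)) # exact: f(x) = 0.25 sin(x)^2
```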
A geometric approach to archetypal analysis via sparse projections
Archetypal analysis (AA) aims to extract patterns using self-expressive decomposition of data as convex combinations of extremal points (on the convex hull) of the data. This work presents a computationally efficient greedy AA (GAA) algorithm. GAA ...
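For reference, the standard archetypal-analysis decomposition (Cutler-Breiman form; stated generically, not as the paper's exact formulation):

```latex
% Archetypes Z = XB lie in the convex hull of the data; each point is a convex
% combination of archetypes.
\min_{A \ge 0,\; B \ge 0}\; \|X - XBA\|_F^2
\quad \text{s.t.}\quad \mathbf{1}^\top A = \mathbf{1}^\top,\;\;
\mathbf{1}^\top B = \mathbf{1}^\top,
```

where the columns of X are the data points, B selects the k archetypes as convex combinations of data points, and A gives the convex weights reconstructing each point.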
Context-aware local differential privacy
Local differential privacy (LDP) is a strong notion of privacy that often leads to a significant drop in utility. The original definition of LDP assumes that all the elements in the data domain are equally sensitive. However, in many real-life ...
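For reference, the original (context-free) definition of ε-LDP that this work generalizes:

```latex
% \varepsilon-LDP: the locally randomized report Q hides which of any two inputs
% was held, uniformly over the whole data domain.
\Pr[\,Q(x) = y\,] \;\le\; e^{\varepsilon}\, \Pr[\,Q(x') = y\,]
\qquad \text{for all inputs } x, x' \text{ and all outputs } y.
```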
Efficient intervention design for causal discovery with latents
We consider recovering a causal graph in presence of latent variables, where we seek to minimize the cost of interventions used in the recovery process. We consider two intervention cost models: (1) a linear cost model where the cost of an intervention on ...
The neural tangent kernel in high dimensions: triple descent and a multi-scale theory of generalization
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An ...
Rank aggregation from pairwise comparisons in the presence of adversarial corruptions
Rank aggregation from pairwise preferences has widespread applications in recommendation systems and information retrieval. Given the enormous economic and societal impact of these applications, and the consequent incentives for malicious players to ...
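A standard generative model for such pairwise data (shown for context; not necessarily the exact model analysed here) is Bradley-Terry-Luce:

```latex
% BTL model: each item i has a latent positive score w_i
\Pr[\, i \text{ is preferred over } j \,] \;=\; \frac{w_i}{w_i + w_j},
```

with the adversarial setting of the title corresponding to some fraction of the observed comparisons being corrupted.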
Boosting for control of dynamical systems
We study the question of how to aggregate controllers for dynamical systems in order to improve their performance. To this end, we propose a framework of boosting for online control. Our main result is an efficient boosting algorithm that combines weak ...
An optimistic perspective on offline reinforcement learning
Off-policy reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important consideration in real world applications. This paper studies offline RL using the DQN Replay Dataset comprising the entire replay experience of a ...
Optimal bounds between f-divergences and integral probability metrics
The families of f-divergences (e.g. the Kullback-Leibler divergence) and Integral Probability Metrics (e.g. total variation distance or maximum mean discrepancies) are commonly used in optimization and estimation. In this work, we systematically study the ...
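For reference, the two families being compared (standard definitions):

```latex
% f-divergence (f convex, f(1) = 0) and integral probability metric over a class F
D_f(P \,\|\, Q) = \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\mathrm{d}Q,
\qquad
\gamma_{\mathcal{F}}(P, Q) = \sup_{g \in \mathcal{F}}\,
\big|\, \mathbb{E}_P[g] - \mathbb{E}_Q[g] \,\big|.
```

Taking f(t) = t log t gives the KL divergence; taking F to be functions bounded by 1 recovers total variation (up to a constant factor), and taking F to be the unit ball of an RKHS gives the maximum mean discrepancy.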
LazyIter: a fast algorithm for counting Markov equivalent DAGs and designing experiments
The causal relationships among a set of random variables are commonly represented by a Directed Acyclic Graph (DAG), where there is a directed edge from variable X to variable Y if X is a direct cause of Y. From the purely observational data, the true ...
Learning what to defer for maximum independent sets
Designing efficient algorithms for combinatorial optimization appears ubiquitously in various scientific fields. Recently, deep reinforcement learning (DRL) frameworks have gained considerable attention as a new approach: they can automate the design of a ...
Invariant risk minimization games
The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and finding ...
Why bigger is not always better: on finite and infinite neural networks
Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a ...
Discriminative Jackknife: quantifying uncertainty in deep learning via higher-order influence functions
Deep learning models achieve high predictive accuracy across a broad spectrum of tasks, but rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should (1) cover the true prediction targets ...
Frequentist uncertainty in recurrent neural networks via blockwise influence functions
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data. Yet, when using RNNs to inform decision-making, predictions by themselves are not sufficient--we also need estimates of predictive uncertainty. Existing ...
Random extrapolation for primal-dual coordinate descent
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function. Our method updates only a subset of primal and dual variables with sparse data, ...
A new regret analysis for Adam-type algorithms
In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter β1 (typically between 0.9 and 0.99). In theory, regret guarantees for ...
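For reference, the Adam-type update in which the first-order moment parameter β1 appears (standard form, bias-correction terms omitted):

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \qquad
x_{t+1} = x_t - \frac{\alpha_t}{\sqrt{v_t} + \epsilon}\, m_t.
```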
Restarted Bayesian online change-point detector achieves optimal detection delay
In this paper, we consider the problem of sequential change-point detection where both the changepoints and the distributions before and after the change are assumed to be unknown. For this problem of primary importance in statistical and sequential ...
Maximum likelihood with bias-corrected calibration is hard-to-beat at label shift adaptation
Label shift refers to the phenomenon where the prior class probability p(y) changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. Label shift arises in settings like medical diagnosis, where a ...
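Under label shift, Bayes' rule gives the standard adaptation rule (not specific to this paper):

```latex
% p(x|y) fixed, p(y) shifting: reweight the training-time posterior by the prior ratio
p_{\mathrm{test}}(y \mid x) \;\propto\;
p_{\mathrm{train}}(y \mid x)\,
\frac{p_{\mathrm{test}}(y)}{p_{\mathrm{train}}(y)},
```

so adaptation reduces to estimating the unknown test priors p_test(y) from unlabelled test data using a (calibrated) classifier, e.g. by maximum likelihood.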
The implicit regularization of stochastic gradient flow for least squares
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic ...
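For context, the deterministic gradient flow on the least-squares objective is the natural reference point for such continuous-time analyses; the stochastic flow studied here adds a diffusion term matched to the minibatch-gradient noise:

```latex
% Gradient flow on least squares with design matrix X and response y
\dot{\beta}(t) \;=\; -\nabla_\beta \tfrac{1}{2n}\big\|y - X\beta(t)\big\|_2^2
\;=\; \tfrac{1}{n}\, X^\top \big(y - X\beta(t)\big), \qquad \beta(0) = 0.
```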
Structural language models of code
We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of ...
LowFER: low-rank bilinear pooling for link prediction
Knowledge graphs are incomplete by nature, with only a limited number of observed facts from the world knowledge being represented as structured relations between entities. To partly address this issue, an important task in statistical relational learning ...
Discount factor as a regularizer in reinforcement learning
Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving ...
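For reference, the discounted return, whose discount factor γ sets an effective planning horizon on the order of 1/(1-γ):

```latex
G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}, \qquad 0 \le \gamma < 1,
```

so using a smaller γ restricts attention to shorter-horizon behaviour, which is one intuition for why it can act as a regularizer.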
Neuro-symbolic visual reasoning: disentangling "visual" from "reasoning"
Visual reasoning tasks such as visual question answering (VQA) require an interplay of visual perception with reasoning about the question semantics grounded in perception. However, recent advances in this area are still primarily driven by perception ...
The differentiable cross-entropy method
We study the cross-entropy method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant that enables us to differentiate the output of CEM with respect to the objective function's ...
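A minimal NumPy sketch of the vanilla (non-differentiable) cross-entropy method that this work starts from; the differentiable variant is the paper's contribution and is not reproduced here. The objective and parameters below are illustrative only:

```python
# Vanilla CEM: iteratively refit a Gaussian sampling distribution to the elite samples.
import numpy as np

def cem_minimize(objective, mean, std, iters=50, pop_size=200, elite_frac=0.1):
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        samples = np.random.normal(mean, std, size=(pop_size, mean.shape[0]))
        scores = np.array([objective(s) for s in samples])
        elites = samples[np.argsort(scores)[:n_elite]]   # lowest objective values
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy usage: minimise a shifted quadratic (hypothetical example).
f = lambda x: np.sum((x - np.array([2.0, -1.0])) ** 2)
print(cem_minimize(f, mean=np.zeros(2), std=np.ones(2)))  # approximately [2, -1]
```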
Customizing ML predictions for online algorithms
A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances. These papers treat the ML algorithm as a black-box, and redesign online algorithms to take advantage of ML ...
Fairwashing explanations with off-manifold detergent
Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-...