research-article

Open access

Optimal Auctions through Deep Learning: Advances in Differentiable Economics

Authors:

Paul Dütting,

Zhe Feng,

Harikrishna Narasimhan,

David C. Parkes,

Sai Srivatsa RavindranathAuthors Info & Claims

Journal of the ACM, Volume 71, Issue 1

Article No.: 5, Pages 1 - 53

https://doi.org/10.1145/3630749

Published: 11 February 2024 Publication History

PDF eReader

Abstract

Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981, but more than 40 years later, a full analytical understanding of the optimal design still remains elusive for settings with two or more items. In this work, we initiate the exploration of the use of tools from deep learning for the automated design of optimal auctions. We model an auction as a multi-layer neural network, frame optimal auction design as a constrained learning problem, and show how it can be solved using standard machine learning pipelines. In addition to providing generalization bounds, we present extensive experimental results, recovering essentially all known solutions that come from the theoretical analysis of optimal auction design problems and obtaining novel mechanisms for settings in which the optimal mechanism is unknown.

1 Introduction

Optimal auction design is one of the cornerstones of economic theory. It is of great practical importance as auctions are used across industries and in the public sector to organize the sale of products and services. Concrete examples are the U.S. FCC Incentive Auction, the sponsored search auctions conducted by search engines such as Google, and the auctions run on platforms such as eBay. In the standard independent private valuations model, each bidder has a valuation function over subsets of items drawn independently from not necessarily identical distributions. It is assumed that the auctioneer knows the value distributions and can use this information in designing the auction. A challenge is that valuations are private, and bidders may not report their valuations truthfully.

In a seminal piece of work, Myerson [1981] resolved the optimal auction design problem when there is a single item for sale. Today, after 40 years of intense research, there are some elegant partial characterizations [Manelli and Vincent 2006; Pavlov 2011; Haghpanah and Hartline 2019; Giannakopoulos and Koutsoupias 2018; Daskalakis et al. 2017; Yao 2017], but the analytical problem of optimal design is not completely resolved even for a setting with two bidders and two items. At the same time, there have been impressive algorithmic advances [Cai et al. 2012b;, 2012a;, 2013; Hart and Nisan 2017; Babaioff et al. 2014; Yao 2015; Cai and Zhao 2017; Chawla et al. 2010], although most of them apply to the weaker notion of Bayesian Incentive Compatibility (BIC). Our focus in this article is on auctions that satisfy Dominant-Strategy Incentive Compatibility (DSIC), which is a more robust and desirable notion of Incentive Compatibility (IC).

A recent line of work has started to bring in tools from machine learning and computational learning theory to design auctions from samples of bidder valuations. Much of the effort has focused on analyzing the sample complexity of designing revenue-maximizing auctions [Cole and Roughgarden 2014; Mohri and Medina 2016; Huang et al. 2018; Morgenstern and Roughgarden 2015; Gonczarowski and Nisan 2017; Morgenstern and Roughgarden 2016; Syrgkanis 2017; Gonczarowski and Weinberg 2018; Balcan et al. 2016]. A handful of works has leveraged machine learning pipelines to optimize different aspects of mechanisms [Lahaie 2011; Dütting et al. 2014; Narasimhan et al. 2016], but none of these provide the generality and the flexibility of our approach. There have also been other computational approaches to auction design under the research program of automated mechanism design [Conitzer and Sandholm 2002;, 2004; Sandholm and Likhodedov 2015] (to which the present work contributes), but where scalable, they are limited to specialized classes of auctions that are already known to be IC.

1.1 Our Contribution

In this work, we provide the first, general-purpose, end-to-end approach for solving the multi-item optimal auction design problem. We use multi-layer neural networks to encode the rules of auction mechanisms, with bidder valuations comprising the input to the network and allocation and payments comprising the output of the network. We train these neural networks using samples from bidder value distributions and seek to maximize expected revenue subject to constraints for IC. We refer to the overarching framework as that of differentiable economics, which references the idea of making use of differentiable representations of economic rules. In this way, we can use Stochastic Gradient Descent (SGD) for economic design, building on what is a very successful pipeline for deep learning.

The central technical challenge in this work is to achieve IC so that bidders will report true valuations in the equilibrium of the auction. We propose two different approaches to handling IC constraints. In the first, we leverage characterization results for IC mechanisms and constrain the network architecture appropriately. In the case of single-bidder settings, we show how to make use of menu-based characterizations, which correspond to DSIC mechanisms. We refer to this architecture as RochetNet, reflecting in its naming a connection with a characterization due to Rochet [1987].

The second approach replaces the IC constraints with the requirement of zero expected ex post regret, which is equivalent to DSIC up to measure zero events. For this, we make use of augmented Lagrangian optimization during training, which has the effect of introducing into the loss function penalty terms that correspond to violations of IC. In this way, we minimize during training a combination of negated revenue and a penalty term for IC violations. We refer to this neural network architecture as RegretNet. This approach is applicable to multi-bidder multi-item settings for which we do not have tractable characterizations of IC mechanisms, but will generally only find mechanisms that are approximately IC.

We show through extensive experiments that these two approaches are capable of recovering the designs of essentially all auctions for which theoretical solutions have been developed over the past 40 years, and in the case of RegretNet, we show that the degree of approximation to DSIC is very good. To get additional insights into the approximation to DSIC achieved by our framework, we report the estimated quantiles of ex post regret ($90\%, 95\%,$ and $99\%$ quantiles) in addition to estimated expected ex post regret. We also demonstrate that this deep learning framework is a useful tool for refuting hypotheses or generating supporting evidence in regard to the conjectured structure of optimal auctions, and in the case of RochetNet, this framework can be used to discover designs that can then be proved to be optimal. We also give generalization bounds that provide confidence intervals on the expected revenue and expected ex post regret in terms of the empirical revenue and empirical regret achieved during training, the descriptive complexity of the neural network used to encode the allocation and payment rules, and the number of samples used to train the network.

1.2 Discussion

The original work on automated mechanism design (AMD) framed the problem as a Linear Program (LP) [Conitzer and Sandholm 2002;, 2004]; however, this has severe scalability issues as the formulation scales exponentially in the number of agents and items [Guo and Conitzer 2010]. We provide a detailed comparison with an LP-based framework, and find that even for a small setting with two bidders and two items (and a discretization of bidder values into 11 bins per item), the corresponding LP takes 62 hours to complete since the LP needs to handle $\approx 9 \times 10^5$ decision variables and $\approx 3.6 \times 10^6$ constraints.

In comparison, differentiable economics leverages the expressive power of neural networks and the ability to enforce complex constraints using a standard machine learning pipeline. This provides for optimization over a broad class of mechanisms without needing to resort to a discretized function representation and is constrained only by the expressivity of the neural network architecture. In fact, we show that a simple, fully connected architecture can yield effective results in auction design, given the appropriate output representation. For the same setting, our approach finds an auction with low regret in just over 3.7 hours (see Figure 13). Moreover, the LP-based approach fails to scale much beyond this point while the neural network based approach continues to scale.

The optimization problems studied here are non-convex, and gradient-based approaches may, in general, get stuck in local optima. Empirically, however, this has not been an obstacle to the successful application of deep learning in other problem domains, and there is theoretical support for a “no local optima” phenomenon (see, e.g., [Choromanska et al. 2015; Kawaguchi 2016; Du et al. 2019; Allen-Zhu et al. 2019]). We make similar observations for our experiments: our neural network architectures recover optimal solutions wherever known, despite the formulation being non-convex. This work also demonstrates how menu-based architectures, as exemplified by RochetNet, can integrate seamlessly into neural network approaches and be used to discover new analytical results (see Section 5.5, where we use computational results to conjecture the analytical structure of an optimal design and duality theory to verify its optimality).

In the case of RegretNet, our framework only provides a guarantee of approximate DSIC. In this regard, we work with expected ex post regret, which is a quantifiable relaxation of DSIC that was first introduced in the work of Dütting et al. [2014]. An essential aspect is that it quantifies the regret to bidders for truthful bidding given knowledge of the bids of others (hence “ex post”) and thus is a quantity that measures the degree of approximation to DSIC. Indeed, our experiments suggest that this relaxation is a very effective tool for approximating optimal DSIC auctions, with RegretNet attaining a very good fit to known theoretical results.

Another widely used definition of approximate IC is approximate BIC. In general, approximate expected ex post regret is not comparable in terms of strength of incentives with approximate BIC. However, Conitzer et al. [2022] show that in the special case of a uniform valuation distribution, there exists a transformation from a mechanism with $\varepsilon$-expected ex post regret to a fully BIC mechanism, without any loss of social welfare and with additive and negligible revenue loss. Moreover, they show that it is impossible to perform such a transformation for the non-uniform valuation distribution setting if we want to achieve at least the same welfare together with negligible revenue loss. For the revenue objective, there exists a line of research on transforming an $\varepsilon$-BIC mechanism to a fully BIC mechanism with only negligible revenue loss [Daskalakis and Weinberg 2012; Rubinstein and Weinberg 2018; Cai et al. 2021]. It remains an open question whether, for general valuation distributions, we can provide a transformation from a mechanism with $\varepsilon$-expected ex post regret to a fully DSIC mechanism with negligible loss in revenue.

1.3 Further Related Work

Since the first version of this article, there has been considerable follow-up work on the topic of differentiable economics, extending the approach to budget-constrained bidders [Feng et al. 2018], applying specialized architectures for single-bidder settings and using them to derive new analytical results [Shen et al. 2019], minimizing agent payments [Tacchetti et al. 2022], applying to multi-facility location problems [Golowich et al. 2018], applying to two-sided matching [Ravindranath et al. 2021; Feng et al. 2022], incorporating human preferences [Peri et al. 2021], balancing fairness and revenue [Kuo et al. 2020], providing certificates of strategy-proofness [Curry et al. 2020], requiring complete allocations [Curry et al. 2022], developing permutation-equivariant architectures [Rahme et al. 2021b], formulating the problem as a two-player game between a designer and an adversary [Rahme et al. 2021a], using context-integrated transformer-based neural network architecture for contextual auction design [Duan et al. 2022], and using the attention mechanism through transformers for optimal auctions design [Ivanov et al. 2022]. Stein et al. [2023] propose a method to incorporate a privacy guarantee in regard to the bidders’ information into RegretNet. There has also been follow-up work on deriving sample complexity bounds for learning a Nash equilibrium [Duan et al. 2023a], using tools similar to the ones we use for our generalization bounds. Beyond RegretNet, there is also follow-up work on extending the menu-based RochetNet framework to Affine Maximizer Auction (AMA) settings [Curry et al. 2023; Duan et al. 2023b].

More recent work has adopted differentiable approaches for the design of taxation policies [Zheng et al. 2022], indirect auctions [Shen et al. 2020; Brero et al. 2021b;, 2021a], mitigations to price collusion [Brero et al. 2022], game design [Balaguer et al. 2022], and the study of platform economies [Wang et al. 2023]. Deep learning has also been used to study other problems within the field of economics, for example, using neural networks to predict the behavior of human participants in strategic scenarios [Hartford et al. 2016; Fudenberg and Liang 2019; Peterson et al. 2021], to provide an automated equilibrium analysis of mechanisms [Thompson et al. 2017], for causal inference [Hartford et al. 2017; Louizos et al. 2017], and for solving for the equilibria of Stackelberg games [Wang et al. 2022], pseudo games [Goktas et al. 2023], symmetric auction games [Bichler et al. 2021], asymmetric auctions games [Bichler et al. 2023], and combinatorial games [Raghu et al. 2018; Heidekrüger et al. 2021]. The research described here also relates to the method of empirical mechanism design [Areyan Viqueira et al. 2019; Vorobeychik et al. 2012;, 2006; Brinkman and Wellman 2017], which applies empirical game theory to mechanism design, using empirical game theory to search for the equilibria of induced games by building out a suitable set of candidate strategies [Jordan et al. 2010; Kiekintveld and Wellman 2008; Wellman 2006] (see also more recent work on policy-space response oracles [Lanctot et al. 2017]).

1.4 Organization

The rest of the article is organized as follows. Section 2 formulates the auction design problem as a learning problem, introduces the characterization-based and characterization-free approaches, and gives the main generalization bounds. Section 3 introduces the network architectures of RochetNet and RegretNet, and instantiates the specific generalization bound for these networks. Section 4 describes the training and optimization procedures, and Section 5 presents extensive experimental results including experiments that provide support for theoretical conjectures in regard to the design of optimal auctions along with the discovery of new, provably optimal auction designs. Section 6 concludes.

2 Auction Design as A Learning Problem

2.1 Preliminaries

We consider a setting with a set of n bidders $N= \lbrace 1,\ldots , n\rbrace$ and m items $M =\lbrace 1,\ldots ,m\rbrace$. Each bidder i has a valuation function $v_i: 2^M \rightarrow \mathbb {R}_{\ge 0}$, where $v_i(S)$ denotes the bidder’s value of the subset of items $S \subseteq M$.

In the simplest case, a bidder may have additive valuations, with a value $v_i(\lbrace j\rbrace)$ for each item $j \in M$, and a value for a subset of items $S \subseteq M$ that is $v_i(S) = \sum _{j \in S} v_i(\lbrace j\rbrace)$. Alternatively, if a bidder’s value for a subset of items $S\subseteq M$ is $v_i(S) = \max _{j\in S} v_i(\lbrace j\rbrace)$, the bidder has a unit-demand valuation. We also consider bidders with general combinatorial valuations but defer the details to Appendices A.2 and B.3.

Bidder i’s valuation function is drawn independently from a distribution $F_i$ over possible valuation functions $V_i$. We write $v = (v_1, \dots , v_n)$ for a profile of valuations and denote $V=\prod _{i = 1}^{n} V_i$. The auctioneer knows the distributions $F = (F_1, \dots , F_n)$ but does not know the bidders’ realized valuation v. The bidders report their valuations (perhaps untruthfully), and an auction decides on an allocation of items to the bidders and charges them payments.

We denote an auction $(g,p)$ as a pair of allocation rules $g_i: V \rightarrow 2^M$ and payment rules $p_i: V \rightarrow \mathbb {R}_{\ge 0}$ (these rules can be randomized). Given bids $b = (b_1, \dots ,b_n) \in V$, the auction computes an allocation $g(b)\in 2^M$ and payments $p(b)\in \mathbb {R}_{\ge 0}^n$.

A bidder with valuation $v_i$ receives utility $u_i(v_i; b) = v_i(g_i(b)) - p_i(b)$ at bid profile b. Let $v_{-i}$ denote the valuation profile $v=(v_1,\ldots ,v_n)$ without element $v_i$, similarly for $b_{-i}$, and let $V_{-i} = \prod _{j \ne i} V_j$ denote the possible valuation profiles of bidders other than bidder i. An auction is DSIC if each bidder’s utility is maximized by reporting truthfully no matter what the other bidders report. In other words, $u_i(v_i; (v_i,b_{-i})) \ge u_i(v_i; (b_i,b_{-i}))$ for every bidder i, every valuation $v_i \in V_i$, every bid $b_i \in V_i$, and all bids $b_{-i} \in V_{-i}$ from others. An auction is ex post Individually Rational (IR) if each bidder receives a non-zero utility when participating truthfully—that is, $u_i(v_i; (v_i,b_{-i})) \ge 0$ $\forall i \in N$, $v_i \in V_i$, and $b_{-i} \in V_{-i}$.

In a DSIC auction, it is in the best interest of each bidder to report truthfully, and so the equilibrium revenue on valuation profile v is simply $\sum _i p_i(v)$. Optimal auction design seeks to identify a DSIC auction that maximizes expected revenue.

There is also a weaker notion of IC, that being BIC. An auction is BIC if each bidder’s utility is maximized by reporting truthfully when the other bidders also report truthfully—that is, ${\mathbf {E}}_{v_{-i}}[u_i(v_i; (v_i,v_{-i}))] \ge {\mathbf {E}}_{v_{-i}}[u_i(v_i; (b_i,v_{-i}))]$ for every bidder i, every valuation $v_i \in V_i$, every bid $b_i \in V_i$. In this work, we focus on DSIC auctions rather than BIC auctions, as DSIC auctions are preferable in practice because truthful bidding remains an equilibrium without common knowledge of the distributions on valuations or common knowledge on rationality.

2.2 Formulation as a Learning Problem

We pose the problem of optimal auction design as a learning problem, where in the place of a loss function that measures error against a target label, we adopt as the loss function the negated, expected revenue on valuations drawn from F.

We are given a parametric class of auctions, $(g^w,p^w)\in \mathcal {M}$, for parameters $w\in \mathbb {R}^d$ for some $d\gt 0$, and a sample of L bidder valuation profiles $\mathcal {S}=\lbrace v^{(1)},\ldots ,v^{(L)}\rbrace$ drawn i.i.d. from F. There is no need to compute equilibrium inputs; rather, we sample true profiles and seek to learn the rules that are DSIC. The goal is to find an auction that minimizes the negated, expected revenue $-{\mathbf {E}}[\sum _{i\in N}p^w_i(v)]$, among all auctions in $\mathcal {M}$ that satisfies DSIC (or just IC). For a single-bidder setting, there is no difference between DSIC and BIC.

We present two approaches for achieving IC. In the first, we leverage a characterization result to constrain the search space so that all mechanisms within this class are IC. In the second, we replace the IC constraints with a differentiable approximation and move the constraints into the objective via the augmented Lagrangian method. The first approach affords a smaller search space and is exactly DSIC, but it only applies to single-bidder, multi-item settings. The second approach applies to multi-bidder, multi-item settings, but it entails searching through a larger parametric space and only achieves approximate IC.

In Appendix A.1, we describe a construction based on the characterization result of Myerson [1981] for multi-bidder, single-item settings, which we refer to as MyersonNet.

2.2.1 Characterization-Based Approach.

We begin by describing our first approach, which we refer to as RochetNet, in which we exploit a characterization of DSIC mechanisms to constrain the search space.

We describe the approach for additive valuations, but it can also be extended to unit demand valuations. For an additive valuation on m items, the utility function $u: {\mathbb {R}}_{\ge 0}^m \rightarrow {\mathbb {R}}$ induced for a single bidder by a mechanism $(g,p)$ is

\begin{equation} u(v) = \sum _{j=1}^m g_j(v)\,v_j \,-\, p(v), \end{equation}

(1)

where $g_j(v)\in \lbrace 0,1\rbrace$ indicates whether or not the bidder is assigned item j.

We can consider a menu of J choices, for some $J\ge 1$, where each choice consists of a possibly randomized allocation, together with a price. For choice $j\in [J]$, let $\alpha _j\in [0,1]^m$ specify the randomized allocation and parameter $\beta _j\in {\mathbb {R}}$ specify the negated price. By choosing the menu item that maximizes the bidder’s utility, or the null (no allocation, no payment) outcome when this is better, a menu of size J induces the following utility function:

\begin{equation} u(v) \,=\, \max \left\lbrace \max _{j \in [J]}\, \lbrace \alpha _j \cdot v \,+\, \beta _j\rbrace ,\, 0\right\rbrace . \end{equation}

(2)

The well-known taxation principle from mechanism design theory tells us that a mechanism that selects the menu choice that maximizes an agent’s reported utility, based on its bid $b\in {\mathbb {R}}^m$, is DSIC [Hammond 1979; Guesnerie 1995]. To see this, observe that the menu does not depend on the reports and that the an agent will maximize its utility by reporting its true valuation function so that the right choice is made on its behalf. Moreover, the taxation principle also tells us that the use of a menu is without loss of generality for DSIC mechanisms.

Based on this, for a given $J\ge 0$, we seek to learn a mechanism with parameters $w = (\alpha , \beta)$, where $\alpha \in [0,1]^{mJ}$ and $\beta \in {\mathbb {R}}^J$, to maximize the expected revenue ${\mathbf {E}}_{v \sim F}[\beta _{j^*(v)}]$, where $j^*(v) \in \text{argmax}_{j \in [J]} \lbrace \alpha _j \cdot v \,+\, \beta _j\rbrace$, and denotes the best choice for the bidder, where choice 0 corresponds to the null outcome. For a unit-demand bidder, the utility can also be represented via (1), with the additional constraint that $\sum _j g_j(v)\le 1, \forall v$. We discuss this more in Section 3.1.

We also have the following characterization of DSIC mechanisms for the single-bidder case.

Theorem 2.1 (Rochet [1987]).

The utility function $u: {\mathbb {R}}_{\ge 0}^m \rightarrow {\mathbb {R}}$ that is induced by a DSIC mechanism for a single bidder is 1-Lipschitz w.r.t. the $\ell _1$-norm, non-decreasing, and convex.

The convexity can be understood by recognizing that the induced utility function (2) is the maximum over a set of hyperplanes, each corresponding to a choice in the menu set. Figure 1 illustrates Rochet’s theorem for a single item ($m=1$) and a menu consisting of four choices ($J=4$). Here, the induced utility for choice j given bid $b\in {\mathbb {R}}$ is $h_j(b) \,=\, \alpha _j \cdot b \,+\, \beta _j$.

Fig. 1.

Given this, to find the optimal single-bidder auction we can search over a suitably sized menu set and pick the one that maximizes expected revenue. In Section 3.1, we explain how to achieve this by modeling the utility function as a neural network and formulating the preceding optimization as a differentiable learning problem.

2.2.2 Characterization-Free Approach.

Our second approach—which we refer to as RegretNet—does not require a characterization of IC. Instead, it replaces the IC constraints with a differentiable approximation and brings the IC constraints into the objective by augmenting the objective with a term that accounts for the extent to which the IC constraints are violated.

We measure the extent to which an auction violates IC through the following notion of ex post regret. Fixing the bids of others, the ex post regret for a bidder is the maximum increase in her utility, considering all possible non-truthful bids. For a mechanism ($g^{w}, p^{w}$), with parameters w, we will be interested in the expected ex post regret for bidder i:

\[\begin{eqnarray*} \mathit {rgt}_i(w) =~ \mathbf {E}\Big [\max _{v^{\prime }_i \in V_i}\,u^w_i(v_i; (v^{\prime }_i, v_{-i})) - u^w_i(v_i;(v_i, v_{-i}))\Big ], \end{eqnarray*}\]

where the expectation is over $v \sim F$ and $u^w_i(v_i;b) = v_i(g^w_i(b)) - p^w_i(b)$ for model parameters w.

We assume that F has full support on the space of valuation profiles V. Given this, and recognizing that the regret is non-negative, an auction satisfies DSIC if and only if $\mathit {rgt}_i(w) = 0, \forall i \in N$, except for measure zero events.¹

Given this, we reformulate the learning problem as one of minimizing the expected negated revenue subject to the expected ex post regret being zero for each bidder:

\[\begin{eqnarray*} \min _{w \in \mathbb {R}^d}\, &{\mathbf {E}}_{v\sim F}\bigg [-\displaystyle \sum _{i\in N}p^w_i(v)\bigg ]\\ \text{s.t.}\, &{rgt}_i(w) \,=\, 0, ~\forall i\in N. \end{eqnarray*}\]

Given a sample $\mathcal {S}$ of L valuation profiles from F, we estimate the empirical ex post regret for bidder i as

\begin{align} \widehat{\mathit {rgt}}_i(w) = & \frac{1}{L}\sum _{\ell =1}^L \Big [\max _{v^{\prime }_i \in V_i}\,u^w_i\big (v_i^{(\ell)}; \big (v^{\prime }_i, v^{(\ell)}_{-i}\big)\big) - u^w_i(v_i^{(\ell)}; v^{(\ell)})\Big ], \end{align}

(3)

and we seek to minimize the empirical loss (negated revenue) subject to the empirical regret being zero for all bidders and the following formulation:

\[\begin{eqnarray} \min _{w \in \mathbb {R}^d}\, &-\frac{1}{L}\displaystyle \sum _{\ell =1}^L \displaystyle \sum _{i=1}^n p^w_i(v^{(\ell)}) \nonumber \nonumber\\ \text{s.t.}\, &\widehat{rgt}_i(w) \,=\, 0, ~~\forall i\in N. \end{eqnarray}\]

(4)

We additionally require the auction to satisfy IR, which can be ensured by restricting the search space to a class of parameterized auctions that charge no bidder more than her valuation for an allocation.

In Section 3, we model the allocation and payment rules through a neural network, and incorporate the IR requirement within the architecture. In Section 4, we describe how the IC constraints can be incorporated into the objective using Lagrange multipliers so that the resulting neural net can be trained with standard pipelines.

2.3 Quantile-Based Regret

The intent is that the characterization-free approach leads to mechanisms with low expected ex post regret. By seeking to minimize the expected ex post regret, we can also obtain regret bounds of the form “the probability that the ex post regret is larger than x is at most q.” For this, we define quantile-based ex post regret.

Definition 2.1 (Quantile-Based Ex Post Regret).

For each bidder i, and q with $0\lt q\lt 1$, the q-quantile-based ex post regret, $\mathit {rgt}^q_i(w)$, induced by the probability distribution F on valuation profiles, is defined as the smallest x such that

\begin{align*} \mathbf {P}\left(\max _{v^{\prime }_i \in V_i}\,u^w_i(v_i; (v^{\prime }_i, v_{-i})) - u^w_i(v_i;(v_i, v_{-i})) {\;\ge x} \right) {\le q.} \end{align*}

We can bound the q-quantile based regret $\mathit {rgt}^q_i(w)$ by the expected ex post regret $\mathit {rgt}_i(w)$ as in the following lemma. The proof appears in Appendix D.1.

Lemma 2.1.

For any fixed q, $0 \lt q \lt 1$, and bidder i, we can bound the q-quantile-based ex post regret by

\begin{align*} \mathit {rgt}^q_i(w) \le \frac{\mathit {rgt}_i(w)}{q}. \end{align*}

Using this Lemma 2.1, we can show, for example, that when the expected ex post regret is 0.001, then the probability that the ex post regret exceeds 0.01 is at most $10\%$.

2.4 Generalization Bounds

We conclude this section with two generalization bounds. The first is a lower bound on the expected revenue in terms of the empirical revenue during training, the complexity (or capacity) of the auction class that we optimize over, and the number of sampled valuation profiles. The second is an upper bound on the expected ex post regret in terms of the empirical regret during training, the complexity (or capacity) of the auction class that we optimize over, and the number of sampled valuation profiles.

We measure the capacity of an auction class $\mathcal {M}$ using a definition of covering numbers from the ranking literature [Rudin and Schapire 2009]. For this, define the $\ell _{\infty ,1}$ distance between auctions $(g,p), (g^{\prime },p^{\prime }) \in {\mathcal {M}}$ as

\begin{equation*} {\max _{v \in V} \sum _{i \in N, j \in M}|g_{ij}(v) - g^{\prime }_{ij}(v)| + \sum _{i \in N} |p_i(v) - p^{\prime }_i(v)|.} \end{equation*}

For any $\epsilon \gt 0$, let ${\mathcal {N}}_\infty ({\mathcal {M}}, \epsilon)$ be the minimum number of balls of radius $\epsilon$ required to cover ${\mathcal {M}}$ under the $\ell _{\infty ,1}$ distance.

Theorem 2.2.

For each bidder i, assume that the valuation function $v_i$ satisfies $v_i(S) \le 1,\, \forall S \subseteq M$. Let ${\mathcal {M}}$ be a class of auctions that satisfy individual rationality. Fix $\delta \in (0,1)$. With probability at least $1-\delta$ over a draw of sample $\mathcal {S}$ of L profiles from F, for any $(g^w, p^w) \in {\mathcal {M}}$,

\[\begin{eqnarray*} &\ {{\mathbf {E}}_{v\sim F}\bigg [\sum _{i\in N}p^w_i(v)\bigg ]} { \,\ge \, \frac{1}{L}\sum _{\ell =1}^L \sum _{i=1}^n p^w_i(v^{(\ell)})} {\,-\, 2n{\Delta _{L}}} {\,-\, {Cn}\sqrt {\frac{\log (1/\delta)}{L}},} \\ \end{eqnarray*}\]

and

\[\begin{eqnarray*} \frac{1}{n}\sum _{i=1}^n rgt_i(w) \,\le \, \frac{1}{n}\sum _{i=1}^n \widehat{rgt}_i(w) \,+\, 2\Delta _{L} \,+\, C^{\prime }\sqrt {\frac{\log (1/\delta)}{L}}, \end{eqnarray*}\]

where $\Delta _{L}= \inf _{\epsilon \gt 0}\lbrace \tfrac{\epsilon }{n} + 2\sqrt {\tfrac{2\log ({\mathcal {N}}_\infty ({\mathcal {M}}, \,\tfrac{\epsilon }{2n}))}{L}}\rbrace$ and $C, C^{\prime }$ are distribution-independent constants.

See Appendix D.2 for the proof. If the term $\Delta _{L}$ in the above bound goes to zero as the sample size L increases, then the above bounds go to zero as $L \rightarrow \infty$. In Theorem 3.1 in Section 3, we bound $\Delta _{L}$ for the neural network architectures we present in this work.

3 Neural Network Architectures

We describe the RochetNet architecture for single-bidder, multi-item settings in Section 3.1, and the RegretNet architecture for multi-bidder, multi-item settings in Section 3.2. We focus on additive valuations and unit-demand valuations, and discuss how to extend the constructions to allow for combinatorial valuations in Appendix A.2.

3.1 The RochetNet Architecture

RochetNet operationalizes the idea of menu-based mechanisms through a suitable neural network architecture. We first describe the construction for additive valuations and then explain how to extend it to unit-demand valuations. The parameters correspond to a menu of J choices, where each choice $j\in [J]$ is associated with randomized allocation $\alpha _j\in [0,1]^m$ and negated price $\beta _j\in {\mathbb {R}}$ ($\beta _j$s will be negative, and the smaller the value of $\beta _j$, the larger the payment). The network selects the choice for the bidder that maximizes the bidder’s reported utility given its bid or chooses the null outcome (no allocation, no payment) when this is preferred. This ensures DSIC and IR.

Fig. 2.

The utility function, represented as a single-layer neural network, is illustrated in Figure 2, where each $h_j(b) \,=\, \alpha _j \cdot b \,+\, \beta _j$ for bid $b\in {\mathbb {R}}^m$. The input layer takes a bid $b\in {\mathbb {R}}^m$ and the output of the network is the induced utility. For input b, $j^\ast (b) \in \text{argmax}_{j \in [J]\cup \lbrace 0\rbrace } \lbrace \alpha _j \cdot b \,+\, \beta _j\rbrace$ denotes the best choice for the bidder, where choice 0 corresponds to $\alpha _0=0$ and $\beta _0=0$ and the null outcome. This best choice defines the allocation and payment rule: for bid b, the allocation is $g^w(b)=\alpha _{j^\ast (b)}$ and the payment is $p^w(b)=-\beta _{j^\ast (b)}$.

By using a large number of hyperplanes, one can use this neural network architecture to search over a sufficiently rich class of DSIC and IR auctions for the single-bidder, multi-item setting. Given the RochetNet construction, we seek to minimize the negated, expected revenue, ${\mathbf {E}}_{v \sim F}[\beta _{j^*(v)}]$. To ensure that the objective is a continuous function of parameters $\alpha$ and $\beta$, we adopt during training a softmax operation in place of the argmax and the following loss function:

\begin{equation} {\mathcal {L}(\alpha ,\beta) \,=\, -{\mathbf {E}}_{v \sim F}\left[\sum _{j\in [J]}\beta _j \widetilde{\nabla }_j(v)\right],} \end{equation}

(5)

where

\begin{equation} {\widetilde{\nabla }_j(v) \,=\, \mathrm{softmax}_j\big (\kappa \cdot (\alpha _1 \cdot v + \beta _1), \ldots , \kappa \cdot (\alpha _J \cdot v + \beta _J)\big)}, \end{equation}

(6)

and $\kappa \gt 0$ is a constant that controls the quality of the approximation. Here, the softmax function, $\mathrm{softmax}_j(\kappa x_1,\ldots ,\kappa x_J)=e^{\kappa x_j}/\sum _{j^{\prime }}e^{\kappa x_{j^{\prime }}}$, takes as input J real numbers and returns a probability distribution consisting of J probabilities, proportional to the exponential of the inputs. We only do this approximation during training and always use argmax during testing to guarantee the mechanism is DSIC.

During training, we seek to optimize the parameters of the neural network (i.e., $\alpha \in [0,1]^{mJ}$ and $\beta \in {\mathbb {R}}^{J}$) to minimize loss (5). For this, given a sample $\mathcal {S} =\lbrace v^{(1)}, \ldots , v^{(L)}\rbrace$ drawn from F, we use SGD to optimize an empirical version of the loss.

This approach easily extends to a single bidder with a unit-demand valuation. In this case, the new requirement is that the sum of the allocation probabilities cannot exceed 1. This can be enforced by restricting the coefficients for each hyperplane to sum up to at most 1 (i.e., $\sum _{k=1}^m \alpha _{jk} \le 1, \forall j\in [J]$ and $\alpha _{jk} \ge 0, \forall j\in J, k\in [m]$). To achieve this constraint, we can re-parameterize $\alpha _{jk}$ as $\mathrm{softmax}_k(\gamma _{j1}, \ldots , \gamma _{jm})$, where $\gamma _{jk}\in {\mathbb {R}}, \forall j\in J, k\in m$. With this restriction, the resulting mechanism is DSIC for unit-demand bidders since the selected menu choice corresponds a distribution over single-item allocations.²

3.2 The RegretNet Architecture

We next describe the architecture for the characterization-free RegretNet approach. In this case, we train a neural network that explicitly encodes a multi-bidder allocation and payment rule. The architecture consists of two logically distinct components that comprise part of a single network: the allocation component and the payment component. These are trained together as a single network, and the outputs of these networks are used to compute the regret and revenue, and thus quantities used by the loss function.

3.2.1 Additive Valuations.

Fig. 3.

An overview of the RegretNet architecture for additive valuations is given in Figure 3. The allocation component encodes a randomized allocation rule $g^w: {\mathbb {R}}^{nm} \rightarrow [0,1]^{nm}$ and the payment component encodes a payment rule $p^w: {\mathbb {R}}^{nm} \rightarrow {\mathbb {R}}_{\ge 0}^{n}$, both of which are modeled as feed-forward, fully connected networks with a tanh activation function in each of the hidden nodes. The input layer consists of bids $b_{ij} \ge 0$ representing the valuation of bidder i for item j.

The allocation component outputs a vector of allocation probabilities $z_{1j} = g_{1j}(b), \ldots , z_{nj} = g_{nj}(b)$, for each item $j\in [m]$. To ensure feasibility (i.e., that the probability of an item being allocated is at most 1), the allocations are computed using a softmax activation function so that for all items j, we have $\sum _{i=1}^n z_{ij} \le 1$. To accommodate the possibility of an item not being assigned, we include a dummy node in the softmax computation to hold the residual allocation probability. The payment component outputs a payment for each bidder that denotes the amount the bidder should pay in expectation for a particular bid profile.

To ensure that the auction satisfies ex post IR (i.e., does not charge a bidder more than her expected value for the allocation), the network first computes a normalized payment $\tilde{p}_i \in [0,1]$ for each bidder i using a sigmoidal unit, then outputs a payment $p_i = \tilde{p}_{i}(\sum _{j=1}^m z_{ij}\, b_{ij})$, where the $z_{ij}$’s are the outputs from the allocation component. This guarantees ex post IR, since the payment can be represented as a distribution over payments for each allocation in the support of the randomized allocation, where each payment is at most the bidder’s reported value for that allocation.

3.2.2 Unit-Demand Valuations.

For revenue maximization in this setting, it is sufficient to consider allocation rules that are one-to-one assignments, where each bidder is assigned at most one item. A simple reduction argument establishes that this restriction on assignments is without loss of generality: for any IC auction that allocates multiple items to a bidder, one can construct an IC auction with the same revenue by retaining only the most-preferred item among those allocated to the bidder.

Randomized allocation rules are lotteries over one-to-one assignments and can be represented by a doubly stochastic matrix (here, rather than summing to 1, we allow the sum of rows to be less than equal to 1, and the sum of columns to be less than or equal to 1). This representation as a doubly stochastic matrix is an easy corollary of Birkhoff [1946] and also a special case of the bihierarchy structure proposed in the work of Budish et al. [2013] (Theorem 1), which we state in Lemma 3.1 for completeness. Budish et al. [2013] also give a polynomial-time algorithm to decompose the doubly stochastic matrix into its constituent set of one-to-one assignments.

Lemma 3.1 ([Birkhoff 1946]).

Any doubly stochastic matrix $A \in \mathbb {R}^{n \times m}$ can be represented as a convex combination of matrices $B^1, \dots , B^k$ where each $B^\ell \in \lbrace 0,1\rbrace ^{n \times m}$ and $\sum _{j \in [m]} B_{ij} \le 1$, $\forall i \in [n]$ and $\sum _{i \in [n]} B_{ij} \le 1$, $\forall j \in [m]$.

Fig. 4.

The allocation component in RegretNet in the case of unit-demand bidders is the feed-forward network shown in Figure 4. To output a doubly stochastic matrix, the allocation component computes two sets of scores $s_{ij}$’s and $s^{\prime }_{ij}$’s. Let s, $s^{\prime } \in {\mathbb {R}}^{nm}$ denote the corresponding matrices. The first set of scores is normalized along the rows, and the second set of scores is normalized along the columns. Both normalizations can be performed by passing these scores through softmax functions. The allocation for bidder i and item j is then computed as the minimum of the corresponding normalized scores:

\[\begin{eqnarray*} z_{ij} \,=\,\varphi ^{DS}_{ij}(s, s^{\prime }) \,=\, \min \bigg \lbrace \frac{e^{s_{ij}}}{\sum _{k=1}^{n+1} e^{s_{kj}}},\,\frac{e^{s^{\prime }_{ij}}}{\sum _{k=1}^{m+1} e^{s^{\prime }_{ik}}}\bigg \rbrace , \end{eqnarray*}\]

where indices $n+1$ and $m+1$ denote dummy inputs that correspond to an item not being allocated to any bidder and a bidder not being allocated any item, respectively. We first show that $\varphi ^{DS}(s,s^{\prime })$ as constructed is doubly stochastic and that we do not lose in generality by the constructive approach that we take. See Appendix D.3 for the proof.

Lemma 3.2.

The matrix $\varphi ^{DS}(s, s^{\prime })$ is doubly stochastic $\forall \, s, s^{\prime } \in {\mathbb {R}}^{nm}$. For any doubly stochastic matrix $z \in [0,1]^{nm}$, $\exists \, s, s^{\prime } \in {\mathbb {R}}^{nm}$, for which $z = \varphi ^{DS}(s, s^{\prime })$.

The payment component for unit-demand valuations is the same as for the case of additive valuations (see Figure 3).

3.3 Covering Number Bounds

We conclude this section by instantiating our generalization bound from Section 2.4 to RegretNet, where we have both a regret and revenue term. Analogous results can also be stated for RochetNet, where we only have a revenue term. Here, $\Vert \cdot \Vert _1$ is the induced matrix norm (i.e., $\Vert w\Vert _1 = \max _{j}\sum _{i} |w_{ij}|$).

Theorem 3.1.

For RegretNet with R hidden layers, K nodes per hidden layer, $d_g$ parameters in the allocation component, $d_p$ parameters in the payment component, m items, n bidders, a sample size of L, and the vector of all model parameters w satisfying $\Vert w\Vert _{1} \le W$, the following are valid bounds for the $\Delta _{L}$ term defined in Theorem 2.2, for different bidder valuation types:

(a)

additive valuations:

$\Delta _{L} \le O\big (\sqrt {R(d_g+d_p) \log (LW\max \lbrace K, mn\rbrace) / {L} } \big)$,

(b)

unit-demand valuations:

$\displaystyle \Delta _{L} \le O\big (\sqrt {R(d_g+d_p) \log (LW\max \lbrace K, mn\rbrace) / {L} }\big)$,

The proof is given in Appendix D.5. As the sample size $L \rightarrow \infty$, the term $\Delta _{L} \rightarrow 0$. The dependence of the preceding result on the number of layers, nodes, and parameters in the network is similar to the standard covering number bounds for neural networks [Anthony and Bartlett 2009].

4 Training the Networks

We next describe how we train the neural network architectures presented in the previous sections.

The approach that we take for RochetNet is the standard (projected) SGD for loss function $\mathcal {L}(\alpha , \beta)$ in Equation (5). For additive valuations, we project each weight $\alpha _{jk}$ during training into $[0,1]$ to guarantee feasibility.

In the case of RegretNet, we need to take care of the need for incentive alignment directly. We use the augmented Lagrangian method to solve the constrained training problem in (4) over the space of neural network parameters w. The Lagrangian function for the optimization problem, augmented with a quadratic penalty term for violating the constraints, is

\[\begin{eqnarray} {\mathcal {C}}_\rho (w; \lambda) ~=~ -\frac{1}{L}\sum _{\ell =1}^L \sum _{i \in N} p^w_i(v^{(\ell)}) \,+\, \sum _{i\in N}\lambda _{i}\,\widehat{\mathit {rgt}}_i(w) \,+\, \frac{\rho }{2} \sum _{i\in N}\Big (\widehat{\mathit {rgt}}_i(w)\Big)^2, \end{eqnarray}\]

(7)

where $\lambda \in \mathbb {R}^n$ is a vector of Lagrange multipliers and $\rho \gt 0$ is a fixed parameter that controls the weight on the quadratic penalty.

The solver is described in Algorithm 1, and alternates between the following updates on the model parameters and the Lagrange multipliers: (a) $w^{new}\,\in \, \text{argmin}_{w}\,\, {\mathcal {C}}_\rho (w^{old};\, \lambda ^{old})$, and (b) $\lambda ^{new}_{i} \,=\, \lambda _i^{old}\,+\, \rho \,\widehat{\mathit {rgt}}_i(w^{new}),~ \forall i\in N.$

We divide the training sample $\mathcal {S}$ into minibatches of size B and perform several passes over the training samples (with random shuffling of the data after each pass). We denote the minibatch received at iteration t by $\mathcal {S}_t \,=\, \lbrace v^{(1)}, \ldots , v^{(B)}\rbrace$. The update (a) on model parameters involves an unconstrained optimization of ${\mathcal {C}}_\rho$ over w and is performed using a gradient-based optimizer.

Let $\widetilde{\mathit {rgt}}_i(w)$ denote the empirical regret in (3) computed on minibatch $\mathcal {S}_t$. The gradient of ${\mathcal {C}}_\rho$ w.r.t. w for fixed $\lambda ^t$ is given by

\begin{align} \nabla _w \, {\mathcal {C}}_\rho (w;\, \lambda ^{t}) ~=~ & -\frac{1}{B}\sum _{\ell =1}^B \sum _{i\in N} \nabla _w\, p^w_i(v^{(\ell)}) +\, \sum _{i\in N}\, \sum _{\ell = 1}^B \lambda ^t_{i}\, g_{\ell , i} \,+\,\rho \sum _{i\in N} \, \sum _{\ell = 1}^B\, \widetilde{\mathit {rgt}}_i(w)\, g_{\ell , i}, \end{align}

(8)

where

\begin{align*} g_{\ell , i} ~=~ \nabla _w\Big [ \max _{v^{\prime }_i \in V_i}\,u^w_i\big (v_i^{(\ell)}; \big (v^{\prime }_i, v^{(\ell)}_{-i}\big)\big) - u^w_i(v_i^{(\ell)}; v^{(\ell)})\Big ]. \end{align*}

The terms $\widetilde{rgt}_i$ and $g_{\ell , i}$, in turn, involve a “max” over misreports for each bidder i and valuation profile $\ell$. We solve this inner maximization over misreports using another gradient-based optimizer. In particular, we maintain a misreport ${v^{\prime }}_{i}^{(\ell)}$ for each i and valuation profile $\ell$. This value is randomly initialized before training. During training, for each minibatch, the corresponding misreports are updated by taking $\Gamma$ gradient updates, with each update of the form:

\begin{align} {v^{\prime }}_{i}^{(\ell)} & = {v^{\prime }}_{i}^{(\ell)} +\gamma \nabla _{v^{\prime }_i}\!\big [u^w_i\big (v^{(\ell)}_i; \big (v^{\prime }_i, v^{(\ell)}_{-i}\big)\big)\big ]\,\Big \vert _{v^{\prime }_i={v^{\prime }}_{i}^{(\ell)}}, \end{align}

(9)

for some $\gamma \gt 0$. This is in the spirit of adversarial machine learning, where these gradient steps on the input are taken to try to find a misreport for the agent that “defeats” the incentive alignment of the mechanism.

Fig. 5.

Figure 5 gives a visualization of this search for defeating misreports when learning an optimal auction for a problem with a single bidder with an additive valuation over two items, where the bidder’s value for each item is an independent draw from $U[0,1]$ (see Section 5.3, Setting 1). In the visualization, the bidder has true valuation $(v_1, v_2) = (0.1, 0.8)$, with this input represented as a green dot. The red crosses represent possible misreports. The heat map shows the utility gain, $u_1((v_1, v_2); (b_1, b_2)) - u_1((v_1, v_2); (v_1, v_2))$, for this bidder when bidding some amount $(b_1,b_2) \in [0,1]^2$ rather than truthfully. This mechanism is already approximately DSIC and the utility gain is negative everywhere (and truthful bidding has zero regret), with shades of yellow corresponding to a misreport that is almost as good as a true report and shades of green toward blue corresponding to a harmful misreport. We illustrate the use of input gradients by initializing each of 10 possible misreports (we are using 10 misreports for illustration, and in our experiments we initialize only a single misreport) and performing $\Gamma =20$ gradient-ascent steps (9) for each misreport. Figure 5 shows the initial misreports along with a new snapshot of the location of each misreport every four gradient-ascent steps.

We use the Adam optimizer [Kingma and Ba 2015] for updates on model parameters w and misreports ${v^{\prime }}_{i}^{(\ell)}$.³ Since the optimization problem is non-convex, the solver is not guaranteed to reach a globally optimal solution. However, this training algorithm proves very effective in our experiments. The learned auctions incur very low regret and closely match the structure of optimal auctions in settings where this structure is known from existing theory.

5 Experimental Results

In this section, we demonstrate that our approach can recover near-optimal auctions for essentially all settings for which an an analytical solution is known, that it is an effective tool for confirming or refuting hypotheses about optimal designs, and that it can find new auctions for settings where there is no known analytical solution. We present a representative subset of the results here and provide additional experimental results in Appendix B.

5.1 Setup

We implement our framework using the TensorFlow deep learning library. For RochetNet we initialized parameters $\alpha$ and $\beta$ in (5) using a random uniform initializer over the interval [0,1] and a zero initializer, respectively. For RegretNet, we used the tanh activation function at the hidden nodes and Glorot uniform initialization [Glorot and Bengio 2010]. We perform cross-validation to decide on the number of hidden layers and the number of nodes in each hidden layer. We include exemplary numbers that illustrate the tradeoffs in Section 5.7.

We trained RochetNet on $2^{15}$ valuation profiles sampled every iteration in an online manner. We used the Adam optimizer with a learning rate of 0.1 for 20,000 iterations for making the updates. The parameter $\kappa$ in Equation (6) was set to 1,000. Unless specified otherwise, we used a max network over 1,000 linear functions to model the induced utility functions and report our results on a sample of 10,000 profiles.

For RegretNet, we used a sample of 640,000 valuation profiles for training and a sample of 10,000 profiles for testing. The augmented Lagrangian solver was run for a maximum of 80 epochs (full passes over the training set) with a minibatch size of 128. The value of $\rho$ in the augmented Lagrangian was set to 1.0 and incremented every 2 epochs.

An update on $w^t$ was performed for every minibatch using the Adam optimizer with learning rate 0.001. For each update on $w^t$, we ran $\Gamma =25$ misreport updates steps with a learning rate of 0.1. At the end of 25 updates, the optimized misreports for the current minibatch were cached and used to initialize the misreports for the same minibatch in the next epoch. An update on $\lambda ^t$ was performed once every 100 minibatches (i.e., $Q=100$).

We ran all experiments on a compute cluster with NVIDIA Graphics Processing Unit (GPU) cores.

5.2 Evaluation

In addition to the revenue of the learned auction on a test set, we evaluate the regret achieved by RegretNet, averaged across all bidders and test valuation profiles (i.e., $rgt \,=\, \frac{1}{n}\sum _{i=1}^n \widehat{\mathit {rgt}}_i(g^{w},p^{w})$). Each $\widehat{\mathit {rgt}_i}$ has an inner “max” of the utility function over bidder valuations $v_i^{\prime } \in V_i$ (see (3)). We evaluate these terms by running gradient ascent on $v_i^{\prime }$ with a step size of 0.1 for 2,000 iterations (we test 1,000 different random initial $v^{\prime }_i$ and report the one that achieves the largest regret).

For some of the experiments, we also report the total time required to train the network. This time is incurred during offline training, whereas the allocation and payments can be computed in a few milliseconds once the network is trained.

5.3 The Manelli-Vincent and Pavlov Auctions

As a representative example of the exhaustive set of analytical results that we can recover with our approach, we discuss the Manelli-Vincent and Pavlov auctions [Manelli and Vincent 2006; Pavlov 2011]. We specifically consider the following single-bidder, two-item settings:

(A)

A single bidder with additive valuations over two items, where the item values are independent draws from $U[0,1]$.

(B)

A single bidder with unit-demand valuations over two items, where the item values are independent draws from $U[2,3]$.

The optimal design for the first setting is given by Manelli and Vincent [2006], who show that the optimal mechanism is deterministic and offers the bidder three options: receive both items and pay $(4-\sqrt {2})/3$, receive item 1 and pay $2/3$, or receive item 2 and pay $2/3$. For the second setting, Pavlov [2011] shows that it is optimal to offer a fair lottery $(\frac{1}{2},\frac{1}{2})$ over the items (at a discount) or to purchase any item at a fixed price. For the parameters here, the price for the lottery is $\frac{1}{6} (8 + \sqrt {22}) \approx 2.115$ and the price for an individual item is $\frac{1}{6}+\frac{1}{6} (8 + \sqrt {22}) \approx 2.282$.

Fig. 6.

We used two hidden layers with 100 hidden nodes in RegretNet for these settings. A visualization of the optimal allocation rule and those learned by RochetNet and RegretNet is given in Figure 6. Figure 7(a) gives the optimal revenue, the revenue and regret obtained by RegretNet, and the revenue obtained by RochetNet. Figure 7(b) shows how these terms evolve over time during training in RegretNet.

We find that both approaches essentially recover the optimal design, not only in terms of revenue but also in terms of the allocation rule and transfers. The auctions learned by RochetNet are exactly DSIC and match the optimal revenue precisely, with sharp decision boundaries in the allocation and payment rule. The decision boundaries for RegretNet are smoother but still remarkably accurate. The revenue achieved by RegretNet matches the optimal revenue up to a $\lt$1% error term, and the regret it incurs is $\lt$0.001. The plots of the estimated test revenue and test regret show that the augmented Lagrangian method is effective in driving these metrics toward optimal levels.

The additional domain knowledge incorporated into the RochetNet architecture leads to exact DSIC mechanisms that match the optimal design more accurately and speeds up computation (the training took about 10 minutes compared to 11 hours). However, we find it surprising how well RegretNet performs given that it starts with no domain knowledge at all.

We present and discuss a host of additional experiments with single-bidder, two-item settings in Appendix B.

Fig. 7.

5.4 The Straight-Jacket Auction

Fig. 8.

Extending the analytical result of Manelli and Vincent [2006] to a single bidder and an arbitrary number of items (even with additive preferences, all uniform on $[0,1]$) has proven elusive. It is not even clear whether the optimal mechanism is deterministic or requires randomization.

A breakthrough came with Giannakopoulos and Koutsoupias [2018], who were able to find a pattern in the results for two items and three items. The proposed mechanism—the Straight-Jacket Auction (SJA)—offers bundles of items at fixed prices. The key to finding these prices is to view the best-response regions as a subdivision of the m-dimensional cube, and observe that there is an intrinsic relationship between the price of a bundle of items and the volume of the respective best-response region.

Giannakopoulos and Koutsoupias [2018] give a recursive algorithm for finding the subdivision and the prices, and used LP duality to prove that the SJA is optimal for $m \le 6$ items.⁴ They also conjecture that the SJA remains optimal for general m but were unable to prove it.

Figure 8 gives the revenue of the SJA and that found by RochetNet for $m \le 10$ items. We used a test sample of $2^{30}$ valuation profiles (instead of 10,000) to compute these numbers for higher precision. It shows that RochetNet finds the optimal revenue for $m \le 6$ items, and that it finds DSIC auctions whose revenue matches that of the SJA for $m = 7, 8, 9,$ and 10 items. Closer inspection reveals that the allocation and payment rules learned by RochetNet essentially match those predicted by Giannakopoulos and Koutsoupias for all $m \le 10$. We take this as strong additional evidence that their conjecture is correct.

Fig. 9.

For these experiments, we used a max network over 10,000 linear functions (instead of 1,000) to increase the representation and the flexibility of the neural network. This overparameterization trick is commonly used in deep learning and has proven to be very effective in practice [Krizhevsky et al. 2012; Allen-Zhu et al. 2019]. We illustrate this effect in Appendix B.4. We followed up on the usual training phase with an additional 20 iterations of training using Adam optimizer with a learning rate of 0.001 and a minibatch size of $2^{30}$.

We also found it useful to impose item-symmetry on the learned auction, especially for $m = 9$ and 10 items, as this helped with accuracy and reduced training time. Imposing symmetry comes without loss of generality for auctions with an item-symmetric distribution [Daskalakis and Weinberg 2012]. To impose item symmetry, we first permute the inputs to be in ascending order, compute the allocation and payment on this permuted input, and then invert the permutation of allocation to compute the mechanism for the original inputs. With these modifications, it took about 13 hours to train the networks.

5.5 Discovering New Analytical Results

We next demonstrate the potential of RochetNet to help to discover new analytical results for optimal auctions. For this, we consider a single bidder with additive but correlated valuations for two items as follows:

(C)

One additive bidder and two items, where the bidder’s valuation is drawn uniformly from the triangle $T=\lbrace (v_1, v_2)|\frac{v_1}{c}+v_2 \le 2, v_1\ge 0, v_2\ge 1\rbrace ,$ where $c\gt 0$ is a free parameter.

There is no analytical result for the optimal auction design for this setting. We ran RochetNet for different values of c to discover the optimal auction. The mechanisms learned by RochetNet for $c=0.5, 1, 3,$ and 5 are shown in Figure 9.

Based on this, we conjectured that the optimal mechanism contains two menu items for $c \le 1$, namely $\lbrace (0,0), 0\rbrace$ and $\lbrace (1,1), \frac{2+\sqrt {1+3c}}{3}\rbrace$, and three menu items for $c \gt 1$, namely $\lbrace (0,0), 0\rbrace$, $\lbrace (1/c, 1), 4/3\rbrace$, and $\lbrace (1,1), 1+c/3\rbrace$, giving the optimal allocation and payment in each region. In particular, as c transitions from values less than or equal to 1 to values larger than 1, the optimal mechanism transitions from being deterministic to being randomized. Figure 10 gives the revenue achieved by RochetNet and the conjectured optimal format for a range of parameters c computed on $2^{30}$ valuation profiles.

We can validate the optimality of this conjectured design through duality theory [Daskalakis et al. 2013]. The proof is given in Appendix D.6.

Fig. 10.

Theorem 5.1.

For any $c\gt 0$, suppose the bidder’s valuation is uniformly distributed over set $T=\lbrace (v_1, v_2)|\frac{v_1}{c}+v_2 \le 2, v_1\ge 0, v_2\ge 1\rbrace$. Then the optimal auction contains two menu items $\lbrace (0,0), 0\rbrace$ and $\lbrace (1,1), \frac{2+\sqrt {1+3c}}{3}\rbrace$ when $c \le 1$, and three menu items $\lbrace (0,0), 0\rbrace$, $\lbrace (1/c, 1), 4/3\rbrace$, and $\lbrace (1,1), 1+c/3\rbrace$ otherwise.

In Appendix B.5, we also give the mechanisms learned by RochetNet for two additional settings. Taken together, these results demonstrate that RochetNet is a powerful tool to help in the discovery of new analytical results. In follow-up work, Shen et al. [2019] also use a neural network framework, closely related to RochetNet, to discover an optimal analytical result for a similar setting: a single additive bidder and two items, where the bidder’s valuation is drawn uniformly from the triangle $\lbrace (v_1, v_2)|\frac{v_1}{c}+v_2 \le 1, v_1\ge 0, v_2\ge 0\rbrace$.

5.6 Experiments with Optimal Mechanisms That Require an Infinitely Sized Menu

We now demonstrate how RochetNet performs for settings where the optimal mechanism is known to require an infinite number of menu choices. For this, we consider the following setting from Daskalakis et al. [2017]:

(D)

One additive bidders and two items, where bidders draw their value for each item independently from $\mathit {Beta}(\alpha =1,\beta =2)$.⁵

This setting and the corresponding optimal mechanism, with its infinite menu size, is described in detail in Example 3 of Daskalakis et al. [2017]. We seek to evaluate the performance of RochetNet for different-sized menus. In Figure 11, we report the revenue, the number of menu choices represented in RochetNet, and the number of menu choices that are active for one or more samples in the test set. As we increase the number of initialized menu choices, the number of active menu items increases as well. Comparing the optimal infinite-sized menu with the menu learned by RochetNet, we find that the difference in revenue comes from a large number of menu items that each only contribute marginally to the net revenue ($\lt \! 10^{-5})$. RochetNet fails to learn some of these menus due to the fixed size of minibatches and the numerical tolerance error of the optimization routine. Regardless, the overall gap in revenue is negligible. Already with two active menu items, RegretNet achieves a revenue of $\sim$0.3309 ($99.93\%$ of optimal), whereas with three or more active menu items, the revenue is at least $\sim$0.3310 ($99.96\%$ of optimal).

Fig. 11.

5.7 Scaling Up

In this section, we consider settings with up to five bidders and up to 10 items. This is several orders of magnitude more complex than existing analytical or computational results. It is also a natural playground for RegretNet, as no tractable characterizations of IC mechanisms are known for these settings. We specifically consider the following two settings, which generalize the basic setting considered in the work of Manelli and Vincent [2006] and Giannakopoulos and Koutsoupias [2018] to more than one bidder:

(E)

Three additive bidders and 10 items, where bidders draw their value for each item independently from $U[0,1]$.

(F)

Five additive bidders and 10 items, where bidders draw their value for each item independently from $U[0,1]$.

An analytical description of the optimal auction for these settings is not known. However, running a separate Myerson auction for each item is optimal in the limit of the number of bidders [Palfrey 1983]. For a regime with a small number of bidders, this provides a strong benchmark. We also compare to selling the grand bundle via a Myerson auction.

For Setting 1, we show in Figure 12(a) the revenue and regret of the learned auction on a validation sample of 10,000 profiles, and these results were obtained with different architectures. Here, $(R, K)$ denotes an architecture with R hidden layers and K nodes per layer. The (5, 100) architecture has the lowest regret among all the 100-node networks for both Setting 1 and Setting 2. Figure 12(b) shows that the auctions learned through RegretNet yield higher revenue than the Myerson baselines, and they do so with tiny regret.⁶

Fig. 12.

5.8 Comparison to LP

In this section, we compare the train time and solution quality of RegretNet with the solve time and solution quality of the LP-based approach proposed in the work of Conitzer and Sandholm [2002];, 2004]. To be able to run the LP, we consider the small setting of two additive bidders and two items, and with bidders that draw their value for each item independently from $U[0,1]$.

For RegretNet, we used two hidden layers with 100 nodes per hidden layer. The LP was solved with the commercial solver Gurobi and on an Amazon AWS EC-2 instance with 48 cores and 96GB of memory. For the LP-based approach, we handle continuous valuations by discretizing the values into 11 bins per item (resulting in $\approx 9\times 10^5$ decision variables and $\approx 3.6\times 10^6$ constraints), and we adopt two different rounding strategies: one that rounds a continuous input valuation profile to the nearest discrete profile for evaluation (-nearest) and one that rounds a continuous input valuation profile down to the nearest discrete profile for evaluation (-down). Whereas the LP-based mechanism with -nearest rounding fails IR, the use of -down rounding ensures the LP-based approach is IR.

Fig. 13.

The results for this setup are shown in Figure 13. We also report the violations in IR constraints incurred by the LP on the test set; for L valuation profiles, this is measured by $\frac{1}{nL}\sum _{\ell = 1}^L\sum _{i \in N}\, \max \lbrace$ $-u_i(v^{(\ell)}), 0\rbrace$. Due to the coarse discretization, the LP approach with nearest-point rounding suffers substantial IR violations. As a result of this, as well as its relatively high regret compared to RegretNet, the relatively high revenue achieved by the LP together with nearest-point rounding, compared with RegretNet, is misleading. For this reason, we also include the performance of the LP-based mechanism when the continuous input valuation profiles are rounded down to their respective discrete profiles. There we see zero IR violations but substantially lower revenue than RegretNet (and still with higher regret) We were not able to run an LP for this setting for a finer discretization than 11 bins per item value in more than 9 days (216 hours) of compute time.⁷ In contrast, RegretNet yields very low regret along with zero IR violations (as the neural network satisfies IR by design) and does so in around 4 hours. In fact, even for the larger Settings E through F, the training time of RegretNet was less than 13 hours.

Fig. 14.

In Figure 14, we plot the estimated expected test revenue, test regret, and the runtime of the LP-based and RegretNet methods while varying the number of variables in the LP and the number of parameters in RegretNet. For the LP, this is done by varying the discretization, and for RegretNet, this is done by varying the network structure. In Appendix C, we include the complete set of results for varying the discretization in the LP-based method, as well as varying the number of hidden layers and hidden units configurations in RegretNet. Introducing an increasingly fine discretization into the LP-based method provides an initial increase in revenue in return for a modest increase in runtime, but this gives way to a huge increase in runtime with no effect on revenue. For RegretNet, the training time is relatively stable as the number of hidden layers and units per layer is varied, whereas larger networks bring a substantive increase in revenue. We only plot the results for RegretNet that lie on the efficient frontier and refer to Figure 25 (presented later) for the full details. Taken together, these results show that RegretNet’s performance substantially extends the revenue-time Pareto frontier available from the LP method, obtaining higher revenue for a relatively modest training time.

6 Conclusion

In this article, we have introduced the new framework of differentiable economics for using neural networks for economic discovery, specifically for the discovery of revenue-optimal, multi-bidder, and multi-item auctions. Our approach presents a viable path forward for automated mechanism design, where analytic inquiry and competing computational approaches (e.g., discretization plus LP) have hit barriers. We have demonstrated that standard machine learning pipelines can be used to essentially rediscover all known, optimal auction designs, and to discover the design of auctions for settings out of reach of theory and settings that are orders of magnitude larger than those that can be solved through other computational approaches. Additionally, we have given novel generalization bounds on approximate incentive alignment. We see promise for the framework in advancing economic theory, such as in supporting or refuting conjectures, and as an assistant in guiding new economic discovery. As an illustration, we have demonstrated how RochetNet can be used to give additional evidence for the conjecture that the SJA is optimal for $m\ge 6$ items.

In our most general network architecture, RegretNet, we have taken a novel approach to addressing IC. Rather than imposing this as a hard constraint (as in LP-based approaches) or restricting the optimization over designs that are known to be IC, we have adopted a differentiable relaxation of IC and taken a Lagrangian approach that, in addition to optimizing the actual objective, also minimizes the degree to which IC is violated. The notion of expected ex post regret that we have adopted has the advantage that it yields DSIC when optimized exactly. When the expected ex post regret is non-zero, additional insight into the IC violation can be gained by considering quantile information. A potential downside in multi-bidder settings is that the expected ex post regret depends on the type distributions of all bidders. In our experiments, we have seen that the expected ex post regret notion succeeds in finding designs that are close to DSIC designs, including in multi-bidder settings. We believe that providing theoretical evidence for why this approach, based on minimizing expected ex post regret, succeeds, or finding alternative differentiable relaxations of DSIC for which this is the case, is an important direction for future work.

Our framework has already inspired a great deal of follow-up work, in taking differentiable economics to additional domains and in scaling up the methods to support networks that simultaneously handle multiple sizes of markets (number of bidders and number of items) (e.g., [Duan et al. 2022]). Deep learning techniques have also been used to optimize classes of mechanisms that play an important role in practice, such as the Generalized-Second Price (GSP) mechanism for sponsored search auctions (e.g., [Zhang et al. 2021]). Looking ahead, there remain a number of interesting challenges. Beyond expanding the domains that are studied by differentiable economics, the methodological challenges include the interpretability of learned mechanisms, integrating additional structural regularities from economic theory, scaling up to larger economic systems, and providing robustness guarantees in the form of certificates for economic properties. Combinatorial auctions (CAs) present an especially important domain, and one whose study we have only initiated here (see Appendices A.2 and B.3, for theory and experiment results for the case of CAs with two items). CAs are important to practice [Palacios-Huerta et al. 2022], and yet concerns around low revenue and their vulnerability to collusion [Ausubel and Milgrom 2006; Ausubel et al. 2006; Day and Milgrom 2008; Levin and Skrzypacz 2016; Goeree and Lien 2016] mean that we lack a complete understanding even for the design of efficient auctions, nevermind finding revenue-optimizing designs.

Acknowledgments

We thank Zihe Wang (Shanghai University of Finance and Economics) for pointing out that the combinatorial feasible definition in the ICML’19 published version of the extended abstract of this article need not imply an integer decomposition. We would like to thank Dirk Bergemann, Yang Cai, Vincent Conitzer, Yannai Gonczarowski, Constantinos Daskalakis, Glenn Ellison, Sergiu Hart, Ron Lavi, Kevin Leyton-Brown, Shengwu Li, Noam Nisan, Parag Pathak, Alexander Rush, Karl Schlag, Zihe Wang, Alex Wolitzky, participants in the Economics and Computation Reunion Workshop at the Simons Institute, the NIPS’17 Workshop on Learning in the Presence of Strategic Behavior, a Dagstuhl Workshop on Computational Learning Theory meets Game Theory, the EC’18 Workshop on Algorithmic Game Theory and Data Science, the Annual Congress of the German Economic Association, participants in seminars at LSE, Technion, Hebrew, Google, HBS, MIT, Stanford, and the anonymous reviewers on earlier versions of this article for their helpful feedback.

Footnotes

In this work, we focus on DSIC, but the RegretNet can also be adapted to handle BIC [Feng et al. 2018].

In follow-up work, Shen et al. [2019] extend the RochetNet architecture to more general settings, including settings with non-linear utility functions.

Adam is a variant of SGD that makes use of a momentum term to update weights. Lines 9 and 15 in the pseudo-code of Algorithm 1 are stated for a standard SGD algorithm.

⁴

The duality argument developed by Giannakopoulos and Koutsoupias is similar but incomparable to the duality approach of Daskalakis et al. [2013]. We will return to the latter in Section 5.5.

⁵

A Beta distribution with $\alpha =1, \beta =2$ has density function $f(x) = 2(1 -x)$.

⁶

One might wonder, given that our framework provides only approximate DSIC, albeit with very small ex post regret, whether there is in fact a DSIC auction with better revenue than item-wise Myerson. In fact, we know there are such auctions. The follow-on work of LotteryAMA [Curry et al. 2023] and MenuNet [Duan et al. 2023b] provide DSIC auctions for Setting 1 with an expected revenue of 5.345 and 5.590, respectively, each outperforming item-wise Myerson.

⁷

We used an AWS EC-2 instance with 48 cores and 96GB of memory.

⁸

In the present work, we develop this architecture only for small number of items. With more items, combinatorial valuations can be succinctly represented using appropriate bidding languages (see, e.g., [Boutilier and Hoos 2001]).

⁹

This setting can be handled by the non-combinatorial RegretNet architecture and is included here for comparison to the work of Sandholm and Likhodedov [2015].

¹⁰

It is fairly similar to the proof for setting $c \gt 1$. If $c\le 1$, there are only two regions to discuss, in which $R_1$ and $R_2$ are the regions corresponding to allocations $(0,0)$ and $(1,1)$, respectively. Then we show the optimal $\gamma ^* = \bar{\gamma }^{R_1} + \bar{\gamma }^{R_2}$ where $\bar{\gamma }^{R_1}= 0$ for region $R_1$ and show $\gamma ^{R_2}$ only “transports” mass of measure downward and leftward in region $R_2$, which is analogous to the analysis for $\gamma ^{R_3}$ for setting $c \gt 1$.

A Additional Architectures

In this appendix, we present additional network architectures for a multi-bidder, single-item setting, as well as for a general multi-bidder, multi-item setting with combinatorial valuations.

A.1 The MyersonNet Approach

We start by describing an architecture that yields optimal DSIC auction for selling a single item to multiple buyers.

In the single-item setting, each bidder holds a private value $v_i \in {\mathbb {R}}_{\ge 0}$ for the item. We consider a randomized auction $(g,p)$ that maps a reported bid profile $b \in {\mathbb {R}}_{\ge 0}^n$ to a vector of allocation probabilities $g(b) \in {\mathbb {R}}_{\ge 0}^n$, where $g_i(b) \in {\mathbb {R}}_{\ge 0}$ denotes the probability that bidder i is allocated the item and $\sum _{i=1}^n g_i(b) \le 1$. We shall represent the payment rule $p_i$ via a price conditioned on the item being allocated to bidder i—that is, $p_i(b) = g_i(b)\,t_i(b)$ for some conditional payment function $t_i: {\mathbb {R}}_{\ge 0}^n \rightarrow {\mathbb {R}}_{\ge 0}$. The expected revenue of the auction, when bidders are truthful, is given by

\begin{equation} rev(g, p) \,=\, {\mathbf {E}}_{v \sim F}\bigg [\sum _{i=1}^n g_i(v)\,t_i(v)\bigg ]. \end{equation}

(10)

The structure of the revenue-optimal auction is well understood for this setting.

Theorem A.1 (Myerson [1981]).

There exist a collection of monotonically non-decreasing functions, $\bar{\phi _i}: \mathbb {R}_{\ge 0} \rightarrow \mathbb {R}$ called the ironed virtual valuation functions such that the optimal BIC auction for selling a single item is the DSIC auction that assigns the item to the buyer with the highest ironed virtual value $\bar{\phi }_i(v_i)$ provided that this value is non-negative, with ties broken in an arbitrary value-independent manner, and charges the bidders according to $p_i(v_i) = v_i g_i(v_i) - \int _{0}^{v_i} g_i(t) \;dt$.

For distribution $F_i$ with density $f_i,$ the virtual valuation function is $\psi _i(v_i) = v_i - (1-F(v_i))/f(v_i)$. A distribution $F_i$ with density $f_i$ is regular if $\psi _i$ is monotonically non-decreasing. For regular distributions $F_1, \dots , F_n,$ no ironing is required and $\bar{\phi }_i = \psi _i$ for all i.

If the virtual valuation functions $\psi _1, \dots , \psi _n$ are furthermore monotonically increasing and not only monotonically non-decreasing, the optimal auction can be viewed as applying the monotone transformations to the input bids $\bar{b}_i = \bar{\phi }_i(b_i)$, feeding the computed virtual values to a Second Price Auction (SPA) with zero reserve price, denoted $(g^0, p^0)$, making an allocation according to $g^0(\bar{b})$ and charging a payment $\bar{\phi }^{-1}_i(p^0_i(\bar{b}))$ for winning bidder i. In fact, this auction is DSIC for any choice of strictly monotone transformations of the values.

Theorem A.2.

For any set of strictly monotonically increasing functions $\bar{\phi }_1, \ldots , \bar{\phi }_n$, an auction defined by outcome rule $g_i = g^0_i \, \circ \, \bar{\phi }$ and payment rule $p_i = \bar{\phi }_i^{-1}\, \circ \, p^0_i \, \circ \, \bar{\phi }$ is DSIC and IR, where $(g^0, p^0)$ is the allocation and payment rule of a SPA with zero reserve.

For regular distributions with monotonically increasing virtual value functions, designing an optimal DSIC auction thus reduces to finding the right strictly monotone transformations and corresponding inverses, and modeling a SPA with zero reserve.

We present a high-level overview of a neural network architecture that achieves this in Figure 15(a), and describe the components of this network in more detail in Sections A.1.1 and A.1.2.

MyersonNet is tailored to monotonically increasing virtual value functions. For regular distributions with virtual value functions that are not strictly increasing and for irregular distributions, this approach only yields approximately optimal auctions.

Fig. 15.

A.1.1 Modeling Monotone Transforms.

We model each virtual value function $\bar{\phi }_i$ as a two-layer feed-forward network with min and max operations over linear functions. For K groups of J linear functions, with strictly positive slopes $w^i_{kj} \in {\mathbb {R}}_{\gt 0},~k = 1,\ldots ,K,~ j = 1,\ldots , J$ and intercepts $\beta ^i_{kj} \in {\mathbb {R}},~k = 1,\ldots ,K,~ j = 1,\ldots , J$, we define

\begin{equation*} \bar{\phi }_{i}(b_i) \,=\, \min _{k \in [K]} \max _{j \in [J]}\, w^i_{kj}\,b_i + \beta ^i_{kj}. \end{equation*}

Since each of the preceding linear functions is strictly non-decreasing, so is $\bar{\phi }_{i}$. In practice, we can set each $w^i_{kj} = e^{\alpha ^i_{kj}}$ for parameters $\alpha ^i_{kj} \in [-B,B]$ in a bounded range. A graphical representation of the neural network used for this transform is shown in Figure 15(b). For sufficiently large K and J, this neural network can be used to approximate any continuous, bounded monotone function (that satisfies a mild regularity condition) to an arbitrary degree of accuracy [Sill 1998]. A particular advantage of this representation is that the inverse transform $\bar{\phi }^{-1}$ can be directly obtained from the parameters for the forward transform:

\begin{equation*} \bar{\phi }^{-1}_{i}(y) \,=\, \max _{k \in [K]} \min _{j \in [J]}\, e^{-\alpha ^i_{kj}}(y - \beta ^i_{kj}). \end{equation*}

A.1.2 Modeling SPA with Zero Reserve.

We also need to model a SPA with zero reserve (SPA-0) within the neural network structure. For the purpose of training, we employ a smooth approximation to the allocation rule using a neural network. Once we learn value functions using this approximate allocation rule, we use them together with an exact SPA with zero reserve to construct the final auction.

Fig. 16.

The SPA-0 allocation rule $g^0$ can be approximated using a ‘softmax’ function on the virtual values $\bar{b}_1, \ldots , \bar{b}_n$ and an additional dummy input $\bar{b}_{n+1} = 0$:

\begin{align} g^0_i(\bar{b}) = \frac{e^{\kappa \bar{b}_i}}{\sum _{j=1}^{n+1} e^{\kappa \bar{b}_j}}, ~ i \in N, \end{align}

(11)

where $\kappa \gt 0$ is a constant fixed a priori and determines the quality of the approximation. The higher the value of $\kappa$, the better the approximation but the less smooth the resulting allocation function.

The SPA-0 payment to bidder i, conditioned on being allocated, is the maximum of the virtual values from the other bidders and zero:

\begin{align} t^0_i(\bar{b})\,=\, \max \big \lbrace \max _{j \ne i} \bar{b}_j, \, 0\big \rbrace ,~ i \in N. \end{align}

(12)

Let $g^{\alpha ,\beta }$ and $t^{\alpha ,\beta }$ denote the allocation and conditional payment rules for the overall auction in Figure 15(a), where $(\alpha ,\beta)$ are the parameters of the forward monotone transform. Given a sample of valuation profiles $\mathcal {S} =\lbrace v^{(1)}, \ldots , v^{(L)}\rbrace$ drawn i.i.d. from F, we optimize the parameters using the negated revenue on $\mathcal {S}$ as the error function, where the revenue is approximated as

\begin{equation} \widehat{rev}(g,t) \,=\, \frac{1}{L}\sum _{\ell =1}^L\sum _{i=1}^n g^{\alpha ,\beta }_i(v^{(\ell)})\,t^{\alpha ,\beta }_i(v^{(\ell)}). \end{equation}

(13)

We solve this training problem using a minibatch SGD solver.

A.2 RegretNet for Combinatorial Valuations

We next show how to adjust the RegretNet architecture to handle bidders with general, combinatorial valuations.⁸

In this case, each bidder i reports a bid $b_{i, S}$ for every bundle of items $S \subseteq M$ (except the empty bundle, for which her valuation is taken as zero). The allocation component of the network has an output $z_{i, S} \in [0,1]$ for each bidder i and bundle S, denoting the probability that the bidder is allocated the bundle.

To prevent the items from being overallocated, we require that the probability that an item appears in a bundle allocated to some bidder is at most 1. We also require that the total allocation to a bidder is at most 1:

\begin{align} & \sum _{i \in N}\sum _{S \subseteq M: j \in S } z_{i,S} \,\le \, 1,\, \forall j \in M; \end{align}

(14)

\begin{align} & \sum _{S \subseteq M} z_{i,S} \,\le \, 1,\, \forall i \in N. \end{align}

(15)

We refer to an allocation that satisfies constraints (14) and (15) as being combinatorial feasible. To enforce these constraints, the allocation component of the network computes a set of scores for each bidder and a set of scores for each item. Specifically, there is a group of bidder-wise scores $s_{i,S}, \forall S\subseteq M$ for each bidder $i \in N$, and a group of item-wise scores $\smash{s^{(j)}_{i,S}, \forall i \in N,\, S\subseteq M}$ for each item $j \in M$.

Let $s, s^{(1)}, \dots , s^{(m)} \in \mathbb {R}^{n \times 2^m}$ denote these bidder scores and item scores. Each group of scores is normalized using a softmax function: $\bar{s}_{i, S} = {\exp ({s_{i,S}})}/{\sum _{S^{\prime }} \exp ({s_{i,S^{\prime }}})}$ and $\bar{s}^{(j)}_{i, S} = {\exp ({s^{(j)}_{i,S}})}/{\sum _{i^{\prime },S^{\prime }} \exp ({s^{(j)}_{i^{\prime },S^{\prime }}})}.$ The allocation for bidder i and bundle $S \subseteq M$ is defined as the minimum of the normalized bidder-wise score $\bar{s}_{i,S}$ and the normalized item-wise scores $\smash{\bar{s}^{(j)}_{i, S}}$ for each $j \in S$:

\begin{align} {z_{i,S} \,=\, \varphi ^{CF}_{i,S}({s}, {s}^{(1)},\ldots ,{s}^{(m)}) \,=\, \min \big \lbrace \bar{s}_{i,S}, \, \bar{s}^{(j)}_{i,S}:\, j \in S\big \rbrace }. \end{align}

(16)

Similar to the unit-demand setting, we first show that $\varphi ^{CF}({s}, {s}^{(1)},\ldots ,{s}^{(m)})$ is combinatorial feasible and that our constructive approach is without loss of generality. See Appendix D.4 for a proof.

Lemma A.1.

The matrix $\varphi ^{CF}({s}, {s}^{(1)},\ldots ,{s}^{(m)})$ is combinatorial feasible $\forall \, {s},$ ${s}^{(1)}, \ldots ,{s}^{(m)} \in {\mathbb {R}}^{n \times 2^m}$. For any combinatorial feasible matrix $z \in [0,1]^{n \times 2^m}$, $\exists \, {s}, {s}^{(1)},\ldots ,{s}^{(m)} \in {\mathbb {R}}^{n \times 2^m}$, for which $z = \varphi ^{CF}({s}, {s}^{(1)},\ldots ,{s}^{(m)})$.

In addition, we want to understand whether a combinatorial feasible allocation z can be implementable, defined in the following way.

Definition A.1.

A fractional combinatorial allocation z is implementable if and only if z can be represented as a convex combination of combinatorial feasible, deterministic allocations.

Unfortunately, Example A.1 shows that a combinatorial feasible allocation may not have an integer decomposition, even for the case of two bidders and two items.

Example A.1.

Consider a setting with two bidders and two items, and the following fractional, combinatorial feasible allocation:

\begin{equation*} z = \begin{bmatrix} z_{1,\lbrace 1\rbrace } & z_{1,\lbrace 2\rbrace } & z_{1, \lbrace 1,2\rbrace }\\ z_{2,\lbrace 1\rbrace } & z_{2,\lbrace 2\rbrace } & z_{2, \lbrace 1,2\rbrace } \end{bmatrix}=\begin{bmatrix} 3/8 & 3/8 & 1/4\\ 1/8 & 1/8 & 1/4 \end{bmatrix}. \end{equation*}

Any integer decomposition of this allocation z would need to have the following structure:

\[\begin{eqnarray*} z &=& a\begin{bmatrix} 0 & 0 & 1\\ 0& 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 0 & 0\\ 0& 0 & 1 \end{bmatrix} + c\begin{bmatrix} 1 & 0 & 0\\ 0& 1 & 0 \end{bmatrix} + d\begin{bmatrix} 1 & 0 & 0\\ 0& 0 & 0 \end{bmatrix} + e\begin{bmatrix} 0 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}\\ && + f\begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0 \end{bmatrix} + g\begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix} + h\begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 0 \end{bmatrix} \end{eqnarray*}\]

where the coefficients sum to at most 1. First, it is straightforward to see that $a = b = 1/4$. Given the construction, we must have $c + d = 3/8, e\ge 0$ and $f + g = 3/8, h \ge 0$. Thus, $a +b + c + d+e+f+g+h \ge 1/2 + 3/4 = 5/4$ for any decomposition. Hence, z is not implementable.

To ensure that a combinatorial feasible allocation has an integer decomposition, we need to introduce additional constraints. For the two-item case, we introduce the following constraint:

\[\begin{eqnarray} \forall i, z_{i, \lbrace 1\rbrace } + z_{i, \lbrace 2\rbrace } \le 1 -\sum _{i^{\prime }=1}^{n} z_{i^{\prime }, \lbrace 1, 2\rbrace }. \end{eqnarray}\]

(17)

Theorem A.3.

For $m=2$, any combinatorial feasible allocation z with additional constraints (17) can be represented as a convex combination of matrices $B^1, \dots , B^k$ where each $B^\ell$ is a combinatorial feasible, 0-1 allocation.

Proof.

First, we observe in any deterministic allocation $B^\ell$, if there exists an i, s.t. $B^\ell _{i, \lbrace 1, 2\rbrace } = 1$, then $\forall j \ne i, S: B^\ell _{j, S} = 0$. Therefore, we first decompose z into the following components,

\[\begin{eqnarray*} z = \sum _{i=1}^n z_{i, \lbrace 1, 2\rbrace }\cdot B^{i} + C, \end{eqnarray*}\]

and

\begin{equation*} B^i_{j, S} = \left\lbrace \begin{array}{ll} 1 & \text{if $j=i, S =\lbrace 1, 2\rbrace $, and} \\ 0 & \text{otherwise.} \end{array} \right. \end{equation*}

Then we want to argue that C can be represented as $\sum _{\ell = i+1}^{k} p_\ell \cdot B^\ell$, where $\sum _{\ell = i+1}^{k} p_\ell \le 1 - \sum _{i=1}^n z_{i, \lbrace 1, 2\rbrace }$ and each $B^\ell$ is a feasible 0-1 allocation. Matrix C has all zeros in the last (items $\lbrace 1, 2\rbrace$) column, $\sum _{i} C_{i, \lbrace 1\rbrace } \le 1 - \sum _{i=1}^n z_{i, \lbrace 1, 2\rbrace }$, and $\sum _{i} C_{i, \lbrace 2\rbrace } \le 1 - \sum _{i=1}^n z_{i, \lbrace 1, 2\rbrace }$.

In addition, based on constraint (17), for each bidder i,

\[\begin{eqnarray*} C_{i, \lbrace 1\rbrace } + C_{i, \lbrace 2\rbrace } = z_{i, \lbrace 1\rbrace } + z_{i, \lbrace 2\rbrace } \le 1 - \sum _{i^{\prime }=1}^n z_{i^{\prime }, \lbrace 1, 2\rbrace }. \end{eqnarray*}\]

Thus, C is a doubly stochastic matrix with scaling factor $1 - \sum _{i^{\prime }=1}^n z_{i^{\prime }, \lbrace 1, 2\rbrace }$. Therefore, we can always decompose C into a linear combination $\sum _{\ell = i+1}^{k} p_\ell \cdot B^\ell$, where $\sum _{\ell = i+1}^{k} p_\ell \le 1 - \sum _{i^{\prime }=1}^n z_{i^{\prime }, \lbrace 1, 2\rbrace }$ and each $B^\ell$ is a feasible 0-1 allocation. □

We leave it to future work to characterize the additional constraints needed for the multi-item ($m\gt 2$) case.

A.2.1 RegretNet for Two-Item Auctions with Implementable Allocations.

To accommodate the additional constraint (17) for the two-item case, we add an additional softmax layer for each bidder. In addition to the original (un-normalized) bidder-wise scores $s_{i, S}, \forall i \in N, S\subseteq M$ and item-wise scores $s^{(j)}_{i, S}, \forall i \in N, S\subseteq M, j\in M$ and their normalized counterparts $\bar{s}_{i, S}, \forall i \in N, S\subseteq M$ and $\bar{s}^{(j)}_{i, S}, \forall i \in N, S\subseteq M, j\in M$, the allocation component of the network computes an additional set of scores for each bidder i, ${s^{\prime }}^{(i)}_{i, \lbrace 1\rbrace }, {s^{\prime }}^{(i)}_{i, \lbrace 2\rbrace }, {s^{\prime }}^{(i)}_{1, \lbrace 1, 2\rbrace }, \ldots , {s^{\prime }}^{(i)}_{n, \lbrace 1, 2\rbrace }$. These additional scores are then normalized using a softmax function as follows:

\[\begin{eqnarray*} \forall i, k \in N, S\subseteq M, & \bar{s^{\prime }}^{(i)}_{k, S} = \displaystyle \frac{\exp \left({s^{\prime }}^{(i)}_{k, S} \right)}{\exp \left({s^{\prime }}^{(i)}_{i, \lbrace 1\rbrace }\right) + \exp \left({s^{\prime }}^{(i)}_{i, \lbrace 2\rbrace }\right) + \sum _k \exp \left({s^{\prime }}^{(i)}_{k, \lbrace 1, 2\rbrace } \right)}. \end{eqnarray*}\]

To satisfy constraint (17) for each bidder i, we compute the normalized score $\bar{s^{\prime }}_{i, S}$ for each $i, S$ as

\begin{equation*} \bar{s^{\prime }}_{i, S} = \left\lbrace \begin{array}{ll} \bar{s^{\prime }}^{(i)}_{i, S} & \text{ if } S=\lbrace 1\rbrace \text{ or } \lbrace 2\rbrace , \text{and} \\ \min \left\lbrace \bar{s^{\prime }}^{(k)}_{i, S}: k\in N\right\rbrace & \text{ if } S=\lbrace 1, 2\rbrace . \end{array} \right. \end{equation*}

Then the final allocation for each bidder i is

\[\begin{eqnarray*} z_{i, S} = \min \left\lbrace \bar{s}_{i, S}, \bar{s^{\prime }}_{i, S}, \bar{s}^{(j)}_{i, S}: j\in S\right\rbrace . \end{eqnarray*}\]

The payment component of the network for combinatorial bidders has the same structure as the one in Figure 3, computing a fractional payment $\tilde{p}_i \in [0,1]$ for each bidder i using a sigmoidal unit and outputting a payment $p_i = \tilde{p}_{i}\, \sum _{S\subseteq M} z_{i,S}\, b_{i,S}$.

B Additional Experiments

We present a broad range of additional experiments for the two main architectures used in the body of the article and additional ones for the architectures presented in Appendix A.

B.1 Experiments with MyersonNet

We first evaluate the MyersonNet architecture introduced in Appendix A.1 for designing single-item auctions. We focus on settings with a small number of bidders because this is where revenue-optimal auctions are meaningfully different from efficient auctions. We present experimental results for the following four settings:

(G)

Three bidders with independent, regular, and symmetrically distributed valuations $v_i \sim U[0,1]$.

(H)

Five bidders with independent, regular, and asymmetrically distributed valuations $v_i \sim U[0,i]$.

(I)

Three bidders with independent, regular, and symmetrically distributed valuations $v_i \sim Exp(3)$.

(J)

Three bidders with independent irregular distributions $F_\text{irregular}$, where each $v_i$ is drawn from $U[0,3]$ with probability 3/4 and from $U[3,8]$ with probability 1/4.

We note that the optimal auctions for the first three distributions involve virtual value functions $\bar{\phi }_i$ that are strictly monotone. For the fourth and final distribution, the optimal auction uses ironed virtual value functions that are not strictly monotone.

For the training set and test set, we used 1,000 valuation profiles sampled i.i.d. from the respective valuation distribution. We modeled each transform $\bar{\phi }_i$ in the MyersonNet architecture using five sets of 10 linear functions, and we used $\kappa = 10^3$.

Fig. 17.

The results are summarized in Figure 17. For comparison, we also report the revenue obtained by the optimal Myerson auction and the SPA without reserve. The auctions learned by the neural network yield revenue close to the optimal.

B.2 Additional Experiments with RochetNet and RegretNet

In addition to the experiments with RochetNet and RegretNet on the single bidder, multi-item settings in Section 5.3, we considered the following settings:

(K)

A single additive bidder with independent preferences over two non-identically distributed items, where $v_1 \sim U[4,16]$ and $v_2 \sim U[4,7]$. The optimal mechanism is given by Daskalakis et al. [2017].

(L)

A single additive bidder with preferences over two items, where $(v_1, v_2)$ are drawn jointly and uniformly from a unit triangle with vertices $(0,0), (0,1),$ and $(1,0)$. The optimal mechanism is due to Haghpanah and Hartline [2019].

(M)

A single unit-demand bidder with independent preferences over two items, where the item values $v_1, v_2 \sim U[0,1]$. See the work of Haghpanah and Hartline [2019] for the optimal mechanism.

We used RegretNet architectures with two hidden layers with 100 nodes each. The optimal allocation rules as well as a side-by-side comparison of those found by RochetNet and RegretNet are given in Figure 18. Figure 19 gives the revenue and regret achieved by RegretNet and the revenue achieved by RochetNet.

We find that in all three settings, RochetNet recovers the optimal mechanism basically exactly, whereas RegretNet finds an auction that matches the optimal design to surprising accuracy.

Fig. 18.

Fig. 19.

B.3 Experiments with RegretNet with Combinatorial Valuations

We next compare our RegretNet architecture for combinatorial valuations described in Section A.2 to the computational results of Sandholm and Likhodedov [2015] for the following settings for which the optimal auction is not known:

(N)

Two additive bidders and two items, where bidders draw their value for each item independently from $U[0,1]$.⁹

(O)

Two bidders and two items, with item valuations $v_{1,1}, v_{1,2}, v_{2,1}, v_{2,2}$ drawn independently from $U[1,2]$ and set valuations $v_{1,\lbrace 1,2\rbrace } = v_{1,1}+v_{1,2} + C_1$ and $v_{2,\lbrace 1,2\rbrace } = v_{2,1}+ v_{2,2} + C_2$, where $C_1, C_2$ are drawn independently from $U[-1,1]$.

(P)

Two bidders and two items, with item valuations $v_{1,1}, v_{1,2}$ drawn independently from $U[1,2]$, item valuations $v_{2,1}, v_{2,2}$ drawn independently from $U[1,5]$, and set valuations $v_{1,\lbrace 1,2\rbrace } = v_{1,1} + v_{1,2} + C_1$ and $v_{2,\lbrace 1,2\rbrace } = v_{2,1}+ v_{2,2} + C_2$, where $C_1, C_2$ are drawn independently from $U[-1,1]$.

These settings correspond to Settings I through III described in Section 3.4 of the work of Sandholm and Likhodedov [2015]. These authors conducted extensive experiments with several different classes of IC mechanisms and different heuristics for setting the parameters of these auctions. They observed the highest revenue for two classes of mechanisms that generalize mixed bundling auctions and $\lambda$-auctions [Jehiel et al. 2007].

These two classes of mechanisms are the Virtual Value Combinatorial Auction ($\text{VVCA}$) and AMA. They also considered a restriction of $\text{AMA}$ to bidder-symmetric auction ($\text{AMA}_{\text{bsym}}$). We use $\text{VVCA}^*$, $\text{AMA}^*$, and $\text{AMA}^*_{\text{bsym}}$ to denote the best mechanism in the respective class, as reported by Sandholm and Likhodedov and found using a heuristic grid search technique.

For Settings N and O, Sandholm and Likhodedov observed the highest revenue for $\text{AMA}^*_{\text{bsym}}$, and for Setting P, the best-performing mechanism was $\text{VVCA}^*$. Figure 20 compares the performance of RegretNet to that of these best-performing benchmark mechanisms. To compute the revenue of the benchmark mechanisms, we used the parameters reported in the work of Sandholm and Likhodedov [2015] (Table 2, p. 1011) and evaluated the respective mechanisms on the same test set used for RegretNet. Note that RegretNet is able to learn new auctions with improved revenue and tiny regret.

Fig. 20.

To make sure we are using sufficient data to report our results, we reran our evaluation for Setting N on a bigger test set with up to 50,000 samples and computed the regret using 5,000 gradient ascent steps. The estimated revenue and regret remained approximately the same as that observed on our regular test set with 10,000 samples with regret computed using 2,000 gradient ascent steps. Figure 21 shows how the revenue and regret vary as we increase the size of the test set.

Fig. 21.

B.4 Experiments with RochetNet with Varying Linear Units

In Figure 22, we show how the performance of RochetNet varies as we increase the number of initialized menu choices (i.e., number of units in the network). We consider here a single bidder and six items, where the bidder’s valuation is sampled independently $U[0,1]$ for each item. The optimal mechanism is given by the SJA. We observe that RochetNet recovers the optimal design with increasing accuracy as we increase the number of menu choices (units in the network) and even while only a small fraction of the menu choices are active (<3% active when the number of initialized menu choices are over 1,000). When we also impose item symmetry, we observed that the performance of RochetNet is relatively invariant to increasing the number of initialized menu choices.

Fig. 22.

B.5 Additional Experiments with Discovering New Analytical Results

In Section 5, we described how RochetNet can be used to discover new analytical results for optimal auctions. In this section, we give analogous computational results, again suggestive of the structure of theoretically optimal auction designs, for two such additional settings:

(Q)

One additive bidder and two items, where the bidder’s valuation is drawn uniformly from the triangle $T=\lbrace (v_1, v_2)|v_1 +v_2(c-1) \le 2c - 1, v_1\ge 1, v_2\ge 1\rbrace ,$ where $c\ge 1$ is a free parameter.

(R)

One additive bidder and two items, where the bidder’s valuation is drawn uniformly from the triangle $T=\lbrace (v_1, v_2)|v_1 + v_2 \le c + 1, v_1\ge 1, v_2\ge 1\rbrace ,$ where $c\ge 1$ is a free parameter.

The mechanisms learned by RochetNet for Setting 1 and Setting 2 for various values of c are shown in Figures 23 and 24, respectively.

Fig. 23.

Fig. 24.

C Comparison to LP

Fig. 25.

In Figure 25, we report additional details on the performance of the LP-based approach as we vary the discretization in the LP and the number of parameters in RegretNet (varying the number of hidden layers and hidden units). For LP, the number of parameters is given by the number of output variables used to define the objective and the constraints. For RegretNet, the number of parameters are computed by counting the number of learnable weights in the allocation and payment network. The results are reported for the setting in Section 5.8, with two additive bidders and two items, with bidder item values sampled independently $U[0,1]$. For the -nearest rounding strategy, the LP-based approach yields a higher revenue than RegretNet, but this is misleading and would not be attainable in practice because it has higher regret and suffers from substantial IR violations. If we instead compute the allocation and payment in the LP through -down rounding, the IR violation is zero but the revenue is much lower. Increasing the amount of discretization in the LP leads to more accurate results with lower regret (and lower IR violations with -nearest), but the number of parameters and the runtime also increase exponentially. For the setting with 12 bins per value, the LP did not terminate despite running for 9 days on an AWS EC2 instance with 48 cores and 96GB of memory. In contrast, RegretNet learns a mechanism in this setting with negligible regret and zero IR violations in at most 6 hours for most configurations. In Figure 25, we report the estimated expected test revenue and regret achieved by RegretNet for different hidden layer R and hidden unit K configurations.

D Omitted Proofs

We present formal proofs for all theorems and lemmas that are stated in the body of the article or in other appendices. We first introduce some notation. We denote the inner product between vectors $a,b \in {\mathbb {R}}^d$ as $\langle a, b \rangle \,=\, \sum _{i=1}^d a_i b_i$. We denote the $\ell _1$ norm for a vector x by $\Vert x \Vert _1$ and the induced $\ell _{1}$ norm for a matrix $A \in {\mathbb {R}}^{k\times t}$ by $\Vert A\Vert _{1} \,=\, \max _{1\le j \le t} \sum _{i=1}^k A_{ij}$.

D.1 Proof of Lemma 2.1

Let $f_i(v;w) := \max _{v^{\prime }_i \in V_i}\,u^w_i(v_i; (v^{\prime }_i, v_{-i})) - u^w_i(v_i;(v_i, v_{-i}))$. Then we have $\mathit {rgt}_i(w) = {\mathbf {E}}_{v\sim F}[f_i(v;w)]$. Rewriting the expected value, we have

\begin{align*} \mathit {rgt}_i(w) & = \int _{0}^\infty \mathbf {P}(f_i(v;w)\ge x) dx \ge \int _{0}^{\mathit {rgt}^q_i(w)} \mathbf {P}(f_i(v;w)\ge x) dx \ge q\cdot \mathit {rgt}^q_i(w), \end{align*}

where the last inequality holds because for any $0\lt x\lt \mathit {rgt}^q_i(w)$, $\mathbf {P}(f_i(v;w) \ge x) \ge \mathbf {P}(f_i(v;w) \ge \mathit {rgt}^q_i(w)) = q$. $\square$

D.2 Proof of Theorem 2.2

We present the proof for auctions with general, randomized allocation rules. A randomized allocation rule $g_i: V \rightarrow [0,1]^{2^M}$ maps valuation profiles to a vector of allocation probabilities for bidder i, where $g_{i,S}(v) \in [0,1]$ denotes the probability that the allocation rule assigns a subset of items $S \subseteq M$ to bidder i and $\sum _{S \subseteq M}g_{i,S}(v) \le 1$. This encompasses both the allocation rules for the combinatorial setting and the allocation rules for the additive and unit-demand settings, which only output allocation probabilities for individual items. The payment function $p: V \rightarrow R^n$ maps valuation profiles to a payment for each bidder $p_i(v) \in {\mathbb {R}}$. For ease of exposition, we omit the superscripts “w”. Recall that ${\mathcal {M}}$ is a class of auctions consisting of allocation and payment rules $(g,p)$. As noted in the theorem statement, we will assume without loss of generality that for each bidder i, $v_i(S) \le 1,\, \forall S \subseteq M$.

D.2.1 Definitions.

Let $\mathcal {U}_i$ be the class of utility functions for bidder i defined on auctions in ${\mathcal {M}}$—that is,

\begin{align*} \mathcal {U}_i \,=\, \big \lbrace & u_i: V_i \times V \rightarrow {\mathbb {R}}\,\big |\,u_i(v_i, b) \,=\, v_i(g(b)) \,-\, p_i(b) \text{ for some }(g,p) \in {\mathcal {M}}\big \rbrace . \end{align*}

In addition, let $\mathcal {U}$ be the class of profile of utility functions defined on ${\mathcal {M}}$—that is, the class of tuples $(u_1,\ldots ,u_n)$ where each $u_i: V_i\times V \rightarrow {\mathbb {R}}$ and $u_i(v_i, b) \,=\, v_i(g(b)) \,-\, p_i(b), \forall i \in N$ for some $(g,p) \in {\mathcal {M}}$.

We will sometimes find it useful to represent the utility function as an inner product—that is, treating $v_i$ as a real-valued vector of length $2^M$, we may write ${u}_i(v_i, b) = \langle v_i, {g}_{i}(b)\rangle - {p}_i(b)$.

Let $\mathrm{rgt}\circ \mathcal {U}_i$ be the class of all regret functions for bidder i defined on utility functions in $\mathcal {U}_i$, such that

\begin{align*} \mathrm{rgt}\,\circ \, \mathcal {U}_i = \Big \lbrace & f_i: V \rightarrow {\mathbb {R}}\,\Big |\, f_i(v) \,=\, \max _{v^{\prime }_i} u_i(v_i, (v^{\prime }_i, v_{-i})) \,-\, u_i(v_i, v)\text{ for some } u_i \in \mathcal {U}_i \Big \rbrace , \end{align*}

and as before, let $\mathrm{rgt}\,\circ \, \mathcal {U}$ be defined as the class of profiles of regret functions.

Define the $\ell _{\infty , 1}$ distance between two utility functions u and $u^{\prime }$ as

\begin{align*} \max _{v, v^{\prime }} \sum _i \vert u_i(v_i,(v^{\prime }_i,v_{-i})) - u_i(v_i,(v^{\prime }_i,v_{-i}))\vert , \end{align*}

and let $\mathcal {N}_\infty (\mathcal {U}, \epsilon)$ denote the minimum number of balls of radius $\epsilon$ to cover $\mathcal {U}$ under this distance. Similarly, define the distance between $u_i$ and $u^{\prime }_i$ as $\max _{v, v^{\prime }_i}\vert u_i(v_i,(v^{\prime }_i, v_{-i})) - u^{\prime }_i(v_i,(v^{\prime }_i, v_{-i}))\vert$, and let $\mathcal {N}_\infty (\mathcal {U}_i, \epsilon)$ denote the minimum number of balls of radius $\epsilon$ to cover $\mathcal {U}_i$ under this distance. Similarly, we define covering numbers $\mathcal {N}_\infty (\mathrm{rgt}\,\circ \,\mathcal {U}_i, \epsilon)$ and $\mathcal {N}_\infty (\mathrm{rgt}\,\circ \,\mathcal {U}, \epsilon)$ for the function classes $\mathrm{rgt}\,\circ \,\mathcal {U}_i$ and $\mathrm{rgt}\,\circ \,\mathcal {U,}$ respectively.

Moreover, we denote the class of allocation functions as $\mathcal {G}$ and for each bidder i, $\mathcal {G}_i \,=\, \lbrace g_i: V\rightarrow 2^M\,|\,g \in \mathcal {G}\rbrace$. Similarly, we denote the class of payment functions by $\mathcal {P}$ and $\mathcal {P}_i \,=\, \lbrace p_i: V\rightarrow {\mathbb {R}}\,|\,p \in \mathcal {P}\rbrace$. We denote the covering number of $\mathcal {P}$ as $\mathcal {N}_\infty (\mathcal {P}, \epsilon)$ under the $\ell _{\infty ,1}$ distance and the covering number for $\mathcal {P}_i$ using $\mathcal {N}_\infty (\mathcal {P}_i, \epsilon)$ under the $\ell _{\infty , 1}$ distance.

D.2.2 Auxiliary Lemma.

We will use a lemma from Shalev-Shwartz and Ben-David [2014]. Let $\mathcal {F}$ denote a class of bounded functions $f: {Z} \rightarrow [-c,c]$ defined on an input space Z, for some $c\gt 0$. Let D be a distribution over Z and $\mathcal {S} \,=\, \lbrace z_1,\ldots ,z_L\rbrace$ be a sample drawn i.i.d. from D. We are interested in the gap between the expected value of a function f and the average value of the function on sample S, and would like to bound this gap uniformly for all functions in $\mathcal {F}$. For this, we measure the capacity of the function class $\mathcal {F}$ using the empirical Rademacher complexity on sample S, defined next:

\begin{equation*} \hat{\mathcal {R}}_{L}(\mathcal {F}):= \frac{1}{L}{\mathbf {E}}_{\sigma }\left[\sup _{f\in \mathcal {F}}\sum _{z_i\in S}\sigma _i f(z_i)\right], \end{equation*}

where $\sigma \in \lbrace -1,1\rbrace ^L$ and each $\sigma _i$ is drawn i.i.d from a uniform distribution on $\lbrace -1,1\rbrace$. We then have the following lemma.

Lemma D.1 ([Shalev-Shwartz and Ben-David 2014]).

Let $\mathcal {S} \,=\, \lbrace z_1,\ldots ,z_L\rbrace$ be a sample drawn i.i.d. from some distribution D over Z. Then with probability of at least $1-\delta$ over draw of $\mathcal {S}$ from D, for all $f\in \mathcal {F}$,

\begin{align*} {\mathbf {E}}_{z\sim D}[f(z)] \le \frac{1}{L}\sum _{\ell =1}^L f(z_\ell) \,+\, 2\hat{\mathcal {R}}_{L}(\mathcal {F}) \,+\, 4c\sqrt {\frac{2\log (4/\delta)}{L}}. \end{align*}

D.2.3 Generalization Bound for Revenue.

We first prove the generalization bound for revenue. For this, we define the following auxiliary function class, where each $f: V \rightarrow {\mathbb {R}}_{\ge 0}$measures the total payments from some mechanism in ${\mathcal {M}}$:

\begin{equation*} \mathrm{rev}\circ {\mathcal {M}}\,=\, \big \lbrace f: V \rightarrow {\mathbb {R}}_{\ge 0}\,\big |\, f(v) \,=\, \displaystyle \sum _{i=1}^n p_i(v) \text{ for some }(g,p) \in {\mathcal {M}}\big \rbrace . \end{equation*}

Note that each function f in this class corresponds to a mechanism $(g,p)$ in ${\mathcal {M}}$, and the expected value ${\mathbf {E}}_{v \sim D}[f(v)]$ gives the expected revenue from that mechanism. The proof then follows by an application of the uniform convergence bound in Lemma D.1 to the preceding function class and by further bounding the Rademacher complexity term in this bound by the covering number of the auction class ${\mathcal {M}}$.

Applying Lemma D.1 to the auxiliary function class $\mathrm{rev}\circ {\mathcal {M}}$, we get with probability of at least $1-\delta$ over draw of L valuation profiles S from D, for any $f \in \mathrm{rev}\circ {\mathcal {M}}$, there exists a distribution-independent constant $C \gt 0$ such that

\begin{align} {\mathbf {E}}_{v\sim F}\Big [-\sum _{i\in N}p_i(v)\Big ] \le & ~-\frac{1}{L}\sum _{\ell =1}^L \sum _{i=1}^n p_i(v^{(\ell)})\nonumber \nonumber\\ & ~+ 2\hat{R}_L(\mathrm{rev}\circ {\mathcal {M}}) ~+ Cn\sqrt {\frac{\log (1/\delta)}{L}}. \end{align}

(18)

All that remains is to bound the preceding empirical Rademacher complexity $\hat{R}_L(\mathrm{rev}\circ {\mathcal {M}})$ in terms of the covering number of the payment class $\mathcal {P}$ and in turn in terms of the covering number of the auction class ${\mathcal {M}}$. Since we assume that the auctions in ${\mathcal {M}}$ satisfy individual rationality and $v(S) \le 1, \forall S \subseteq M$, we have for any v, $p_i(v)\le 1$.

By the definition of the covering number for the payment class, there exists a cover $\hat{\mathcal {P}}$ for $\mathcal {P}$ of size $|\hat{\mathcal {P}}|\le \mathcal {N}_\infty (\mathcal {P}, \epsilon)$ such that for any $p\in \mathcal {P}$, there is a ${f_p}\in \hat{\mathcal {P}}$ with $\max _v\sum _i\vert p_i(v)-{f_p}_i(v)\vert \le \epsilon$. We thus have

\begin{align} \hat{\mathcal {R}}_L(\mathrm{rev}\circ {\mathcal {M}}) = & ~\frac{1}{L}{\mathbf {E}}_\sigma \left[\sup _p\sum _{\ell =1}^{L}\sigma _\ell \cdot \sum _i p_i(v^{(\ell)})\right]\nonumber \nonumber\\ = & ~\frac{1}{L}{\mathbf {E}}_\sigma \left[\sup _p\sum _{\ell =1}^{L}\sigma _\ell \cdot \sum _i {f_p}_i(v^{(\ell)})\right] + \frac{1}{L}{\mathbf {E}}_\sigma \left[\sup _p\sum _{\ell =1}^{L}\sigma _\ell \cdot \sum _i p_i(v^{(\ell)}) - {f_p}_i(v^{(\ell)})\right] \nonumber \nonumber\\ \le & ~\frac{1}{L}{\mathbf {E}}_\sigma \left[\sup _{\hat{p}\in \hat{\mathcal {P}}}\sum _{\ell =1}^{L}\sigma _\ell \cdot \sum _i \hat{p}_i(v^{(\ell)})\right] + \frac{1}{L}{\mathbf {E}}_\sigma \Vert \sigma \Vert _1 \epsilon \nonumber \nonumber\\ \le & ~\sqrt {\sum _{\ell } \left(\sum _i \hat{p}_i(v^{\ell })\right)^2} \sqrt {\frac{2\log (\mathcal {N}_\infty (\mathcal {P}, \epsilon))}{L}} + \epsilon \nonumber \nonumber\\ \le & ~2n\sqrt {\frac{2\log (\mathcal {N}_\infty (\mathcal {P}, \epsilon))}{L}} + \epsilon , \end{align}

(19)

where the second-last inequality follows from Massart’s lemma, and the last inequality holds because

\begin{equation*} \sqrt {\sum _{\ell } \left(\sum _i \hat{p}_i(v^{\ell })\right)^2}\le \sqrt {\sum _{\ell } \left(\sum _i p_i(v^{\ell }) + n\epsilon \right)^2} \le 2n\sqrt {L}. \end{equation*}

We further observe that $\mathcal {N}_\infty (\mathcal {P}, \epsilon) \le \mathcal {N}_\infty (\mathcal {M}, \epsilon)$. By the definition of the covering number for the auction class ${\mathcal {M}}$, there exists a cover $\hat{{\mathcal {M}}}$ for $\mathcal {M}$ of size $|\hat{{\mathcal {M}}}| \le \mathcal {N}_\infty (\mathcal {M}, \epsilon)$ such that for any $(g, p)\in \mathcal {M}$, there is a $(\hat{g}, \hat{p}) \in \hat{{\mathcal {M}}}$ such that for all v,

\begin{align*} \sum _{i,j} \vert g_{ij}(v) - \hat{g}_{ij}(v)\vert + \sum _i\vert p_i(v) - \hat{p}_i(v)\vert \le \epsilon . \end{align*}

This also implies that $\sum _i\vert p_i(v) - \hat{p}_i(v)\vert \le \epsilon$ and shows the existence of a cover for $\mathcal {P}$ of size at most $\mathcal {N}_\infty (\mathcal {M}, \epsilon)$.

Substituting the bound on the Rademacher complexity term in (19) in (18) and using the fact that $\mathcal {N}_\infty (\mathcal {P}, \epsilon) \le \mathcal {N}_\infty (\mathcal {M}, \epsilon)$, we get

\begin{align*} {\mathbf {E}}_{v\sim F}\Big [-\sum _{i\in N}p_i(v)\Big ] \le & -\frac{1}{L}\sum _{\ell =1}^L \sum _{i=1}^n p_i(v^{(\ell)})+ 2\cdot \inf _{\epsilon \gt 0}\Big \lbrace \epsilon + 2n\sqrt {\frac{2\log ({\mathcal {N}}_\infty ({\mathcal {M}}, \epsilon))}{L}}\Big \rbrace + Cn\sqrt {\frac{\log (1/\delta)}{L}}\,, \end{align*}

which completes the proof.

D.2.4 Generalization Bound for Regret.

We move to the second part, namely a generalization bound for regret, which is the more challenging part of the proof. We first define the class of sum regret functions:

\begin{equation*} \overline{\mathrm{rgt}}\,\circ \,\mathcal {U} = \left\lbrace f: V \rightarrow {\mathbb {R}}\,\bigg |\, f(v) \,=\, \sum _{i=1}^n r_i(v) \text{ for some } (r_1, \ldots , r_n) \in \mathrm{rgt} \circ \mathcal {U} \right\rbrace . \end{equation*}

The proof then proceeds in three steps:

(1)

Bounding the covering number for each regret class $\mathrm{rgt}\circ \mathcal {U}_i$ in terms of the covering number for individual utility classes $\mathcal {U}_i.$

(2)

Bounding the covering number for the combined utility class $\mathcal {U}$ in terms of the covering number for ${\mathcal {M}}.$

(3)

Bounding the covering number for the sum regret class $\overline{\mathrm{rgt}}\,\circ \,\mathcal {U}$ in terms of the covering number for the (combined) utility class ${\mathcal {M}}$.

An application of Lemma D.1 then completes the proof. We prove each of the above steps bnext.

Step 1.

$\mathcal {N}_\infty (\mathrm{rgt}\circ \mathcal {U}_i, \epsilon) \le \mathcal {N}_\infty (\mathcal {U}_i, \epsilon /2)$.

Proof.

By the definition of covering number $\mathcal {N}_\infty (\mathcal {U}_i, \epsilon)$, there exists a cover $\hat{{\mathcal {U}}}_i$ with size at most $\mathcal {N}_\infty (\mathcal {U}_i, \epsilon /2)$ such that for any $u_i\in \mathcal {U}_i$, there is a $\hat{u}_i \in \hat{\mathcal {U}}_i$ with

\begin{align*} \sup _{v, v^{\prime }_i} \vert u_i(v_i, (v^{\prime }_i, v_{-i})) - \hat{u}_i(v_i, (v^{\prime }_i, v_{-i}))\vert \le \epsilon /2. \end{align*}

For any $u_i \in {\mathcal {U}}_i$, taking $\hat{u}_i\in \hat{\mathcal {U}}_i$ satisfying the preceding condition, then for any v,

\begin{align*} & \bigg \vert \max _{v^{\prime }_i\in V}\big (u_i(v_i, (v^{\prime }_i, v_{-i})) - u_i(v_i, (v_i, v_{-i}))\big) - \max _{\bar{v}_i\in V}\big (\hat{u}_i(v_i, (\bar{v}_i, v_{-i})) - \hat{u}_i(v_i, (v_i, v_{-i}))\big)\bigg \vert \\ \le ~ & \bigg \vert \max _{v^{\prime }_i}u_i(v_i, (v^{\prime }_i, v_{-i})) - \max _{\bar{v}_i}\hat{u}_i(v_i, (\bar{v}_i, v_{-i})) + \hat{u}_i(v_i, (v_i, v_{-i})) - u_i(v_i, (v_i, v_{-i}))\bigg \vert \\ \le ~ & \left|\max _{v^{\prime }_i}u_i(v_i, (v^{\prime }_i, v_{-i})) - \max _{\bar{v}_i}\hat{u}_i(v_i, (\bar{v}_i, v_{-i}))\right|+ \bigg \vert \hat{u}_i(v_i, (v_i, v_{-i})) - u_i(v_i, (v_i, v_{-i}))\bigg \vert \\ \le ~ & \left|\max _{v^{\prime }_i}u_i(v_i, (v^{\prime }_i, v_{-i})) - \max _{\bar{v}_i}\hat{u}_i(v_i, (\bar{v}_i, v_{-i}))\right|+ \epsilon /2. \end{align*}

Let $v^*_i \in \arg \max _{v^{\prime }_i} u_i(v_i, (v^{\prime }_i, v_{-i}))$ and $\hat{v}^*_i \in \arg \max _{\bar{v}_i} \hat{u}_i(v_i, (\bar{v}_i, v_{-i}))$, then

\begin{equation*} \begin{aligned}\max _{v^{\prime }_i}u_i(v_i, (v^{\prime }_i, v_{-i})) & = u_i(v^*_i, v_{-i}) \le \hat{u}_i(v^*_i, v_{-i}) + \epsilon /2 \le \hat{u}_i(\hat{v}^*_i, v_{-i}) + \epsilon /2 = \max _{\bar{v}_i}\hat{u}_i(v_i, (\bar{v}_i, v_{-i})) + \epsilon ,\\ \max _{\bar{v}_i}\hat{u}_i(v_i, (\bar{v}_i, v_{-i})) & = \hat{u}_i(\hat{v}^*_i, v_{-i}) \le u_i(\hat{v}^*_i, v_{-i}) + \epsilon /2 \le u_i(v^*_i, v_{-i}) + \epsilon /2 = \max _{v^{\prime }_i}u_i(v_i, (v^{\prime }_i, v_{-i})) + \epsilon /2\,. \end{aligned} \end{equation*}

Thus, for all $u_i \in \mathcal {U}_i$, there exists $\hat{u}_i \in \hat{\mathcal {U}}_i$ such that for any valuation profile v,

\begin{align*} & \bigg \vert \max _{v^{\prime }_i}\big (u_i(v_i, (v^{\prime }_i, v_{-i})) - u_i(v_i, (v_i, v_{-i}))\big) - \max _{\bar{v}_i}\big (\hat{u}_i(v_i, (\bar{v}_i, v_{-i})) - \hat{u}_i(v_i, (v_i, v_{-i}))\big)\bigg \vert \le \epsilon , \end{align*}

which implies $\mathcal {N}_\infty (\mathrm{rgt}\circ \mathcal {U}_i, \epsilon) \le \mathcal {N}_\infty (\mathcal {U}_i, \epsilon /2)$.

This completes the proof of Step 1. □

Step 2.

For all $i\in N$, $\mathcal {N}_\infty (\mathcal {U}, \epsilon)\le \mathcal {N}_\infty (\mathcal {M}, \epsilon /n)$.

Proof.

Recall that the utility function of bidder i is $u_i(v_i, (v^{\prime }_i, v_{-i})) = \langle v_i, g_{i}(v^{\prime }_i, v_{-i})\rangle - p_i(v^{\prime }_i, v_{-i})$. There exists a set $\hat{\mathcal {M}}$ with $|\hat{\mathcal {M}}|\le \mathcal {N}_\infty (\mathcal {M}, \epsilon /n)$ such that there exists $(\hat{g}, \hat{p}) \in \hat{M}$ such that

\begin{align*} \sup _{v\in V}\sum _{i,j} \vert g_{ij}(v) - \hat{g}_{ij}(v)\vert + \Vert p(v) - \hat{p}(v)\Vert _1 \le \epsilon /n. \end{align*}

We denote $\hat{u}_i(v_i, (v^{\prime }_i, v_{-i})) = \langle v_i, \hat{g}_{i}(v^{\prime }_i, v_{-i})\rangle - \hat{p}_i(v^{\prime }_i, v_{-i})$, where we treat $v_i$ as a real-valued vector of length $2^M$.

For all $v \in V, v^{\prime }_i \in V_i$,

\[\begin{eqnarray*} {\left|u_i(v_i, (v^{\prime }_i, v_{-i})) - \hat{u}_i(v_i, (v^{\prime }_i, v_{-i}))\right|}\\ & \le & \left|\langle v_i, g_{i}(v^{\prime }_i, v_{-i})\rangle - \langle v_i, \hat{g}_{i}(v^{\prime }_i, v_{-i})\rangle \right|+ \left|p_i(v^{\prime }_i, v_{-i}) -\hat{p}_i(v^{\prime }_i, v_{-i}) \right|\\ & \le & \Vert v_i\Vert _\infty \cdot \Vert g_{i}(v^{\prime }_i, v_{-i}) - \hat{g}_{i}(v^{\prime }_i, v_{-i})\Vert _1 + \left|p_i(v^{\prime }_i, v_{-i}) -\hat{p}_i(v^{\prime }_i, v_{-i}) \right|\\ & \le & \sum _j \vert g_{ij}(v^{\prime }_i, v_{-i}) - \hat{g}_{ij}(v^{\prime }_i, v_{-i})\vert + \left|p_i(v^{\prime }_i, v_{-i}) -\hat{p}_i(v^{\prime }_i, v_{-i}) \right|\\ &\le & {\epsilon / n.} \end{eqnarray*}\]

Therefore, for any $u\in \mathcal {U}$, take $\hat{u} =(\hat{g}, \hat{p})\in \hat{\mathcal {M}}$, for all $v, v^{\prime }$,

\begin{align*} & \sum _i \vert u_i(v_i, (v^{\prime }_i, v_{-i})) - \hat{u}_i(v_i, (v^{\prime }_i, v_{-i}))\vert \\ & ~\le \sum _{ij} \vert g_{ij}(v^{\prime }_i, v_{-i}) - \hat{g}_{ij}(v^{\prime }_i, v_{-i})\vert + \sum _i\left|p_i(v^{\prime }_i, v_{-i}) -\hat{p}_i(v^{\prime }_i, v_{-i}) \right|\\ & ~\le \epsilon . \end{align*}

This completes the proof of Step 2. □

Step 3.

$\mathcal {N}_\infty (\overline{\mathrm{rgt}}\circ \mathcal {U}, \epsilon)\le \mathcal {N}_\infty ({\mathcal {M}}, \frac{\epsilon }{2n})$.

Proof.

By definition of $\mathcal {N}_\infty (\mathcal {U}, \epsilon)$, there exists $\hat{{\mathcal {U}}}$ with size at most $\mathcal {N}_\infty (\mathcal {U}, \epsilon)$ such that for any $u\in \mathcal {U}$, there exists $\hat{u}$ such that for all $v, v^{\prime } \in V$,

\begin{align*} \sum _{i} \vert u_i(v_i, (v^{\prime }_i, v_{-i})) - \hat{u}_i(v_i, (v^{\prime }_i, v_{-i}))\vert \le \epsilon . \end{align*}

Therefore, for all $v \in V$, $\vert \sum _i u_i(v_i, (v^{\prime }_i, v_{-i})) - \sum _i \hat{u}_i(v_i, (v^{\prime }_i, v_{-i}))\vert \le \epsilon$, from which it follows that $\mathcal {N}_\infty (\overline{\mathrm{rgt}}\circ \mathcal {U}, \epsilon)\le \mathcal {N}_\infty (\mathrm{rgt}\circ \mathcal {U}, \epsilon)$. Following Step 1, it is easy to show $\mathcal {N}_\infty (\mathrm{rgt}\circ \mathcal {U}, \epsilon)\le \mathcal {N}_\infty (\mathcal {U}, \epsilon /2)$.

Together with Step 2, this completes the proof of Step 3. □

Based on the same arguments as in Section D.2.3, we can thus bound the empirical Rademacher complexity as

\begin{align*} \hat{\mathcal {R}}_L(\overline{\mathrm{rgt}}\,\circ \,\mathcal {U}) & \le \inf _{\epsilon \gt 0} \left(\epsilon + 2n\sqrt {\frac{2\log \mathcal {N}_\infty (\overline{\mathrm{rgt}}\circ \mathcal {U}, \epsilon)}{L}}\right)\\ & \le \inf _{\epsilon \gt 0} \left(\epsilon + 2n\sqrt {\frac{2\log \mathcal {N}_\infty (\mathcal {M}, \frac{\epsilon }{2n})}{L}}\right). \end{align*}

Applying Lemma D.1 completes the proof of the generalization bound for regret. $\square$

D.3 Proof of Lemma 3.2

First, given the property of the softmax function and the min operation, $\varphi ^{DS}(s, s^{\prime })$ ensures that the row sums and column sums for the resulting allocation matrix do not exceed 1. In fact, for any doubly stochastic allocation z, there exists scores s and $s^{\prime }$, for which the min of normalized scores recovers z (e.g., $s_{ij} = s^{\prime }_{ij} = log(z_{ij}) + c$ for any $c \in {\mathbb {R}}$). $\square$

D.4 Proof of Lemma A.1

Similar to Lemma 3.2, $\varphi ^{CF}(s, s^{(1)},\ldots ,s^{(m)})$ trivially satisfies the combinatorial feasibility (constraints (14)–(15)). For any allocation z that satisfies the combinatorial feasibility, the following scores

\begin{align*} \forall j~=~ 1,\ldots , m, ~~~~ s_{i, S} & ~=~ s^{(j)}_{i, S} ~=~ \log (z_{i, S}) + c \end{align*}

make $\varphi ^{CF}(s, s^{(1)},\ldots ,s^{(m)})$ recover z. $\square$

D.5 Proof of Theorem 3.1

In Theorem 3.1, we only show the bounds on $\Delta _L$ for RegretNet with additive and unit-demand bidders. We restate this theorem so that it also bounds $\Delta _L$ for the general combinatorial valuations setting (with combinatorial feasible allocation). Recall that the $\ell _1$ norm for a vector x is denoted by $\Vert x \Vert _1,$ and the induced $\ell _{1}$ norm for a matrix $A \in {\mathbb {R}}^{k\times t}$ is denoted by $\Vert A\Vert _{1} \,=\, \max _{1\le j \le t} \sum _{i=1}^k A_{ij}$.

Theorem D.1.

For RegretNet with R hidden layers, K nodes per hidden layer, $d_g$ parameters in the allocation component, $d_p$ parameters in the payment component, and the vector of all model parameters $\Vert w\Vert _{1} \le W$, the following are the bounds on the term $\Delta _{L}$ for different bidder valuation types:

(a)

additive valuations:

$\Delta _{L} \le O\big (\sqrt {R(d_g+d_p) \log (LW\max \lbrace K, mn\rbrace) / {L} } \big)$,

(b)

unit-demand valuations:

$\displaystyle \Delta _{L} \le O\big (\sqrt {R(d_g+d_p) \log (LW\max \lbrace K, mn\rbrace) / {L} }\big)$,

(c)

combinatorial valuations (with combinatorial feasible allocation):

$\displaystyle \Delta _{L} \le O\big (\sqrt {R(d_g+d_p) \log (LW\max \lbrace K, n\,2^m\rbrace) / {L}} \big)$.

We first bound the covering number for a general feed-forward neural network and specialize it to the three architectures we present in Section 3 and Appendix A.2.

Lemma D.2.

Let $\mathcal {F}_k$ be a class of feed-forward neural networks that maps an input vector $x \in {\mathbb {R}}^{d_0}$ to an output vector $y \in {\mathbb {R}}^{d_k}$, with each layer $\ell$ containing $T_\ell$ nodes and computing $z \mapsto \phi _\ell (w^\ell z)$, where each $w^\ell \in \mathbb {R}^{T_{\ell } \times T_{\ell -1}}$ and $\phi _\ell : {\mathbb {R}}^{T_\ell } \rightarrow [-B, +B]^{T_\ell }$. Further, for each network in $\mathcal {F}_k$, let the parameter matrices $\Vert w^\ell \Vert _{1} \le W$ and $\Vert \phi _\ell (s) - \phi _\ell (s^{\prime })\Vert _1 \le \Phi \Vert s-s^{\prime }\Vert _1$ for any $s, s^{\prime } \in {\mathbb {R}}^{T_{\ell -1}}$.

\begin{align*} \mathcal {N}_\infty (\mathcal {F}_k, \epsilon) \le \left\lceil \frac{2Bd^2 W(2\Phi W)^k}{\epsilon } \right\rceil ^d, \end{align*}

where $T = \max _{\ell \in [k]}T_\ell$ and d is the total number of parameters in a network.

Proof.

We shall construct an $\ell _{1,\infty }$ cover for $\mathcal {F}_k$ by discretizing each of the d parameters along $[-W, +W]$ at scale $\epsilon _0/d$, where we will choose $\epsilon _0 \gt 0$ at the end of the proof. We will use $\hat{\mathcal {F}}_k$ to denote the subset of neural networks in $\mathcal {F}_k$ whose parameters are in the range $\lbrace -(\lceil Wd/\epsilon _0 \rceil -1)\,\epsilon _0/d, \ldots , -\epsilon _0/d, 0, \epsilon _0/d,\ldots , \lceil Wd/\epsilon _0 \rceil \epsilon _0/d\rbrace$. The size of $\hat{\mathcal {F}}_k$ is at most $\lceil 2dW/\epsilon _0\rceil ^d$. We shall now show that $\hat{\mathcal {F}}_k$ is an $\epsilon$-cover for ${\mathcal {F}}_k$.

We use mathematical induction on the number of layers k. We wish to show that for any $f \in \mathcal {F}_k,$ there exists a $\hat{f} \in \hat{\mathcal {F}}_k$ such that

\begin{equation*} \Vert f(x) - \hat{f}(x)\Vert _1 \le Bd\epsilon _0(2\Phi W)^k. \end{equation*}

For $k = 0$, the statement holds trivially. Assume that the statement is true for $\mathcal {F}_k$. We now show that the statement holds for $\mathcal {F}_{k+1}$.

A function $f \in \mathcal {F}_{k+1}$ can be written as $f(z) = \phi _{k+1}(w_{k+1} H(z))$ for some $H \in \mathcal {F}_{k}$. Similarly, a function $\hat{f} \in \hat{\mathcal {F}}_{k+1}$ can be written as $\hat{f}(z) = \phi _{k+1}(\hat{w}_{k+1} \hat{H}(z))$ for some $\hat{H} \in \hat{\mathcal {F}}_{k}$ and $\hat{w}_{k+1}$ is a matrix of entries in $\lbrace -(\lceil Wd/\epsilon _0 \rceil -1)\,\epsilon _0/d, \ldots , -\epsilon _0/d, 0, \epsilon _0/d,\ldots , \lceil Wd/\epsilon _0 \rceil \epsilon _0/d\rbrace$. In addition, for any parameter matrix $w^\ell \in \mathbb {R}^{T_{\ell } \times T_{\ell -1}}$, there is a matrix $\hat{w}^\ell$ with discrete entries s.t.

\begin{equation} \Vert w_\ell - \hat{w}_\ell \Vert _{1} = \max _{1\le j\le T_{\ell -1}}\sum _{i=1}^{T_\ell } \vert w^\ell _{\ell ,i,j} - \hat{w}_{\ell ,i,j}\vert \le T_\ell \epsilon _0/d \le \epsilon _0. \end{equation}

(20)

We then have

\begin{align*} & \Vert f(x) - \hat{f}(x)\Vert _1 \\ & \quad = \Vert \phi _{k+1}(w_{k+1} H(x)) - \phi _{k+1}(\hat{w}_{k+1}\hat{H}(x))\Vert _1 \\ & \quad \le \Phi \Vert w_{k+1} H(x) - \hat{w}_{k+1}\hat{H}(x)\Vert _1 \\ & \quad \le \Phi \Vert w_{k+1} H(x) - w_{k+1}\hat{H}(x)\Vert _1 + \Phi \Vert w_{k+1} \hat{H}(x) - \hat{w}_{k+1}\hat{H}(x)\Vert _1 \\ & \quad \le \Phi \Vert w_{k+1}\Vert _{1} \cdot \Vert H(x) - \hat{H}(x)\Vert _1 + \Phi \Vert w_{k+1} - \hat{w}_{k+1}\Vert _{1} \cdot \Vert \hat{H}(x)\Vert _1 \\ & \quad \le \Phi W \Vert H(x) - \hat{H}(x)\Vert _1 + \Phi {T_k B}\Vert w_{k+1} - \hat{w}_{k+1}\Vert _{1} \\ & \quad \le Bd\epsilon _0 \Phi W(2\Phi W)^k + \Phi Bd\epsilon _0 \\ & \quad \le Bd\epsilon _0 (2\Phi W)^{k+1}, \end{align*}

where the second line follows from our assumption on $\phi _{k+1}$, and the sixth line follows from our inductive hypothesis and from (20). By choosing $\epsilon _0 =\frac{\epsilon }{B(2\Phi W)^k}$, we complete the proof. □

We next bound the covering number of the auction class in terms of the covering number for the class of allocation networks and for the class of payment networks. Recall that the payment networks compute a fraction $\alpha : {\mathbb {R}}^{m(n+1)}\rightarrow [0, 1]^n$ and compute a payment $p_i(b) = \alpha _i(b)\cdot \langle v_i, g_i(b)\rangle$ for each bidder i. Let $\mathcal {G}$ be the class of allocation networks and $\mathcal {A}$ be the class of fractional payment functions used to construct auctions in $\mathcal {M}$. Let $\mathcal {N}_\infty (\mathcal {G}, \epsilon)$ and $\mathcal {N}_\infty (\mathcal {\mathcal {A}}, \epsilon)$ be the corresponding covering numbers w.r.t. the $\ell _\infty$ norm. Then we have the following lemma.

Lemma D.3.

$\mathcal {N}_\infty (\mathcal {M}, \epsilon)\le \mathcal {N}_\infty (\mathcal {G}, \epsilon /3)\cdot \mathcal {N}_\infty (\mathcal {A}, \epsilon /3)$

Proof.

Let $\hat{{\mathcal {G}}} \subseteq \mathcal {G}$, $\hat{\mathcal {A}} \subseteq \mathcal {A}$ be $\ell _\infty$ covers for ${\mathcal {G}}$ and $\mathcal {A}$—that is, for any $g \in {\mathcal {G}}$ and $\alpha \in \mathcal {A}$, there exists $\hat{g} \in \hat{{\mathcal {G}}}$ and $\hat{\alpha } \in \hat{\mathcal {A}}$ with

\begin{align} \sup _b \sum _{i,j}\vert g_{ij}(b) - \hat{g}_{ij}(b)\vert \le \epsilon /3, \end{align}

(21)

\begin{align} \sup _b\sum _{i}\vert \alpha _i(b) - \hat{\alpha }_i(b)\vert \le \epsilon /3. \end{align}

(22)

We now show that the class of mechanism $\hat{{\mathcal {M}}} \,=\, \lbrace (\hat{g},\hat{\alpha })\,|\, \hat{g} \in \hat{{\mathcal {G}}},\, \text{and}\, \hat{p}(b) \,=\, \hat{\alpha }_i(b)\cdot \langle v_i, \hat{g}_i(b)\rangle \rbrace$ is an $\epsilon$-cover for ${\mathcal {M}}$ under the $\ell _{1,\infty }$ distance. For any mechanism in $(g,p) \in {\mathcal {M}}$, let $(\hat{g}, \hat{p}) \in \hat{{\mathcal {M}}}$ be a mechanism in $\hat{{\mathcal {M}}}$ that satisfies (22). We have

\begin{align*} & \sum _{i,j}\vert g_{ij}(b) - \hat{g}_{ij}(b)\vert + \sum _{i}\vert p_i(b) - \hat{p}_i(b)\vert \\ & \quad \le \epsilon /3 + \sum _i \left|\alpha _i(b) \cdot \langle b_i, g_{i,\cdot }(b)\rangle - \hat{\alpha }_i(b)\cdot \langle b_i, \hat{g}_i(b)\rangle \right|\\ & \quad \le \epsilon /3 + \sum _i \Big (\vert (\alpha _i(b) - \hat{\alpha }_i(b))\cdot \langle b_i, g_{i}(b)\rangle \vert \\ & \ + \vert \hat{\alpha }_i(b) \cdot (\langle b_i, g_i(b)\rangle -\langle b_i, \hat{g}_{i,\cdot }(b))\rangle \vert \Big) \\ & \quad \le \epsilon /3 + \sum _i\vert \alpha _i(b) - \hat{\alpha }_i(b)\vert + \sum _i \Vert b_i\Vert _{\infty }\cdot \Vert g_i(b) - \hat{g}_i(b)\Vert _1 \\ & \quad \le 2\epsilon /3 + \sum _{i,j}\vert g_{ij}(b) - \hat{g}_{ij}(b)\vert \le \epsilon , \end{align*}

where in the third inequality we use $\langle b_i, g_{i}(b)\rangle \le 1$. The size of the cover $\hat{{\mathcal {M}}}$ is $|\hat{{\mathcal {G}}}| |\hat{\mathcal {A}}|$, which completes the proof. □

We are now ready to prove covering number bounds for the three architectures in Section 3 and Appendix A.2.

Proof of Theorem D.1

All three architectures use the same feed-forward architecture for computing fractional payments, consisting of K hidden layers with tanh activation functions. We also have by our assumption that the $\ell _1$ norm of the vector of all model parameters is at most W, for each $\ell = 1,\ldots ,R+1$, $\Vert w_\ell \Vert _1 \le W$. Using that fact that the tanh activation functions are 1-Lipschitz and bounded in $[-1,1]$, and there are at most $\max \lbrace K, n\rbrace$ number of nodes in any layer of the payment network, we have by an application of Lemma D.2 the following bound on the covering number of the fractional payment networks $\mathcal {A}$ used in each case:

\begin{align*} \mathcal {N}_\infty (\mathcal {A}, \epsilon) \le \left\lceil \frac{\max (K, n)^2 (2W)^{R+1}}{\epsilon } \right\rceil ^{d_p}, \end{align*}

where $d_p$ is the number of parameters in payment networks.

For the covering number of allocation networks $\mathcal {G}$, we consider each architecture separately. In each case, we bound the Lipschitz constant for the activation functions used in the layers of the allocation network and followed by an application of Lemma D.2. For ease of exposition, we omit the dummy scores used in the final layer of neural network architectures.

Additive Bidders . The output layer computes n allocation probabilities for each item j using a softmax function. The activation function $\phi _{R+1}:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n$ for the final layer for input $s \in {\mathbb {R}}^{n\times m}$ can be described as $\phi _{R+1}(s) = [\mathrm{softmax}(s_{1,1},$ $\ldots , s_{n,1}), \ldots , \mathrm{softmax}(s_{1,m},\ldots ,s_{n,m})]$, where $\mathrm{softmax}: {\mathbb {R}}^n \rightarrow [0,1]^n$ is defined for any $u \in {\mathbb {R}}^n$ as $\mathrm{softmax}_i(u) \,=\, e^{u_i} / \sum _{k=1}^n e^{u_k}$.

We then have for any $s, s^{\prime } \in {\mathbb {R}}^{n\times m}$,

\begin{align} & \Vert \phi _{R+1}(s) - \phi _{R+1}(s^{\prime })\Vert _1 \nonumber \nonumber\\ & \quad {\le } \sum _{j} \left\Vert \mathrm{softmax}(s_{1,j},\ldots ,s_{n,j}) - \mathrm{softmax}(s^{\prime }_{1,j},\ldots ,s^{\prime }_{n,j})\right\Vert _1 \nonumber \nonumber\\ & \quad \le \sqrt {n} \sum _{j} \left\Vert \mathrm{softmax}(s_{1,j},\ldots ,s_{n,j}) - \mathrm{softmax}(s^{\prime }_{1,j},\ldots ,s^{\prime }_{n,j})\right\Vert _2 \nonumber \nonumber\\ & \quad \le \sqrt {n}\frac{\sqrt {n-1}}{n} \sum _{j} \sqrt {\sum _{i} \Vert s_{ij} - s^{\prime }_{ij}\Vert ^2} \nonumber \nonumber\\ & \quad \le \sum _{j}\sum _{i} |s_{ij} - s^{\prime }_{ij}|, \end{align}

(23)

where the third step follows by bounding the Frobenius norm of the Jacobian of the softmax function.

The hidden layers $\ell = 1,\ldots ,R$ are standard feed-forward layers with tanh activations. Since the tanh activation function is 1-Lipschitz, $\Vert \phi _\ell (s) - \phi _\ell (s^{\prime })\Vert _1 \le \Vert s-s^{\prime }\Vert _1$. We also have by our assumption that the $\ell _1$ norm of the vector of all model parameters is at most W, for each $\ell = 1,\ldots ,R+1$, $\Vert w_\ell \Vert _1 \le W$. Moreover, the output of each hidden layer node is in $[-1,1]$, the output layer nodes is in $[0,1]$, and the maximum number of nodes in any layer (including the output layer) is at most $\max \lbrace K, mn\rbrace$.

By an application of Lemma D.2 with $\Phi =1$, $B=1$, and $d = \max \lbrace K,mn\rbrace ,$ we have

\begin{equation*} \displaystyle \mathcal {N}_\infty (\mathcal {G}, \epsilon)\le \left\lceil \frac{\max \lbrace K, mn\rbrace ^2 (2W)^{R+1}}{\epsilon } \right\rceil ^{d_g}, \end{equation*}

where $d_g$ is the number of parameters in allocation networks.

Unit-Demand Bidders . The output layer n allocation probabilities for each item j as an element-wise minimum of two softmax functions. The activation function $\phi _{R+1}:{\mathbb {R}}^2n \rightarrow {\mathbb {R}}^n$ for the final layer for two sets of scores $s, \bar{s} \in {\mathbb {R}}^{n\times m}$ can be described as

\begin{equation*} \phi _{R+1,i,j}(s,s^{\prime }) = \min \lbrace \textrm {softmax}_{j}(s_{i,1}, \ldots , s_{i,m}), \, \textrm {softmax}_{i}(s^{\prime }_{1,j}, \ldots , s^{\prime }_{n,j})\rbrace . \end{equation*}

We then have for any $s, \tilde{s}, s^{\prime }, \tilde{s}^{\prime } \in {\mathbb {R}}^{n\times m}$,

\begin{align*} & \Vert \phi _{R+1}(s,\tilde{s}) - \phi _{R+1}(s^{\prime },\tilde{s}^{\prime })\Vert _1 \\ & \quad {\le }\, \sum _{i,j} \Big | \min \lbrace \textrm {softmax}_{j}(s_{i,1}, \ldots , s_{i,m}), \, \textrm {softmax}_{i}(\tilde{s}_{1,j}, \ldots , \tilde{s}_{n,j})\rbrace \\ & \hspace{42.67912pt} \,-\, \min \lbrace \textrm {softmax}_{j}(s^{\prime }_{i,1}, \ldots , s^{\prime }_{i,m}), \, \textrm {softmax}_{i}(\tilde{s}^{\prime }_{1,j}, \ldots , \tilde{s}^{\prime }_{n,j})\rbrace \Big |\\ & \quad \le \, \sum _{i,j} \Big | \max \lbrace \textrm {softmax}_{j}(s_{i,1}, \ldots , s_{i,m}) \,-\, \textrm {softmax}_{j}(s^{\prime }_{i,1}, \ldots , s^{\prime }_{i,m}),\\ & \hspace{82.51282pt} \textrm {softmax}_{i}(\tilde{s}_{1,j}, \ldots , \tilde{s}_{n,j}) \,-\, \textrm {softmax}_{i}(\tilde{s}^{\prime }_{1,j}, \ldots , \tilde{s}^{\prime }_{n,j})\rbrace \Big |\\ & \quad \le \, \sum _{i} \big \Vert \textrm {softmax}(s_{i,1}, \ldots , s_{i,m}) \,-\, \textrm {softmax}(s^{\prime }_{i,1}, \ldots , s^{\prime }_{i,m}) \big \Vert _1\\ & \hspace{42.67912pt} \,+\, \sum _{j} \big \Vert \textrm {softmax}(\tilde{s}_{1,j}, \ldots , \tilde{s}_{n,j}) \,-\, \textrm {softmax}(\tilde{s}^{\prime }_{1,j}, \ldots , \tilde{s}^{\prime }_{n,j})\rbrace \big \Vert _1\\ & \quad \le \, \sum _{i,j} |s_{ij} - s^{\prime }_{ij}| \,+\, \sum _{i,j} |\tilde{s}_{ij} - \tilde{s}^{\prime }_{ij}|, \end{align*}

where the last step can be derived in the same way as (23).

As with additive bidders, using additionally the fact that hidden layers $\ell = 1,\ldots ,R$ are standard feed-forward layers with tanh activations, we have from Lemma D.2 with $\Phi =1$, $B=1$ and $d = \max \lbrace K,mn\rbrace$,

\begin{equation*} \mathcal {N}_\infty (\mathcal {G}, \epsilon)\le \left\lceil \frac{\max \lbrace K, mn\rbrace ^2 (2 W)^{R+1}}{\epsilon } \right\rceil ^{d_g}. \end{equation*}

Combinatorial Bidders. The output layer outputs an allocation probability for each bidder i and bundle of items $S \subseteq M$. The activation function $\phi _{R+1}:{\mathbb {R}}^{(m+1)n2^m} \rightarrow {\mathbb {R}}^{n2^m}$ for this layer for $m+1$ sets of scores $s, s^{(1)},\ldots , s^{(m)} \in {\mathbb {R}}^{n\times 2^m}$ is given by

\begin{align*} \phi _{R+1,i,S}(s, s^{(1)}, \ldots , s^{(m)}) =\, \min \Big \lbrace & \textrm {softmax}_{S}(s_{i,S^{\prime }}: S^{\prime } \subseteq M),\,\textrm {softmax}_{S}(s^{(1)}_{i,S^{\prime }}: S^{\prime } \subseteq M),\, \ldots ,\\ & \textrm {softmax}_{S}(s^{(m)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \Big \rbrace , \end{align*}

where $\textrm {softmax}_{S}(a_{S^{\prime }}: S^{\prime } \subseteq M) \,=\, e^{a_S}/ \sum _{S^{\prime } \subseteq M} e^{a_{S^{\prime }}}$.

We then have for any $s, s^{(1)}, \ldots , s^{(m)}, s^{\prime }, s^{\prime (1)}, \ldots , s^{\prime (m)} \in {\mathbb {R}}^{n\times 2^m}$,

\begin{align*} &{ \Vert \phi _{R+1}(s, s^{(1)}, \ldots , s^{(m)}) - \phi _{R+1}(s^{\prime }, s^{\prime (1)}, \ldots , s^{\prime (m)})\Vert _1 } \\ & {\le }\, \sum _{i,S} \Big | \min \Big \lbrace \textrm {softmax}_{S}(s_{i,S^{\prime }}: S^{\prime } \subseteq M),\,\\ & \ \textrm {softmax}_{S}(s^{(1)}_{i,S^{\prime }}: S^{\prime } \subseteq M),\, \ldots , \textrm {softmax}_{S}(s^{(m)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \Big \rbrace \\ & \hspace{25.6073pt} \,-\, \min \Big \lbrace \textrm {softmax}_{S}(s^{\prime }_{i,S^{\prime }}: S^{\prime } \subseteq M),\,\\ & \ \textrm {softmax}_{S}(s^{\prime (1)}_{i,S^{\prime }}: S^{\prime } \subseteq M),\, \ldots , \textrm {softmax}_{S}(s^{\prime (m)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \Big \rbrace \Big | \\ & \le \, \sum _{i,S} \max \Big \lbrace \big | \textrm {softmax}_{S}(s_{i,S^{\prime }}: S^{\prime } \subseteq M) \,-\, \textrm {softmax}_{S}(s^{\prime }_{i,S^{\prime }}: S^{\prime } \subseteq M) \big |,\\ & \hspace{36.98866pt} \big | \textrm {softmax}_{S}(s^{(1)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \,-\, \textrm {softmax}_{S}(s^{\prime (1)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \big |, \,\ldots \,\\ & \hspace{36.98866pt} \big | \textrm {softmax}_{S}(s^{(m)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \,-\, \textrm {softmax}_{S}(s^{\prime (m)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \big | \Big \rbrace \\ & \le \, \sum _{i} \big \Vert \textrm {softmax}(s_{i,S^{\prime }}: S^{\prime } \subseteq M) \,-\, \textrm {softmax}(s^{\prime }_{i,S^{\prime }}: S^{\prime } \subseteq M) \big \Vert _1\\ & \hspace{34.14322pt} \,+\, \sum _{i,j} \big \Vert \textrm {softmax}(s^{(j)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \,-\, \textrm {softmax}(s^{\prime (j)}_{i,S^{\prime }}: S^{\prime } \subseteq M) \big \Vert _1 \\ & \le \, \sum _{i,S} |s_{i,S} - s^{\prime }_{i,S}| \,+\, \sum _{i,j,S} |s^{(j)}_{i,S} - s^{\prime (j)}_{i, S}|, \end{align*}

where the last step can be derived in the same way as (23).

\begin{equation*} \mathcal {N}_\infty (\mathcal {G}, \epsilon)\le \left\lceil \frac{\max \lbrace K,n\cdot 2^m\rbrace ^2 (2 W)^{R+1}}{\epsilon } \right\rceil ^{d_g}, \end{equation*}

where $d_g$ is the number of parameters in allocation networks. □

We now bound $\Delta _{L}$ for the three architectures using the covering number bounds we derived previously. In particular, we upper bound the the ‘inf’ over $\epsilon \gt 0$ by substituting a specific value of $\epsilon$:

(a)

For additive bidders, choosing $\epsilon = \frac{1}{\sqrt {L}}$, we get

\begin{equation*} \Delta _{L} \le O\left(\sqrt {R(d_p+d_g)\frac{\log (W\max \lbrace K, mn\rbrace L)}{L}}\right). \end{equation*}

(b)

For unit-demand bidders, choosing $\epsilon =\frac{1}{\sqrt {L}}$, we get

\begin{equation*} \Delta _{L} \le O\left(\sqrt {R(d_p+d_g)\frac{\log ((W\max \lbrace K, mn\rbrace L)}{L}}\right). \end{equation*}

(c)

For combinatorial bidders, choosing $\epsilon =\frac{1}{\sqrt {L}}$, we get

\begin{equation*} \Delta _{L} \le O\left(\sqrt {R(d_p+d_g)\frac{\log (W\max \lbrace K, n\cdot 2^m\rbrace L)}{L}}\right). \end{equation*}

$\square$

D.6 Proof of Theorem 5.1

We apply the duality theory of Daskalakis et al. [2013] to verify the optimality of our proposed mechanism (motivated by empirical results of RochetNet). For the completeness of presentation, we provide a brief introduction of their approach here.

Fig. 26.

Let $f(v)$ be the joint valuation distribution of $v=(v_1, v_2, \ldots , v_m)$ and V be the support of $f(v),$ and define the measure $\mu$ with the following density,

\[\begin{eqnarray} {\mathbb {I}}_{v=\bar{v}} + {\mathbb {I}}_{v\in \partial V}\cdot f(v)(v\cdot \hat{n}(v)) - (\nabla f(v)\cdot v + (m+1)f(v)), \end{eqnarray}\]

(24)

where $\bar{v}$ is the “base valuation” (i.e., $u(\bar{v})=0$), $\partial V$ denotes the boundary of V, $\hat{n}(v)$ is the outer unit normal vector at point $v\in \partial V$, and m is the number of items. Let $\Gamma _+(X)$ be the unsigned (Radon) measures on X. Consider an unsigned measure $\gamma \in \Gamma _+(X\times X)$, let $\gamma _1$ and $\gamma _2$ be the two marginal measures of $\gamma$—that is, $\gamma _1(A) = \gamma (A\times X)$ and $\gamma _2(A) = \gamma (X\times A)$ for all measurable sets $A\subseteq X$. We say measure $\alpha$ dominates $\beta$ if and only if for all (non-decreasing, convex) functions u, $\int u\,d\alpha \ge \int u\,d\beta$. Then by strong duality theory, we have

\begin{equation} {\sup _u \int _V u\,d\mu = \inf _{\gamma \in \Gamma _+(V, V), \gamma _1 - \gamma _2\succeq \mu }\int _{V\times V}\Vert v - v^{\prime }\Vert _1 \, d\gamma ,} \end{equation}

(25)

and both the supremum and infimum are achieved. Based on “complementary slackness” of LP, the optimal solution of Equation (25) needs to satisfy the following conditions.

Corollary D.1 ([Daskalakis et al. 2017]).

Let $u^*$ and $\gamma ^*$ be feasible for their respective problems in Equation (25), then $\int u^*\,d\mu = \int \Vert v - v^{\prime }\Vert _1\,d\gamma ^*$ if and only if the following two conditions hold:

\begin{equation*} \begin{aligned}& \int u^*\,d(\gamma ^*_1 - \gamma ^*_2) = \int u^*\,d\mu \\ & \int u^*(v) - u^*(v^{\prime }) = \Vert v - v^{\prime }\Vert _1, \gamma ^*\text{-almost surely.} \end{aligned} \end{equation*}

Then we prove the utility function $u^*$ induced by the mechanism for Setting 1 is optimal. Here we only focus on Setting 1 with $c \gt 1$, for $c \le 1$ the proof is analogous and we omit it here.¹⁰ The transformed measure $\mu$ of the valuation distribution is composed of

(1)

A point mass of $+1$ at $(0,1)$.

(2)

Mass $-3$ uniformly distributed throughout the triangle area (density $-\frac{6}{c}$).

(3)

Mass $-2$ uniformly distributed on lower edge of triangle (density $-\frac{2}{c}$).

(4)

Mass $+4$ uniformly distributed on right-upper edge of triangle (density $+\frac{4}{\sqrt {1+c^2}}$).

It is straightforward to verify that $\mu (R_1)=\mu (R_2)=\mu (R_3)=0$. We will show there exists an optimal measure $\gamma ^*$ for the dual program of Theorem 2 (Equation (5)) in the work of Daskalakis et al. [2013]. $\gamma ^*$ can be decomposed into $\gamma ^* =\gamma ^{R_1} + \gamma ^{R_2} + \gamma ^{R_3}$ with $\gamma ^{R_1} \in \Gamma _+(R_1\times R_1), \gamma ^{R_2} \in \Gamma _+(R_2\times R_2), \gamma ^{R_3} \in \Gamma _+(R_3\times R_3)$. We will show the feasibility of $\gamma ^*$ such that

\begin{equation} \begin{array}{ccc} \gamma ^{R_1}_1 - \gamma ^{R_1}_2 \succeq \mu |_{R_1}; & \gamma ^{R_2}_1 - \gamma ^{R_2}_2 \succeq \mu |_{R_2}; & \gamma ^{R_3}_1 - \gamma ^{R_3}_2 \succeq \mu |_{R_3}. \end{array} \end{equation}

(26)

Then we show that the conditions of Corollary 1 of Daskalakis et al. [2013] hold for each of the measures $\gamma ^{R_1}, \gamma ^{R_2}, \gamma ^{R_3}$ separately such that $\int u^* d(\gamma ^A_1 - \gamma ^A_2) = \int _A u^*d\mu$ and $u^*(v) - u^*(v^{\prime }) = \Vert v - v^{\prime }\Vert _1$ hold $\gamma ^{A}$ almost surely for any $A=R_1, R_2,$ and $R_3$. We visualize the transport of measure $\gamma ^*$ in Figure 26.

Construction of $\gamma ^{R_1}$. $\mu _+|_{R_1}$ is concentrated on a single point $(0,1)$ and $\mu _-|_{R_1}$ is distributed throughout a region which is coordinate-wise greater than $(0,1)$, then it is obviously to show $0\succeq \mu |_{R_1}$. We set $\gamma ^{R_1}$ to be zero measure, and we get $\gamma ^{R_1}_1-\gamma ^{R_1}_2 = 0 \succeq \mu |_{R_1}$. In addition, $u^*(v)=0, \forall v\in R_1$, then the conditions in Corollary 1 of Daskalakis et al. [2013] hold trivially.

Construction of $\gamma ^{R_2}$. $\mu _+|_{R_2}$ is uniformly distributed on upper edge CF of the triangle, and $\mu _-|_{R_2}$ is uniformly distributed in $R_2$. Since we have $\mu (R_2) = 0$, we construct $\gamma ^{R_2}$ by “transporting” $\mu _+|_{R_2}$ into $\mu _-|_{R_2}$ downward—that is $\gamma ^{R_2}_1 = \mu _+|_{R_2}, \gamma ^{R_2}_2=\mu _-|_{R_2}$. Therefore, $\int u^* d(\gamma ^{R_2}_1 - \gamma ^{R_2}_2) = \int u^* d\mu$ holds trivially. The measure $\gamma ^{R_2}$ is only concentrated on the pairs $(v, v^{\prime })$ such that $v_1 = v^{\prime }_1, v_2 \ge v^{\prime }_2$. Thus, for such pairs $(v^{\prime } v^{\prime })$, we have $u^*(v) - u^*(v^{\prime }) = (\frac{v_1}{c} + v_2 - \frac{4}{3}) - (\frac{v_1}{c} + v^{\prime }_2 - \frac{4}{3}) = ||v - v^{\prime }||_1$.

Construction of $\gamma ^{R_3}$. It is intricate to directly construct $\gamma ^{R_3}$ analytically; however, we will show that the optimal measure $\gamma ^{R_3}$ only transports mass from $\mu _+|_{R_3}$ to $\mu _-|_{R_3}$ leftward and downward. Let us consider a point H on edge BF with coordinates $(v^H_1, v^H_2)$. Define the regions $R^H_{L}=\lbrace v^{\prime }\in R_3|v^{\prime }_1\le v^H_1\rbrace$ and $R^H_U=\lbrace v^{\prime }\in R_3|v^{\prime }_2 \ge v^H_2\rbrace$. Let $\ell (\cdot)$ represent the length of segment, then we have $\ell (FH) \lt \frac{2}{3\sqrt {c^2+1}}$. Thus,

\begin{align*} \mu (R^H_U) & = \frac{4\ell (FH)}{\sqrt {c^2 +1}} - \frac{6}{c}\cdot \frac{\ell ^2(FH)c}{2(c^2+1)} = \frac{\ell (FH)}{\sqrt {c^2 +1}}\cdot \left(4- \frac{3\ell (FH)}{\sqrt {c^2+1}}\right) \gt 0 \\ \mu (R^H_L) & = \frac{4\ell (FH)}{\sqrt {c^2 +1}} - \frac{2}{c}\cdot \frac{\ell (FH)c}{\sqrt {c^2+1}} - \frac{6}{c}\cdot \left(\frac{2\ell (FH)c}{3\sqrt {c^2+1}} - \frac{\ell ^2(FH)c}{2(c^2+1)}\right) \\ & = \frac{\ell (FH)}{\sqrt {c^2+1}}\cdot \left(\frac{3\ell (FH)}{\sqrt {c^2+1}}-2\right) \lt 0. \end{align*}

Thus, there exists a unique line $l_H$ with positive slope that intersects H and separate $R_3$ into two parts, $R^H_U$ (above $l_H$) and $R^H_B$ (below $l_H$) such that $\mu _+(R^H_U) = \mu _-(R^H_U)$. We will then show for any two points H and I on edge BF that lines $l_H$ and $l_I$ will not intersect inside $R_3$. In Figure 26, on the contrary, we assume $l_H=HK$ and $l_I =IJ$ intersect inside $R_3$. Given the definition of $l_H$ and $l_I$, we have

\[\begin{eqnarray*} \mu _+(FHKD) = \mu _-(FHKD); & \mu _+(FIJD) = \mu _-(FIJD). \end{eqnarray*}\]

Since $\mu _+$ is only distributed along the edge BF, we have

\[\begin{eqnarray*} \mu _+(FIKD) = \mu _+(FIJD) = \mu _-(FIJD). \end{eqnarray*}\]

Notice that $\mu _-$ is only distributed inside $R_3$ and edge DB, thus $\mu _-(FIKD) \gt \mu _-(FIJD)$. Given the preceding discussion, we have

\begin{equation} \begin{aligned}\mu _+(HIK) & = \mu _+(FIJD) - \mu _+(FHKD) = \mu _-(FIJD) - \mu _-(FHKD) \\ & \lt \mu _-(FIKD) - \mu _-(FHKD) = \mu _-(HIK). \end{aligned} \end{equation}

(27)

However, let $S(HIK)$ be the area of triangle HIK, DG be the altitude of triangle DBF w.r.t. BF, and h be the altitude of triangle HJK w.r.t. the base HI:

\begin{align*} \mu _-(HJK) & = \frac{6}{c}\cdot S(HIK) = \frac{6}{c}\cdot \frac{1}{2} \ell (HI)h \le \frac{3}{c}\cdot \ell (HI)\cdot \ell (DG) \\ & =\frac{3}{c} \cdot \frac{2c}{3\sqrt {c^2+1}}\cdot \ell (HI) = \frac{2}{\sqrt {c^2+1}}\cdot \ell (HI) \\ & \lt \frac{2}{\sqrt {c^2+1}}\cdot \ell (HI) = \mu _+(HIK), \end{align*}

which is a contradiction of Equation (27). Thus, we show that $l_H$ and $l_I$ do not intersect inside $R_3$. Let $\gamma ^{R_3}$ be the measure that transport mass from $\mu _+|_{R_3}$ to $\mu _-|_{R_3}$ through lines $\lbrace l_H|H\in BF\rbrace$. Then we have $\gamma ^{R_3}_1 = \mu _+|_{R_3}, \gamma ^{R_3}_2 = \mu _-|_{R_3}$, which leads to $\int u^*d(\gamma ^{R_3}_1 - \gamma ^{R_3}_2) =\int u^* d\mu$. The measure $\gamma ^{R_3}$ is only concentrated on the pairs $(v, v^{\prime })$, s.t. $v_1 \ge v^{\prime }_1$ and $v_2 \ge v^{\prime }_2$. Therefore, for such pairs $(v, v^{\prime })$, we have $u^*(v) - u^*(v^{\prime }) = (v_1 + v_2 - \frac{c}{3}-1) - (v^{\prime }_1 + v^{\prime }_2 - \frac{c}{3}-1) = (v_1 - v^{\prime }_1) + (v_2 - v^{\prime }_2) = ||v-v^{\prime }||_1$.

Finally, we show there must exist an optimal measure $\gamma$ for the dual program of Theorem 2 in the work of Daskalakis et al. [2013]. $\square$

References

[1]

Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. 2019. A convergence theory for deep learning via over-parameterization. In Proceedings of the 36th International Conference on Machine Learning. 242–252.

Abstract

1 Introduction

1.1 Our Contribution

1.2 Discussion

1.3 Further Related Work

1.4 Organization

2 Auction Design as A Learning Problem

2.1 Preliminaries

2.2 Formulation as a Learning Problem

2.2.1 Characterization-Based Approach.

2.2.2 Characterization-Free Approach.

2.3 Quantile-Based Regret

2.4 Generalization Bounds

3 Neural Network Architectures

3.1 The RochetNet Architecture

3.2 The RegretNet Architecture

3.2.1 Additive Valuations.

3.2.2 Unit-Demand Valuations.

3.3 Covering Number Bounds

4 Training the Networks

5 Experimental Results

5.1 Setup

5.2 Evaluation

5.3 The Manelli-Vincent and Pavlov Auctions

5.4 The Straight-Jacket Auction

5.5 Discovering New Analytical Results

5.6 Experiments with Optimal Mechanisms That Require an Infinitely Sized Menu

5.7 Scaling Up

5.8 Comparison to LP

6 Conclusion

Acknowledgments

Footnotes

A Additional Architectures

A.1 The MyersonNet Approach

A.1.1 Modeling Monotone Transforms.

A.1.2 Modeling SPA with Zero Reserve.

A.2 RegretNet for Combinatorial Valuations

A.2.1 RegretNet for Two-Item Auctions with Implementable Allocations.

B Additional Experiments

B.1 Experiments with MyersonNet

B.2 Additional Experiments with RochetNet and RegretNet

B.3 Experiments with RegretNet with Combinatorial Valuations

B.4 Experiments with RochetNet with Varying Linear Units

B.5 Additional Experiments with Discovering New Analytical Results

C Comparison to LP

D Omitted Proofs

D.1 Proof of Lemma 2.1

D.2 Proof of Theorem 2.2

D.2.1 Definitions.

D.2.2 Auxiliary Lemma.

D.2.3 Generalization Bound for Revenue.

D.2.4 Generalization Bound for Regret.

D.3 Proof of Lemma 3.2

D.4 Proof of Lemma A.1

D.5 Proof of Theorem 3.1

D.6 Proof of Theorem 5.1

References

Cited By

Index Terms

Recommendations

Simple, Credible, and Approximately-Optimal Auctions

Simple Auctions with Simple Strategies

Efficiency of Non-Truthful Auctions in Auto-bidding: The Power of Randomization

Comments

Information

Published In

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options