
Dual subgradient algorithms for large-scale nonsmooth learning problems

  • Full Length Paper
  • Series B

Mathematical Programming

Abstract

“Classical” First Order (FO) algorithms of convex optimization, such as the Mirror Descent algorithm or Nesterov’s optimal algorithm for smooth convex optimization, are well known to have optimal (theoretical) complexity estimates which do not depend on the problem dimension. However, to attain this optimality, the domain of the problem should admit a “good proximal setup”. The latter essentially means that (1) the problem domain should satisfy certain geometric conditions of “favorable geometry”, and (2) the practical use of these methods is conditioned on our ability to compute the proximal transformation at each iteration at a moderate cost. More often than not these two conditions are satisfied in optimization problems arising in computational learning, which explains why proximal-type FO methods have recently become the methods of choice for solving various learning problems. Yet, they meet their limits in several important problems, such as multi-task learning with a large number of tasks, where the problem domain does not exhibit favorable geometry, and learning and matrix completion problems with a nuclear norm constraint, where the numerical cost of computing the proximal transformation becomes prohibitive in the large-scale case. We propose a novel approach to solving nonsmooth optimization problems arising in learning applications where a Fenchel-type representation of the objective function is available. The approach is based on applying FO algorithms to the dual problem and using the accuracy certificates supplied by the method to recover the primal solution. While suboptimal in terms of accuracy guarantees, the proposed approach does not rely upon a “good proximal setup” for the primal problem, but requires the problem domain to admit a Linear Optimization oracle, that is, the ability to efficiently maximize a linear form on the domain of the primal problem.
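
To make the last point concrete, consider the nuclear-norm ball mentioned above. The sketch below is our illustration rather than code from the paper; it assumes NumPy/SciPy and a dense matrix argument of size at least \(2\times 2\). It contrasts a Linear Optimization oracle for the ball, which only needs the leading singular pair of the linear form (cheap, e.g., by Lanczos iterations), with the Euclidean projection used by proximal methods, which needs a full singular value decomposition.

import numpy as np
from scipy.sparse.linalg import svds

def lo_oracle_nuclear_ball(eta, R):
    """LO oracle for X = {x : ||x||_nuclear <= R}: maximize <eta, x> over X.
    Only the top singular triple of eta is needed."""
    u, s, vt = svds(np.asarray(eta, dtype=float), k=1)
    return R * np.outer(u[:, 0], vt[0])      # rank-one maximizer R * u1 * v1^T

def projection_nuclear_ball(y, R):
    """Proximal counterpart: Frobenius-norm projection onto the same ball.
    Needs a full SVD plus a projection of the spectrum onto the l1-ball."""
    U, s, Vt = np.linalg.svd(y, full_matrices=False)
    if s.sum() > R:                          # otherwise y is already feasible
        d = np.sort(s)[::-1]
        cs = np.cumsum(d)
        k = np.nonzero(d * np.arange(1, d.size + 1) > cs - R)[0][-1]
        s = np.maximum(s - (cs[k] - R) / (k + 1.0), 0.0)
    return (U * s) @ Vt

For \(p\times p\) matrices the LO step costs a few matrix–vector products per Lanczos iteration, whereas the projection is dominated by the full SVD; this is the per-iteration bottleneck of proximal methods referred to in the abstract.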


Notes

  1. This is a natural normalization: indeed, \(\Vert e_j\Vert \ll 1\) means that the \(j\)th coordinate of \(x\) has a large Lipschitz constant w.r.t. \(\Vert \cdot \Vert \), in spite of the fact that this coordinate is “perfectly well behaved” on \(X\): its variation on the set is just 2.

  2. We assume here that \(g_\tau \ne 0\) for all \(\tau \le t\). In the opposite case, the situation is trivial: when \(g(y_{\tau _*})=0\) for some \(\tau _*\le t\), setting \(\lambda ^t_\tau =0\) for \(\tau \ne \tau _*\) and \(\lambda ^t_{\tau _*}=1\), we ensure that \(\epsilon (y^t,\lambda ^t)=0\).

  3. We assume that \(f'(y_\tau )\ne 0\), for \(\tau \le t\); otherwise, as we remember, the situation is trivial.

  4. Note that to ensure the nonemptiness of \(Y^s_0\), it suffices to set \(h_{0,j}^s(\cdot )=h^s(\cdot )\), so that \(h_{0,j}^s(y)>\ell _s\) for \(y\in \mathop {\hbox {Argmax}}_Yh^s(\cdot )\); recall that \(f_s=\max _{y\in Y} h^s(y)>0\).

  5. In typical applications, \(\psi \) is just linear, so that computing \(y(x)\) is as easy as computing the value of the prox-mapping associated with \(Y\) and \(\omega (\cdot )\).

  6. If our only goal were to approximate \(f_*\) by a smooth concave function, we could use other options, most notably the famous Moreau–Yosida regularization [14, 25]. The advantage of the smoothing based on the Fenchel-type representation (1) of \(f_*\) and a proximal setup for \(Y\) is that in many cases it indeed allows for computationally cheap approximations, which is usually not the case for Moreau–Yosida regularization.

  7. Provided the parameters \(\gamma \in (0,1)\), \(\theta \in (0,1)\) in (29), (37) are treated as absolute constants.

  8. The restriction \(R\ge 1\) is quite natural. Indeed, with the optimal choice of \(x\), we want most of the terms \(\left[ 1 -\epsilon _j\left[ \langle x,z_j\rangle +b\right] \right] _+\) to be \(\ll 1\); assuming that the numbers of examples with \(\epsilon _j=-1\) and with \(\epsilon _j=1\) are both of order \(N\), this condition can be met only when \(|\langle x,z_j\rangle |\) is at least of order 1 for most \(j\). The latter, in view of (48), implies that \(\Vert \sigma (x)\Vert _{1}\) should be at least \(O(1)\).

  9. To arrive at this situation, one can augment \(z_j\) with an additional entry equal to 1 and redefine \(x\): the new \(x\) is the old \([x,b]\).

  10. In our straightforward software implementation of the algorithm, handling memory of depth \(m\) requires storing in RAM up to \(m+1\) matrices of size \(p\times p\), which makes the implementation too space-consuming when \(p\) and \(m\) are large; this is why in our experiments the larger \(p\), the smaller the allowed value of \(m\). Note that with a more sophisticated software implementation, handling memory \(m\) would require storing just \(m+1\) rank-one \(p\times p\) matrices, dramatically reducing the required space (an illustrative sketch of this storage scheme follows these notes).
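
As an illustration of the storage scheme just mentioned (a minimal sketch under our own conventions; the class and method names are hypothetical and the algorithmic updates themselves are omitted), one can keep only the vector factors of the rank-one matrices and form matrix operations on demand:

import numpy as np

class RankOneMemory:
    """Keep up to depth+1 rank-one p x p matrices g = u v^T via their factors,
    reducing storage from O(m p^2) floats to O(m p)."""
    def __init__(self, depth):
        self.depth = depth
        self.factors = []                    # list of (u, v) pairs

    def push(self, u, v):
        self.factors.append((np.asarray(u, float), np.asarray(v, float)))
        if len(self.factors) > self.depth + 1:
            self.factors.pop(0)              # drop the oldest matrix

    def inner(self, k, Y):
        """<u_k v_k^T, Y> = u_k^T Y v_k, computed without forming u_k v_k^T."""
        u, v = self.factors[k]
        return u @ (Y @ v)

    def combine(self, weights):
        """Weighted sum of the stored rank-one matrices, formed only when needed."""
        p = self.factors[0][0].size
        S = np.zeros((p, p))
        for w, (u, v) in zip(weights, self.factors):
            S += w * np.outer(u, v)
        return S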

References

  1. Amit, Y., Fink, M., Srebro, N., Ullman, S.: Uncovering shared structures in multiclass classification. In: Proceedings of the 24th International Conference on Machine Learning, pp. 17–24. ACM (2007)

  2. Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 102(3), 407–456 (2005)

  3. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  4. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)

  5. Combettes, P.L., Dũng, D., Vũ, B.C.: Dualization of signal recovery problems. Set-Valued Var. Anal. 18(3–4), 373–404 (2010)

  6. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2002)

  7. Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. Elsevier, Amsterdam (1970)

  8. Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)

  9. Fan, J., Liao, Y., Mincheva, M.: High dimensional covariance matrix estimation in approximate factor models. Ann. Stat. 39(6), 3320–3356 (2011)

  10. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)

  11. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth large-scale convex minimization, I: General purpose methods; II: Utilizing problem’s structure. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 121–254. MIT Press, Cambridge, MA (2011)

  12. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1–3), 111–147 (1995)

  13. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. CR Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962)

  14. Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France 93, 273–299 (1965)

  15. Nemirovski, A.: Prox-method with rate of convergence \( {O}(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  16. Nemirovski, A., Onn, S., Rothblum, U.G.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35(1), 52–78 (2010)

  17. Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, Chichester (1983)

  18. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \( {O}(1/k^2)\). Soviet Math. Dokl. 27(2), 372–376 (1983)

  19. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Springer, Berlin (2004)

  20. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  21. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3), 319–344 (2007)

  22. Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)

  23. Nesterov, Y., Nemirovski, A.: Some first order algorithms for \(\ell _1\)/nuclear norm minimization. Acta Numerica 22, 509–575 (2013)

  24. Pshenichnyi, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)

  25. Yosida, K.: Functional Analysis. Springer, Berlin (1964)

Author information

Corresponding author

Correspondence to Anatoli Juditsky.

Additional information

Research of the third author was supported by the ONR Grant N000140811104 and the NSF Grants DMS 0914785, CMMI 1232623.

Appendix: Proofs

We need the following technical result originating from [4] (for proof, see [4], or section 2.1 and Lemma A.2 of [23]).

Lemma 7.1

Let \(Y\) be a nonempty closed and bounded subset of a Euclidean space \(E_y\), and let \(\Vert \cdot \Vert \), \(\omega (\cdot )\) be the corresponding proximal setup. Let, further, \(U\) be a closed convex subset of \(Y\) intersecting the relative interior of \(Y\), and let \(p\in E_y\).

  1. (i)

    The optimization problem

    $$\begin{aligned} \min _{y\in U} h_p(y):=\left[ \langle p,y\rangle + \omega (y)\right] \end{aligned}$$

    has a unique solution \(y_*\). This solution is fully characterized by the inclusion \(y_*\in U\cap Y^o\), \(Y^o=\{y\in Y:\partial \omega (y)\ne \emptyset \}\), coupled with the relation

    $$\begin{aligned} \langle p+\omega '(y_*),u-y_*\rangle \ge 0\;\;\forall u\in U. \end{aligned}$$
    (63)
  2. (ii)

    When \(U\) is cut off \(Y\) by a system of linear inequalities \(e_i(y)\le 0\), \(i=1,\ldots ,m\), there exist Lagrange multipliers \(\lambda _i\ge 0\) such that \(\lambda _ie_i(y_*)=0,\;1\le i\le m\), and

    $$\begin{aligned} \forall u\in Y:\;\;\left\langle p+\omega '(y_*)+\sum _{i=1}^m\lambda _ie_i^\prime ,u-y_*\right\rangle \ge 0. \end{aligned}$$
    (64)
  3. (iii)

    In the situation of (ii), assuming \(p=\xi -\omega '(y)\) for some \(y\in Y^o\), we have

    $$\begin{aligned} \forall u\in Y: \langle \xi ,y_*-u\rangle -\sum _i\lambda _ie_i(u)\le V_y(u)-V_{y_*}(u)-V_{y}(y_*). \end{aligned}$$
    (65)
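
To see what the lemma gives in the simplest case (this illustration is ours), take the Euclidean setup \(\omega (y)={1\over 2}\langle y,y\rangle \), so that \(V_y(u)={1\over 2}\Vert u-y\Vert _2^2\) and, with \(p=\xi -\omega '(y)=\xi -y\), the minimizer \(y_*\) in (i) is the Euclidean projection of \(y-\xi \) onto \(U\). Taking \(U=Y\) (so that all \(\lambda _i=0\)), (65) becomes the familiar three-point inequality for projections:

$$\begin{aligned} \forall u\in Y:\;\langle \xi ,y_*-u\rangle \le {1\over 2}\Vert u-y\Vert _2^2-{1\over 2}\Vert u-y_*\Vert _2^2-{1\over 2}\Vert y_*-y\Vert _2^2. \end{aligned}$$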

1.1 Proof of Proposition 3.2

1\(^0\). Observe that when \(t\) is a non-terminal step of the algorithm, the level set \(U_t\) is a closed convex subset of \(Y\) which intersects the relative interior of \(Y\); indeed, by (25) and due to \(\epsilon _t>0\) for a non-terminal \(t\), there exists \(y\in Y\) such that \(h_\tau (y)\ge \epsilon _t>\ell _t=\gamma \epsilon _t\) for all \(\tau \in I_t\), that is, \(U_t\) is cut off \(Y\) by a system of constraints satisfying the Slater condition. Denoting \(I_t=\{\tau _1,\tau _2,\ldots ,\tau _k\}\) and invoking item (iii) of Lemma 7.1 with \(y=\widehat{y}_t\), \(\xi =0\) and \(e_i(\cdot )=\gamma \epsilon _t-h_{\tau _i}(\cdot )\) (so that \(U_t=\{u\in Y: e_i(u)\le 0, 1\le i\le k\}\)), we get \(y_*=y_{t+1}\) and

$$\begin{aligned} -\sum _i\lambda _ie_i(u) \le V_{\widehat{y}_t}(u)-V_{y_{t+1}}(u)-V_{\widehat{y}_t}(y_{t+1}). \end{aligned}$$

with some \(\lambda _i\ge 0\). When \(u\in U_t\), we have \(e_i(u)\le 0\), that is,

$$\begin{aligned} \forall u\in U_t: V_{y_{t+1}}(u)\le V_{\widehat{y}_t}(u) - V_{\widehat{y}_t}(y_{t+1}). \end{aligned}$$
(66)

2\(^0\). When \(t\) starts a phase, we have \(\widehat{y}_t=y_\omega =y_1\), and clearly \(1\in I_t\), whence \(h_\tau (\widehat{y}_t)\le 0\) for some \(\tau \in I_t\) (specifically, for \(\tau =1\)). When \(t\) does not start a phase, we have \(\widehat{y}_t=y_t\) and \(t\in I_t\), so that here again \(h_\tau (\widehat{y}_t)\le 0\) for some \(\tau \in I_t\). On the other hand, \(h_\tau (y_{t+1})\ge \gamma \epsilon _t\) for all \(\tau \in I_t\) due to \(y_{t+1}\in U_t\). Thus, when passing from \(\widehat{y}_t\) to \(y_{t+1}\), at least one of the \(h_\tau (\cdot )\) grows by at least \(\gamma \epsilon _t\). Taking into account that \(h_\tau (z)=\langle g(y_\tau ),y_\tau -z\rangle \) is Lipschitz continuous with constant \(L[g]\) w.r.t. \(\Vert \cdot \Vert \) (by (14)), for this \(\tau \) we get \(\gamma \epsilon _t\le h_\tau (y_{t+1})-h_\tau (\widehat{y}_t)\le L[g]\Vert \widehat{y}_t-y_{t+1}\Vert \), and we conclude that \(\Vert \widehat{y}_t-y_{t+1}\Vert \ge \gamma \epsilon _t/L[g]\). With this in mind, (66) combines with (8) to imply that

$$\begin{aligned} \forall u\in U_t: V_{y_{t+1}}(u)\le V_{\widehat{y}_t}(u) - {1\over 2}\Vert \widehat{y}_t-y_{t+1}\Vert ^2\le V_{\widehat{y}_t}(u)-{\gamma ^2\epsilon _t^2\over 2 L^2[g]}. \end{aligned}$$
(67)

3\(^0\). Let the algorithm perform phase \(s\), let \(t_s\) be the first step of this phase, and \(r\) be another step of the phase. We claim that all level sets \(U_t\), \(t_s\le t\le r\), have a point in common, specifically, (any) \(u\in \mathop {\hbox {Argmax}}_{y\in Y}\min _{\tau \in I_r} h_\tau (y)\). Indeed, since \(r\) belongs to phase \(s\), we have

$$\begin{aligned} \gamma \Delta _s<\epsilon _r=\max _{y\in Y}\min _{\tau \in I_r} h_\tau (y)=\min _{\tau \in I_r} h_\tau (u) \end{aligned}$$

and \(\Delta _s=\epsilon _{t_s}=\max _{y\in Y}\min _{\tau \in I_{t_s}} h_\tau (y)\) (see (25) and the definition of \(\Delta _s\)). Besides this, \(r\) belongs to phase \(s\), and within a phase, sets \(I_t\) extend as \(t\) grows, so that \(I_{t_s}\subset I_t\subset I_r\) when \(t_s\le t\le r\), implying that \(\epsilon _{t_s}\ge \epsilon _{t_s+1}\ge \cdots \ge \epsilon _r\). Thus, for \(t\in \{t_s,t_s+1,\ldots ,r\}\) we have

$$\begin{aligned} \min _{\tau \in I_t}h_\tau (u)\ge \min _{\tau \in I_r}h_\tau (u)\ge \gamma \Delta _s=\gamma \epsilon _{t_s}\ge \gamma \epsilon _t, \end{aligned}$$

implying that \(u\in U_t\).

With \(u\) defined in this way, let us look at the quantities \(v_t:=V_{\widehat{y}_t}(u)\), \(t_s\le t\le r\). We have \(v_{t_s}\le {1\over 2}\Omega ^2\) due to \(\widehat{y}_{t_s}=y_\omega \) and (11), and

$$\begin{aligned} 0\le v_{t+1}\le v_t -{\gamma ^2\epsilon _t^2\over 2L^2[g]}\le v_t-{\gamma ^4\Delta _s^2\over 2L^2[g]} \end{aligned}$$

when \(t_s\le t<r\) (due to (67) combined with \(\widehat{y}_t=y_t\) when \(t_s<t\le r\)). We conclude that \((r-t_s)\gamma ^4\Delta _s^2\le \Omega ^2L^2[g]\). Thus, the number \(T_s\) of steps of phase \(s\) admits the bound

$$\begin{aligned} T_s\le {\Omega ^2L^2[g]\over \gamma ^4\Delta _s^2}+1\le {2\Omega ^2L^2[g]\over \gamma ^4\Delta _s^2}, \end{aligned}$$
(68)

where the concluding inequality follows from \(\Delta _s\le \Delta _1=\max _{y\in Y} \langle g(y_\omega ),y_\omega -y\rangle \le L[g]\Omega \), see (11), combined with \(\gamma \in (0,1)\).

4\(^0\). Assume that MDL does not terminate in the course of the first \(T\ge 1\) steps, and let \(s_*\) be the index of the phase to which step \(T\) belongs. Then \(\Delta _{s_*}>\epsilon \) (otherwise we would terminate no later than at the first step of phase \(s_*\)); and besides this, by construction, \(\Delta _{s+1}\le \gamma \Delta _s\) whenever phase \(s+1\) takes place. Therefore

$$\begin{aligned} T\le \sum _{s=1}^{s_*} T_s&\le {2\Omega ^2L^2[g]\over \gamma ^4}\sum _{r=0}^{s_*-1}\Delta _{s_*-r}^{-2} \le {2\Omega ^2L^2[g]\over \gamma ^4\Delta _{s_*}^2}\sum _{r=0}^{s_*-1}\gamma ^{2r}\nonumber \\&\le {2\Omega ^2L^2[g]\over \gamma ^4(1-\gamma ^2)\Delta _{s_*}^2}\le {2\Omega ^2L^2[g]\over \gamma ^4(1-\gamma ^2)\epsilon ^2}. \end{aligned}$$

\(\square \)
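
For a concrete reading of this bound (our numerical illustration), take \(\gamma ={1\over 2}\): then

$$\begin{aligned} T\le {2\Omega ^2L^2[g]\over \gamma ^4(1-\gamma ^2)\epsilon ^2}={2\over (1/16)(3/4)}\cdot {\Omega ^2L^2[g]\over \epsilon ^2}={128\over 3}\,{\Omega ^2L^2[g]\over \epsilon ^2}\approx 43\,{\Omega ^2L^2[g]\over \epsilon ^2}, \end{aligned}$$

so MDL reaches accuracy \(\epsilon \) in at most \(O(1)\,\Omega ^2L^2[g]/\epsilon ^2\) steps with a moderate absolute constant.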

1.2 Proof of Proposition 3.3

Observe that the algorithm can terminate only in case A of B.2.2.3, and in this case the output is indeed as claimed in the proposition. Thus, all we need to prove is the upper bound (36) on the number of steps before termination.

1\(^0\). Let us bound from above the number of steps at an arbitrary phase \(s\). Assume that phase \(s\) did not terminate in the course of the first \(T\) steps, so that \(u_1,\ldots ,u_T\) are well defined. We claim that then

$$\begin{aligned} \Vert u_\tau -u_{\tau +1}\Vert \ge \ell _s/L[g],\,1\le \tau <T. \end{aligned}$$
(69)

Indeed, by construction \(h^s_{\tau -1,m+1}(y):=\langle g(u_\tau ),u_\tau -y\rangle \) is \(\ge \ell _s=\gamma f_s\) when \(y=u_{\tau +1}\) (due to \(u_{\tau +1}\in Y^s_\tau \)). Since \(\Vert g(u)\Vert _*\le L[g]\) for all \(u\in Y\), (69) follows.

Now let us look at what happens with the quantities \(\omega (u_\tau )\) as \(\tau \) grows. By strong convexity of \(\omega \) we have

$$\begin{aligned} \omega (u_{\tau +1})-\omega (u_\tau )\ge \langle \omega ^\prime (u_\tau ),u_{\tau +1}-u_\tau \rangle +{1\over 2}\Vert u_\tau -u_{\tau +1}\Vert ^2 \end{aligned}$$

The first term on the right hand side is \(\ge 0\), since \(u_\tau \) is the minimizer of \(\omega (\cdot )\) over \(Y^s_{\tau -1}\), while \(u_{\tau +1}\in Y^s_\tau \subset Y^s_{\tau -1}\). The second term on the right hand side is \(\ge {\ell _s^2\over 2L^2[g]}\) by (69). Thus, \(\omega (u_{\tau +1})-\omega (u_\tau )\ge {\ell _s^2\over 2L^2[g]}\), whence \( \omega (u_T)-\omega (u_1)\ge (T-1){\ell _s^2\over 2L^2[g]}=(T-1){\gamma ^2f_s^2\over 2L^2[g]}. \) Recalling the definition of \(\Omega \) and that \(u_1=y_\omega \), the left hand side of this inequality is \(\le {1\over 2}\Omega ^2\). It follows that whenever phase \(s\) does not terminate in the course of the first \(T\) steps, one has \( T\le {\Omega ^2L^2[g]\over \gamma ^2f_s^2}+1, \) so that the total number of steps at phase \(s\), provided this phase exists, is at most \( T_s={\Omega ^2L^2[g]\over \gamma ^2f_s^2}+2. \) Now, we have

$$\begin{aligned} f_s\le f_1=\max \limits _{y\in Y}\langle g(y_\omega ),y-y_\omega \rangle \le L[g]\max \limits _{y\in Y}\Vert y-y_\omega \Vert \le \Omega L[g] \end{aligned}$$

(recall that \(\Vert g(y)\Vert _*\le L[g]\) and see (11)). Thus

$$\begin{aligned} T_s={\Omega ^2L^2[g]\over \gamma ^2f_s^2}+2\le {(1+2\gamma ^2)\over \gamma ^2}{\Omega ^2 L^2[g]\over f_s^2} \end{aligned}$$

for all \(s\) such that the \(s\)th phase exists. By construction, we have \(f_s\ge \epsilon \) and \(f_s\le (\gamma +(1-\gamma )\theta ) f_{s-1}\), whence the method eventually terminates (since \(\gamma +(1-\gamma )\theta <1\)). Assuming that the termination happens at phase \({s_*}\), we have \(f_s\ge (\gamma +(1-\gamma )\theta )^{s-{s_*}}f_{{s_*}}\) for \(1\le s\le {s_*}\), so that the total number of steps is bounded by

$$\begin{aligned}&{\sum _{s=1}^{{s_*}}{(1+2\gamma ^{2})\over \gamma ^{2}}{\Omega ^{2}L^{2}[g]\over f_s^2} \le \sum _{s=1}^{{s_*}} {(1+2\gamma ^{2})\over \gamma ^{2}}{\Omega ^{2}L^{2}[g](\gamma +(1-\gamma )\theta )^{2({s_{*}}-s)}\over f_{{s_*}}^2}}\\&\quad \le \sum _{s=1}^{{s_{*}}} {(1+2\gamma ^{2})\over \gamma ^{2}}{\Omega ^{2}L^{2}[g](\gamma +(1-\gamma )\theta )^{2({s_{*}}-s)}\over \epsilon ^{2}}\\&\quad \le {(1+2\gamma ^{2})\over \gamma ^{2}[1-(\gamma +(1-\gamma )\theta )^{2}]}{\Omega ^{2}L^{2}[g]\over \epsilon ^{2}}, \end{aligned}$$

as claimed. \(\square \)
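
The two step-count bounds established above are easy to evaluate numerically. The snippet below is our own sanity check with arbitrary illustrative parameter values; it simply plugs numbers into the bounds of Propositions 3.2 and 3.3.

# Illustrative evaluation of the complexity bounds (parameter values are arbitrary).
Omega, L, eps = 2.0, 10.0, 1e-3      # setup "radius" Omega, Lipschitz bound L[g], target accuracy
gamma, theta = 0.5, 0.5              # algorithm parameters from (29), (37)

bound_prop32 = 2 * Omega**2 * L**2 / (gamma**4 * (1 - gamma**2) * eps**2)
rho = gamma + (1 - gamma) * theta    # per-phase contraction factor, < 1
bound_prop33 = (1 + 2 * gamma**2) / (gamma**2 * (1 - rho**2)) * Omega**2 * L**2 / eps**2

print(f"Proposition 3.2 bound: {bound_prop32:.2e} steps")   # about 1.7e+10
print(f"Proposition 3.3 bound: {bound_prop33:.2e} steps")   # about 5.5e+09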

Cite this article

Cox, B., Juditsky, A. & Nemirovski, A. Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148, 143–180 (2014). https://doi.org/10.1007/s10107-013-0725-1

