
How to use expert advice

Published: 01 May 1997

Abstract

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
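The protocol described above is easy to make concrete. The sketch below (Python; the weighted-average prediction rule, the fixed learning rate eta, and all names are illustrative assumptions rather than the paper's exact algorithm) keeps one nonnegative weight per expert, forecasts with the weighted mean of the experts' predictions, and charges the forecaster |p - y| on each bit y, which is its expected number of mistakes when it predicts 1 with probability p.

```python
# Minimal sketch of an exponentially weighted forecaster for binary
# prediction with expert advice. Illustrative only: eta and all names
# are assumptions, not the paper's exact algorithm or tuning.
import math
import random

def weighted_forecaster(expert_preds, outcomes, eta=0.5):
    """expert_preds: T lists, each holding N expert predictions in [0, 1].
    outcomes: T bits (0 or 1).
    Returns (expected mistakes of the forecaster, mistakes of the best expert)."""
    n = len(expert_preds[0])
    weights = [1.0] * n               # one nonnegative weight per expert
    alg_loss = 0.0                    # expected number of mistakes so far
    expert_loss = [0.0] * n           # cumulative absolute loss per expert
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        p = sum(w * x for w, x in zip(weights, preds)) / total
        alg_loss += abs(p - y)        # predicting 1 with prob. p costs |p - y| in expectation
        for i, x in enumerate(preds):
            loss = abs(x - y)
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)  # discount experts that err
    return alg_loss, min(expert_loss)

if __name__ == "__main__":
    rng = random.Random(0)
    T, N = 1000, 10
    outcomes = [rng.randint(0, 1) for _ in range(T)]
    # expert 0 is right 90% of the time; the other nine guess at random
    preds = [[y if rng.random() < 0.9 else 1 - y] +
             [rng.random() for _ in range(N - 1)] for y in outcomes]
    alg, best = weighted_forecaster(preds, outcomes)
    print(f"forecaster: {alg:.1f} expected mistakes, best expert: {best:.1f}")
```

With eta tuned to the best expert's eventual loss L, arguments of this kind bound the difference in expected mistakes by a term of order sqrt(L log N) for N experts, the square-root behavior the abstract refers to.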



Reviews

Giuseppina Carla Gini

While the title of this paper suggests the use of traditional expert systems technology, the research reported here concerns how to make predictions when a team of experts is available and their predictions are known. In particular, the authors define an algorithm to solve the following prediction problem: we want to predict whether an event will occur, and we may use the predictions of a finite set of experts; each expert sees a sequence of measurements and gives his or her opinion on the probability of the event as a real number between 0 and 1. The problem is to build an algorithm that uses the experts' advice to give the best possible estimate. This requires minimizing the loss, defined as the absolute value of the difference between the predicted number and the real outcome.

The authors define a family of algorithms, some taken from the literature, and study their performance. In particular, the optimal algorithms are discussed, and their lower and upper bounds are demonstrated. Since the optimal algorithms have complexity exponential in the number of experts, the authors provide other algorithms to generate the prediction and demonstrate their lower and upper bounds. The algorithm P is based on computing and updating a nonnegative weight for each expert that reflects the expert's prediction ability. A parameter controls when to drop poorly performing experts, and its choice uses a priori knowledge. An iterative variation, P*, can make predictions without a priori knowledge of the best expert's loss.

In the last part of the paper, the authors discuss an application of the algorithm in pattern recognition, where the problem is to predict the label (0 or 1) of examples drawn at random from an arbitrary distribution. The authors compare their solution with other, similar results; the comparison indicates that the new approach is better. The proofs of the main theorems are difficult and require a few lemmas. This paper is important for everyone working in the field. It covers almost every aspect of a loss function that minimizes the number of mistakes and can use any kind of expert output.
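The review's description of P and P* invites a small reconstruction. The sketch below is a hedged guess-and-double reading of the iterative idea attributed to P*, not the authors' exact algorithm: it keeps a guessed budget for the best expert's total loss, tunes a learning rate to that guess (the square-root tuning rule and all names here are assumptions for illustration), and doubles the budget and restarts the weights whenever the best expert's observed loss exceeds the guess.

```python
# Hypothetical "guess and double" reconstruction of the iterative idea the
# review attributes to P*: no a priori knowledge of the best expert's loss.
# The eta tuning rule and every name here are illustrative assumptions.
import math

def doubling_forecaster(expert_preds, outcomes):
    n = len(expert_preds[0])
    budget = 1.0                                 # guessed bound on the best expert's loss
    eta = math.sqrt(2.0 * math.log(n) / budget)  # learning rate tuned to the guess
    weights = [1.0] * n
    expert_loss = [0.0] * n
    alg_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        p = sum(w * x for w, x in zip(weights, preds)) / total
        alg_loss += abs(p - y)                   # expected mistakes of the randomized forecast
        for i, x in enumerate(preds):
            expert_loss[i] += abs(x - y)
            weights[i] *= math.exp(-eta * abs(x - y))
        if min(expert_loss) > budget:            # the guess was too small:
            budget *= 2.0                        # double it, retune the rate,
            eta = math.sqrt(2.0 * math.log(n) / budget)
            weights = [1.0] * n                  # and restart the weights
    return alg_loss, min(expert_loss)
```

Because the budgets grow geometrically, only logarithmically many restarts can occur, and the loss wasted across restarts folds into the constant of the tuned bound; this is the standard route to guarantees that need no a priori knowledge.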


Published In

Journal of the ACM, Volume 44, Issue 3
May 1997
163 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/258128
Editor: F. T. Leighton

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  • algorithms

Qualifiers

  • Article

Cited By

  • (2025) Bayesian Predictive Synthesis with Outcome-Dependent Pools. Statistical Science 40:1. DOI: 10.1214/24-STS954. Online publication date: 1-Jan-2025.
  • (2024) Non-stationary online convex optimization with arbitrary delays. Proceedings of the 41st International Conference on Machine Learning, pp. 49991-50011. DOI: 10.5555/3692070.3694115. Online publication date: 21-Jul-2024.
  • (2024) Monotone individual fairness. Proceedings of the 41st International Conference on Machine Learning, pp. 3266-3283. DOI: 10.5555/3692070.3692200. Online publication date: 21-Jul-2024.
  • (2024) An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses. Operations Research 72:1, pp. 373-388. DOI: 10.1287/opre.2022.2334. Online publication date: 1-Jan-2024.
  • (2024) Dynamic Fairness-aware Recommendation Through Multi-agent Social Choice. ACM Transactions on Recommender Systems 3:2, pp. 1-35. DOI: 10.1145/3690653. Online publication date: 28-Sep-2024.
  • (2024) Causal Structure Learning for Recommender System. ACM Transactions on Recommender Systems 3:1, pp. 1-23. DOI: 10.1145/3680296. Online publication date: 31-Jul-2024.
  • (2024) Fast Swap Regret Minimization and Applications to Approximate Correlated Equilibria. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pp. 1223-1234. DOI: 10.1145/3618260.3649691. Online publication date: 10-Jun-2024.
  • (2024) Evolve: Enhancing Unsupervised Continual Learning with Multiple Experts. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2355-2366. DOI: 10.1109/WACV57701.2024.00236. Online publication date: 3-Jan-2024.
  • (2024) Getting the Best Out of Both Worlds: Algorithms for Hierarchical Inference at the Edge. IEEE Transactions on Machine Learning in Communications and Networking 2, pp. 280-297. DOI: 10.1109/TMLCN.2024.3366501. Online publication date: 2024.
  • (2024) Adversarial Bandits With Multi-User Delayed Feedback: Theory and Application. IEEE Transactions on Mobile Computing 23:10, pp. 9383-9397. DOI: 10.1109/TMC.2024.3362237. Online publication date: 1-Oct-2024.
