Toward Optimal Classifier System Performance in Non-Markov Environments

Authors:

Pier Luca Lanzi,

Stewart W. Wilson

Evolutionary Computation, Volume 8, Issue 4

Pages 393 - 418

https://doi.org/10.1162/106365600568239

Published: 01 December 2000

Abstract

Wilson's (1994) bit-register memory scheme was incorporated into the XCS classifier system and investigated in a series of non-Markov environments. Two extensions to the scheme were important in obtaining near-optimal performance in the harder environments. The first was an exploration strategy in which exploration of external actions was probabilistic as in Markov environments, but internal "actions" (register settings) were selected deterministically. The second was use of a register having more bit-positions than were strictly necessary to resolve environmental aliasing. The origins and effects of the two extensions are discussed.
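The first extension can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `explore_prob` parameter, the payoff table, and the convention that `#` leaves a register bit unchanged are all assumptions made for illustration. The sketch only shows the idea stated in the abstract: the external action may be chosen at random while exploring, but the internal action (register setting) is always the one with the highest predicted payoff.

```python
import random


def select_action(predictions, explore_prob=0.5, rng=random):
    """Pick an (external, internal) action pair.

    `predictions` maps (external_action, internal_action) to a predicted
    payoff. All names here are hypothetical, not taken from the paper.
    """
    externals = sorted({ext for ext, _ in predictions})
    if rng.random() < explore_prob:
        # Explore: the external action is drawn uniformly at random.
        ext = rng.choice(externals)
    else:
        # Exploit: take the external action with the best predicted payoff.
        ext = max(
            externals,
            key=lambda e: max(p for (x, _), p in predictions.items() if x == e),
        )
    # The internal action is never randomized: always take the register
    # setting with the highest prediction for the chosen external action.
    candidates = {i: p for (x, i), p in predictions.items() if x == ext}
    internal = max(sorted(candidates), key=candidates.get)
    return ext, internal


def apply_internal_action(register, internal):
    """Return the new register contents.

    Assumed convention: '0' or '1' overwrites a bit, '#' leaves it unchanged.
    """
    return "".join(r if a == "#" else a for r, a in zip(register, internal))
```

The register width is simply the length of the `register` string, so the second extension (using more bit-positions than strictly necessary) would correspond to passing longer strings in this sketch.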

References

[1]

Chrisman, L. and Littman, M. (1993). Hidden state and short-term memory. Presentation at the Reinforcement Learning Workshop. Proceedings of the International Conference on Machine Learning (ICML-93).

[2]

Cliff, D. and Ross, S. (1994). Adding memory to ZCS. Adaptive Behavior, 3(2): 101-150.

[3]

Hansen, E. (1998). Solving POMDPs by searching in policy space. Ph.D. thesis, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts.

[4]

Hauskrecht, M. (1998). Planning and Control in Stochastic Domains with Imperfect Information. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.

[5]

Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. R. (1986). Induction: Processes of Inference, Learning, and Discovery. MIT Press, Cambridge, Massachusetts.

[6]

Jaakkola, T., Singh, S. P., and Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 345-352, MIT Press, Cambridge, Massachusetts.

[7]

Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2): 99-134.

[8]

Kovacs, T. (1999). Deletion schemes for classifier systems. In Banzhaf, W., Daida, J., Eiben, A. E., Garzon, M. H., Honavar, V., Jakiela, M., and Smith, R. E., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), pages 329-336, Morgan Kaufmann, San Francisco, California.

[9]

Lanzi, P. L. (1998a). Adding Memory to XCS. In Proceedings of the IEEE Conference on Evolutionary Computation (ICEC98), IEEE Press, Piscataway, New Jersey.

[10]

Lanzi, P. L. (1998b). An Analysis of the Memory Mechanism of XCSM. In Koza, J., Banzhaf, W., Chellapilla, K., Deb, K. et al., editors, Proceedings of the Third Annual Genetic Programming Conference, pages 643-651, Morgan Kaufmann, San Francisco, California.

[11]

Lanzi, P. L. (1999). An analysis of generalization in the XCS classifier system. Evolutionary Computation, 7(2): 125-149.

[12]

Lanzi, P. L. and Colombetti, M. (1999). An extension to the XCS classifier system for stochastic environments. In Banzhaf, W., Daida, J., Eiben, A. E., Garzon, M. H., Honavar, V., Jakiela, M., and Smith, R. E., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), pages 353-360, Morgan Kaufmann, San Francisco, California.

[13]

Lin, L. (1993). Reinforcement learning for robots using neural networks. Technical Report CMU-CS-93-103, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

[14]

Lin, L. and Mitchell, T. (1992). Memory approaches to reinforcement learning in non-Markovian domains. Technical Report CMU-CS-92-138, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

[15]

Littman, M. L. (1994). Memoryless policies: Theoretical limitations and practical results. In Cliff, D., Husbands, P., Meyer, J.-A., and Wilson, S., editors, From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, pages 238-245, MIT Press, Cambridge, Massachusetts.

[16]

McCallum, A. K. (1995). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, University of Rochester, Rochester, New York.

[17]

McCallum, R. A. (1996). Hidden state and reinforcement learning with instance-based state identification. IEEE Transactions on Systems, Man and Cybernetics - Part B (Special issue on Learning Autonomous Robots), 26(3): 464-473.

[18]

Meuleau, N., Peshkin, L., Kim, K., and Kaelbling, L. (1999). Learning finite-state controllers for partially observable environments. In Laskey, K., editor, Fifteenth International Conference on Uncertainty in Artificial Intelligence, pages 427-436, Morgan Kaufmann, San Francisco, California.

[19]

Peshkin, L., Meuleau, N., and Kaelbling, L. P. (1999). Learning policies with external memory. In Bratko, I. and Dzeroski, S., editors, Machine Learning: Proceedings of the Sixteenth International Conference, pages 307-314, Morgan Kaufmann, San Francisco, California.

[20]

Riolo, R. L. (1988). Cfs-c/fsw1: An implementation of the cfs-c classifier system in a domain that involves learning to control a Markov decision process. Technical Report, Logic of Computers Group, Division of Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan. Available at ftp://ftp.cs.bham.ac.uk/pub/tech-reports/1996/CSRP-96-17.ps.gz.

[21]

Robertson, G. G. and Riolo, R. L. (1988). A tale of two classifier systems. Machine Learning, 3:139-159.

[22]

Schmidhuber, J. (1991). Reinforcement Learning in Markovian and non-Markovian environments. In Lippman, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 500-506, MIT Press, Cambridge, Massachusetts.

[23]

Smith, R. E. (1994). Memory exploitation in learning classifier systems. Evolutionary Computation, 2(3): 199-220.

[24]

Steels, L. (1996). Emergent adaptive lexicons. In Maes, P., Mataric, M. J., Meyer, J. A., Pollack, J., and Wilson, S. W., editors, From Animals to Animats 4: Proceedings of the Simulation of Adaptive Behavior Conference, pages 562-567, MIT Press, Cambridge, Massachusetts.

[25]

Stolzmann, W. (1999). Latent learning in Khepera robots with anticipatory classifier systems. In Wu, A., editor, Proceedings of the 1999 Genetic and Evolutionary Computation Conference Workshop Program, pages 290-297, Morgan Kaufmann, San Francisco, California.

[26]

Teller, A. (1994). The evolution of mental models. In Advances in Genetic Programming, MIT Press, Cambridge, Massachusetts.

[27]

Tomlinson, A. and Bull, L. (1998). A corporate classifier system. In Eiben, A. E., Bäck, T., Schoenauer, M., and Schwefel, H.-P., editors, Parallel Problem Solving from Nature, PPSN V, pages 550-559, Springer, Berlin, Germany.

[28]

Tomlinson, A. and Bull, L. (1999a). On corporate classifier systems: Increasing the benefits of rule linkage. In Banzhaf, W., Daida, J., Eiben, A. E., Garzon, M. H., Honavar, V., Jakiela, M., and Smith, R. E., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-99), pages 649-656, Morgan Kaufmann, San Francisco, California.

[29]

Tomlinson, A. and Bull, L. (1999b). A corporate XCS. In Wu, A. S., editor, Proceedings of the 1999 Genetic and Evolutionary Computation Conference Workshop Program, pages 298-305, Morgan Kaufmann, San Francisco, California.

[30]

Widrow, B. and Hoff, M. (1960). Adaptive switching circuits. In Western Electronic Show and Convention, Volume 4, pages 96-104, Institute of Radio Engineers (now IEEE).

[31]

Wiering, M. and Schmidhuber, J. (1998). HQ-learning. Adaptive Behavior, 6(2): 219-246.

[32]

Wilson, S. W. (1985). Knowledge growth in an artificial animal. In Grefenstette, J. J., editor, Proceedings of the First International Conference on Genetic Algorithms and Their Applications, pages 16-23, Lawrence Erlbaum, Pittsburgh, Pennsylvania.

[33]

Wilson, S. W. (1994). ZCS: a zeroth level classifier system. Evolutionary Computation, 2(1): 1-18.

[34]

Wilson, S. W. (1995). Classifier fitness based on accuracy. Evolutionary Computation, 3(2): 149-175.

[35]

Wilson, S. W. (1998). Generalization in the XCS classifier system. In Koza, J., Banzhaf, W., Chellapilla, K., Deb, K. et al., editors, Proceedings of the Third Annual Genetic Programming Conference, pages 665-674, Morgan Kaufmann, San Francisco, California.

[36]

Wilson, S. W. and Goldberg, D. E. (1989). A critical review of classifier systems. In Schaffer, J. D., editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 244-255, Morgan Kaufmann, San Francisco, California.

Published In

Evolutionary Computation Volume 8, Issue 4

December 2000

122 pages

ISSN: 1063-6560

EISSN: 1530-9304

Publisher

MIT Press

Cambridge, MA, United States

Bibliometrics

Total Citations: 47
Total Downloads: 149
Downloads (Last 12 months): 49
Downloads (Last 6 weeks): 6

Reflects downloads up to 14 Dec 2024

Cited By

Uwano F, Browne W, Li X, Handl J (2024). Cognitive Learning System for Sequential Aliasing Patterns of States in Multistep Decision-Making. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 315-318. DOI: 10.1145/3638530.3654110. Online publication date: 14-Jul-2024.
https://dl.acm.org/doi/10.1145/3638530.3654110

Siddique A, Browne W, Urbanowicz R, Silva S, Paquete L (2023). Modern Applications of Evolutionary Rule-based Machine Learning. Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pages 1301-1330. DOI: 10.1145/3583133.3595047. Online publication date: 15-Jul-2023.
https://dl.acm.org/doi/10.1145/3583133.3595047

Siddique A, Browne W, Wagner M (2022). Learning classifier systems. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1081-1110. DOI: 10.1145/3520304.3533664. Online publication date: 9-Jul-2022.
https://dl.acm.org/doi/10.1145/3520304.3533664

Stein A, Maier R, Rosenbauer L, Hähner J, Coello Coello C (2020). XCS classifier system with experience replay. Proceedings of the 2020 Genetic and Evolutionary Computation Conference, pages 404-413. DOI: 10.1145/3377930.3390249. Online publication date: 25-Jun-2020.
https://dl.acm.org/doi/10.1145/3377930.3390249

Orhand R, Jeannin-Girardon A, Parrend P, Collet P (2020). BACS: A Thorough Study of Using Behavioral Sequences in ACS2. Parallel Problem Solving from Nature – PPSN XVI, pages 524-538. DOI: 10.1007/978-3-030-58112-1_36. Online publication date: 5-Sep-2020.
https://dl.acm.org/doi/10.1007/978-3-030-58112-1_36

Tatsumi T, Takadama K, Auger A, Stützle T (2019). XCS-CR for handling input, output, and reward noise. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1303-1311. DOI: 10.1145/3319619.3326863. Online publication date: 13-Jul-2019.
https://dl.acm.org/doi/10.1145/3319619.3326863

Tatsumi T, Kovacs T, Takadama K, Takadama K (2018). XCS-CR. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1457-1464. DOI: 10.1145/3205651.3208271. Online publication date: 6-Jul-2018.
https://dl.acm.org/doi/10.1145/3205651.3208271

Tatsumi T, Sato H, Takadama K, Bosman P (2017). Automatic adjustment of selection pressure based on range of reward in learning classifier system. Proceedings of the Genetic and Evolutionary Computation Conference, pages 505-512. DOI: 10.1145/3071178.3080531. Online publication date: 1-Jul-2017.
https://dl.acm.org/doi/10.1145/3071178.3080531

Hayashida T, Nishizaki I, Sekizaki S, Takeuchi H (2017). Improved anticipatory classifier system with internal memory for POMDPs with aliased states. Procedia Computer Science, 112(C): 215-224. DOI: 10.1016/j.procs.2017.08.092. Online publication date: 1-Sep-2017.
https://dl.acm.org/doi/10.1016/j.procs.2017.08.092

Chen G, Douch C, Zhang M (2015). Using Learning Classifier Systems to Learn Stochastic Decision Policies. IEEE Transactions on Evolutionary Computation, 19(6): 885-902. DOI: 10.1109/TEVC.2015.2415464. Online publication date: 1-Dec-2015.
https://dl.acm.org/doi/10.1109/TEVC.2015.2415464
