One of the basic problems of machine learning is deciding how to act in an uncertain world. For example, if I want my robot to bring me a cup of coffee, it must be able to compute the correct sequence of electrical impulses to send to its motors to navigate from the coffee pot to my office. In fact, since the results of its actions are not completely predictable, it is not enough just to compute the correct sequence; instead the robot must sense and correct for deviations from its intended path.
In order for any machine learner to act reasonably in an uncertain environment, it must solve problems like the one above quickly and reliably. Unfortunately, the world is often so complicated that it is difficult or impossible to find the optimal sequence of actions to achieve a given goal. So, in order to scale our learners up to real-world problems, we usually must settle for approximate solutions.
One representation for a learner's environment and goals is a Markov decision process or MDP. MDPs allow us to represent actions that have probabilistic outcomes, and to plan for complicated, temporally-extended goals. An MDP consists of a set of states that the environment can be in, together with rules for how the environment can change state and for what the learner is supposed to do.
One way to approach a large MDP is to try to compute an approximation to its optimal state evaluation function, the function which tells us how much reward the learner can expect to achieve if the world is in a particular state. If the approximation is good enough, we can use a shallow search to find a good action from most states. Researchers have tried many different ways to approximate evaluation functions. This thesis aims for a middle ground between algorithms that don't scale well because they use an impoverished representation for the evaluation function, and algorithms that we can't analyze because they use too complicated a representation.
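To make the evaluation-function idea concrete, here is a minimal sketch of the standard tabular approach: exact value iteration to compute the state evaluation function, followed by a one-step greedy lookahead (a shallow search) to choose actions. This is the baseline the abstract is contrasting with, not the approximate, scalable methods the thesis itself develops; it reuses the toy `P`, `R`, and `GAMMA` from the sketch above.

```python
def value_iteration(P, R, gamma, iters=100):
    """Compute the optimal state evaluation function V for a small tabular MDP
    by repeatedly applying the Bellman optimality backup."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: R[s] + gamma * max(
                 sum(p * V[s2] for p, s2 in outcomes)
                 for outcomes in P[s].values())
             for s in P}
    return V

def greedy_action(P, R, gamma, V, s):
    """Shallow (one-step) search: pick the action whose expected backed-up
    value under V is largest."""
    return max(P[s],
               key=lambda a: R[s] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))

V = value_iteration(P, R, GAMMA)
print(greedy_action(P, R, GAMMA, V, "coffee_pot"))  # -> "go"
```

In a large MDP the table `V` cannot be enumerated, which is exactly why the thesis studies approximate representations of the evaluation function and how much structure such a representation can have before the resulting algorithms become impossible to analyze.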