A Computational Note on Markov Decision Processes Without Discounting
Pfeiffer, Paul E.
Dennis, J.E. Jr.
The Markov decision process is treated in a variety of forms or cases: finite or infinite horizon, with or without discounting. The finite horizon cases and the case of infinite horizon with discounting have received considerable attention. In the infinite horizon case, with discounting, the problem either receives a linear programming treatment or is treated by the elegant and effective policy-iteration procedure by Ronald Howard. In the undiscounted case, however, a special form of this procedure is required, which detracts from the directness and elegance of the method. The difficulty comes in the step generally called the value-determination procedure. The equations used in this step are linearly dependent, so that the solution of the system of linear equations requires some adjustment. We propose a new computational procedure which avoids this difficulty and works directly with the average next-period gains and powers of the transition probability matrix. The fundamental computational tools are matrix multiplication and addition.
Citable link to this pagehttps://hdl.handle.net/1911/101629
MetadataShow full item record
- CAAM Technical Reports