Abstract
This paper studies constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained up to the first passage time to some target set, subject to a constraint on the associated expected cost over the same first passage time. The state space is denumerable, and the rewards and costs may be unbounded. In addition, the discount factor is state-action dependent and is allowed to equal one. We develop suitable conditions for the existence of a constrained optimal policy; these conditions generalize those for constrained MDPs with the standard discount criteria. Moreover, we show that the constrained optimal policy randomizes between two stationary policies that differ in at most one state. Finally, we use a controlled queueing system to illustrate our results and to exhibit some advantages of our optimality conditions.
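To make the structure described in the abstract concrete, the following is a minimal Python sketch on a toy model, not the paper's queueing example. Everything in it is an assumption made for illustration: the three-state model, all numerical values, the names `MODEL`, `evaluate`, and `mixing_probability`, and the reading of "randomizes between two stationary policies differing in at most one state" as a policy that randomizes its action at a single state on each visit. The sketch accumulates reward and cost only until the target state is first reached, with a state-action dependent discount factor that equals one for one of the actions.

```python
# Toy first-passage model. Every number here is an illustrative assumption.
# States 0 and 1 are transient; state 2 is the target set.
TRANSIENT = (0, 1)

# MODEL[state][action] = (reward, cost, discount, transition probs to states 0, 1, 2)
MODEL = {
    0: {
        0: (1.0, 0.5, 1.0, (0.5, 0.3, 0.2)),  # cheap action, discount factor equal to one
        1: (3.0, 2.0, 0.9, (0.4, 0.4, 0.2)),  # lucrative but costly action
    },
    1: {
        0: (2.0, 1.0, 0.95, (0.2, 0.5, 0.3)),  # single available action
    },
}


def evaluate(p, sweeps=500):
    """Return (reward, cost) from state 0 up to the first passage into state 2.

    The policy plays action 1 in state 0 with probability p (else action 0)
    and the only action in state 1. Values on the target set are zero.
    """
    action_probs = {0: {0: 1.0 - p, 1: p}, 1: {0: 1.0}}
    reward = {i: 0.0 for i in TRANSIENT}
    cost = {i: 0.0 for i in TRANSIENT}
    for _ in range(sweeps):  # successive approximation of the fixed point
        new_reward, new_cost = {}, {}
        for i in TRANSIENT:
            r_i = c_i = 0.0
            for a, weight in action_probs[i].items():
                r, c, alpha, probs = MODEL[i][a]
                # Only transient successors contribute: accumulation stops at the target.
                r_i += weight * (r + alpha * sum(probs[j] * reward[j] for j in TRANSIENT))
                c_i += weight * (c + alpha * sum(probs[j] * cost[j] for j in TRANSIENT))
            new_reward[i], new_cost[i] = r_i, c_i
        reward, cost = new_reward, new_cost
    return reward[0], cost[0]


def mixing_probability(budget, tol=1e-10):
    """Bisect on p until the expected first-passage cost meets the budget.

    Assumes the cost is increasing in p and the budget lies between the
    costs of the two deterministic policies (true for this toy model).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if evaluate(mid)[1] > budget:
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    p = mixing_probability(budget=4.0)
    print(p, evaluate(p))
```

In this toy model both the expected reward and the expected cost from state 0 increase with `p`, so within this one-parameter family the best feasible choice is the largest `p` whose cost does not exceed the budget. The two deterministic policies `p = 0` and `p = 1` differ only in state 0, which mirrors the structural result stated above without reproducing the paper's proof or conditions.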
Additional information
This work was partially supported by NSFC, GDUPS and Guangdong Province Key Laboratory of Computational Science at the Sun Yat-Sen University.
About this article
Cite this article
Huang, Y., Wei, Q. & Guo, X. Constrained Markov decision processes with first passage criteria. Ann Oper Res 206, 197–219 (2013). https://doi.org/10.1007/s10479-012-1292-1
DOI: https://doi.org/10.1007/s10479-012-1292-1