Constrained Markov decision processes with first passage criteria


Abstract

This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained during a first passage time to some target set, subject to a constraint on the associated expected cost over the same first passage time. The state space is denumerable, and the rewards and costs may be unbounded. In addition, the discount factor is state-action dependent and is allowed to equal one. We develop suitable conditions for the existence of a constrained optimal policy; these conditions generalize those known for constrained MDPs with the standard discounted criteria. Moreover, we show that the constrained optimal policy randomizes between two stationary policies differing in at most one state. Finally, we illustrate our results with a controlled queueing system, which also exhibits an advantage of our optimality conditions.
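For concreteness, the following display is a minimal sketch of the criterion the abstract describes; all notation here (the target set B, the first passage time \tau_B, the discount factor \alpha, the reward r, the cost c, the constraint level \rho, and the value functions V and C) is our own shorthand for illustration and need not match the paper's.

\[
V(i,\pi)=\mathbb{E}_i^{\pi}\!\left[\sum_{n=0}^{\tau_B-1}\Bigl(\prod_{k=0}^{n-1}\alpha(x_k,a_k)\Bigr)\, r(x_n,a_n)\right],
\qquad
C(i,\pi)=\mathbb{E}_i^{\pi}\!\left[\sum_{n=0}^{\tau_B-1}\Bigl(\prod_{k=0}^{n-1}\alpha(x_k,a_k)\Bigr)\, c(x_n,a_n)\right],
\]
where \(\tau_B:=\min\{n\ge 0 : x_n\in B\}\) (with \(\min\emptyset:=\infty\) and empty products equal to \(1\)), and the constrained problem is
\[
\text{maximize } V(i,\pi) \text{ over all policies } \pi \quad\text{subject to } C(i,\pi)\le \rho .
\]

Since \(\alpha\) may equal one, this covers the undiscounted total reward accumulated before absorption in B. On this reading, the structural result in the abstract says the optimum is attained by a randomized mixture of two stationary policies that prescribe different actions in at most one state.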



Author information

Correspondence to Xianping Guo.

Additional information

This work was partially supported by NSFC, GDUPS, and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-Sen University.


About this article

Cite this article

Huang, Y., Wei, Q. & Guo, X. Constrained Markov decision processes with first passage criteria. Ann Oper Res 206, 197–219 (2013). https://doi.org/10.1007/s10479-012-1292-1
