Constrained Markov decision processes with first passage criteria

Huang, Yonghui; Wei, Qingda; Guo, Xianping

doi:10.1007/s10479-012-1292-1

Published: 22 December 2012

Annals of Operations Research, Volume 206, pages 197–219 (2013)

Abstract

This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained during a first passage time to some target set, subject to a constraint on the associated expected cost over this first passage time. The state space is denumerable, and the rewards and costs are possibly unbounded. In addition, the discount factor is state-action dependent and is allowed to equal one. We develop suitable conditions for the existence of a constrained optimal policy, which generalize those for constrained MDPs with the standard discount criteria. Moreover, we show that the constrained optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our results and to exhibit some advantage of our optimality conditions.
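
For orientation, the optimization problem described in the abstract can be sketched in symbols as follows. The notation is an assumption made here for illustration and is not taken from the paper: S is the denumerable state space, B the target set, gamma an initial distribution, alpha the state-action dependent discount factor, r and c the reward and cost functions, and d a hypothetical constraint bound. The direction of the constraint and the exact discounting convention are likewise assumed.

```latex
% Illustrative sketch only; all symbols are assumed notation, not the paper's.
% First passage time of the state process (x_n) into the target set B \subset S:
\tau_B := \inf\{\, n \ge 0 : x_n \in B \,\}

% Expected discounted reward and cost accumulated before reaching B,
% with discount factor \alpha(x,a) \in (0,1] and the empty product equal to 1:
V_r(\pi) := \mathbb{E}^{\pi}_{\gamma}\!\left[ \sum_{n=0}^{\tau_B - 1}
            \left( \prod_{k=0}^{n-1} \alpha(x_k, a_k) \right) r(x_n, a_n) \right],
\qquad
V_c(\pi) := \mathbb{E}^{\pi}_{\gamma}\!\left[ \sum_{n=0}^{\tau_B - 1}
            \left( \prod_{k=0}^{n-1} \alpha(x_k, a_k) \right) c(x_n, a_n) \right]

% Constrained problem: maximize reward subject to a bound d on expected cost.
\sup_{\pi} \; V_r(\pi) \quad \text{subject to} \quad V_c(\pi) \le d
```

Under this reading, taking alpha constant and strictly less than one, with an empty target set, would recover the standard discounted constrained MDP that the abstract says is generalized.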

Author information

Authors and Affiliations

School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, China
Yonghui Huang, Qingda Wei & Xianping Guo

Corresponding author

Correspondence to Xianping Guo.

Additional information

This work was partially supported by NSFC, GDUPS and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-Sen University.

About this article

Cite this article

Huang, Y., Wei, Q. & Guo, X. Constrained Markov decision processes with first passage criteria. Ann Oper Res 206, 197–219 (2013). https://doi.org/10.1007/s10479-012-1292-1

Issue Date: July 2013