ABSTRACT
Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models in which the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of synthesis algorithms for IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on the transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated to value iteration into |S| max problems, where |S| is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set A is a polytope, synthesis over a discrete-action IMDP whose actions are the vertices of A is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
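The continuous-action setting above builds on standard robust value iteration over *discrete-action* IMDPs, where the inner minimization over interval-constrained distributions admits a simple greedy solution. The sketch below illustrates that discrete-action baseline only; the encoding, function names, toy model, and discount factor are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch: robust value iteration over a discrete-action IMDP,
# the finite-action baseline that caIMDPs generalize. All names and the
# discounting choice here are assumptions for the sketch, not the paper's.

def worst_case_expectation(values, lower, upper):
    """Adversarial (minimizing) expectation over all distributions p with
    lower[i] <= p[i] <= upper[i] and sum(p) == 1, assuming the intervals
    are feasible. Greedy O(n log n): start from the lower bounds and push
    the remaining mass onto the lowest-value successors first."""
    p = list(lower)
    remaining = 1.0 - sum(lower)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(remaining, upper[i] - lower[i])
        p[i] += add
        remaining -= add
    return sum(pi * v for pi, v in zip(p, values))

def imdp_value_iteration(rewards, intervals, n_iters=100, gamma=0.9):
    """intervals[s] is a list of actions; each action is a pair
    (lower, upper) of per-successor probability bounds. Robust Bellman
    update (discounted here so the sketch provably converges):
        V(s) = R(s) + gamma * max_a min_p E_p[V]."""
    V = [0.0] * len(rewards)
    for _ in range(n_iters):
        V = [rewards[s] + gamma * max(
                 worst_case_expectation(V, lo, up) for lo, up in intervals[s])
             for s in range(len(rewards))]
    return V
```

The max over actions in the update is a finite enumeration here; the paper's point is precisely that when actions range over a continuum, that max becomes an optimization problem whose structure (e.g., linearity or convexity in the action variables) determines tractability.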
Index Terms
- Interval Markov Decision Processes with Continuous Action-Spaces