ABSTRACT
Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models in which the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of synthesis algorithms for IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on the transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated to value iteration into |S| max problems, where |S| is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set A is a polytope, synthesis over a discrete-action IMDP whose actions are the vertices of A is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
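The continuous-action setting above builds on standard robust value iteration over *discrete-action* IMDPs, where the inner minimization over interval-constrained distributions admits a simple greedy solution. The sketch below illustrates that discrete-action baseline only; the encoding, function names, toy model, and discount factor are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch: robust value iteration over a discrete-action IMDP,
# the finite-action baseline that caIMDPs generalize. All names and the
# discounting choice here are assumptions for the sketch, not the paper's.

def worst_case_expectation(values, lower, upper):
    """Adversarial (minimizing) expectation over all distributions p with
    lower[i] <= p[i] <= upper[i] and sum(p) == 1, assuming the intervals
    are feasible. Greedy O(n log n): start from the lower bounds and push
    the remaining mass onto the lowest-value successors first."""
    p = list(lower)
    remaining = 1.0 - sum(lower)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(remaining, upper[i] - lower[i])
        p[i] += add
        remaining -= add
    return sum(pi * v for pi, v in zip(p, values))

def imdp_value_iteration(rewards, intervals, n_iters=100, gamma=0.9):
    """intervals[s] is a list of actions; each action is a pair
    (lower, upper) of per-successor probability bounds. Robust Bellman
    update (discounted here so the sketch provably converges):
        V(s) = R(s) + gamma * max_a min_p E_p[V]."""
    V = [0.0] * len(rewards)
    for _ in range(n_iters):
        V = [rewards[s] + gamma * max(
                 worst_case_expectation(V, lo, up) for lo, up in intervals[s])
             for s in range(len(rewards))]
    return V
```

The max over actions in the update is a finite enumeration here; the paper's point is precisely that when actions range over a continuum, that max becomes an optimization problem whose structure (e.g., linearity or convexity in the action variables) determines tractability.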
Index Terms
- Interval Markov Decision Processes with Continuous Action-Spaces