Abstract
Formal models of cyber-physical systems, such as priced timed Markov decision processes, require a state space with continuous and discrete components. The problem of controller synthesis for such systems then can be cast as finding optimal strategies for Markov decision processes over a Euclidean state space. We develop two different reinforcement learning strategies that tackle the problem of continuous state spaces via online partition refinement techniques. We provide theoretical insights into the convergence of partition refinement schemes. Our techniques are implemented in . Experimental results show the advantages of our new techniques over previous optimization algorithms of .
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
We shall assume that the equation system has a solution for the considered MDP and goal set under any strategy.
- 2.
\(\mathcal {A}\) is consistent with \(\mathcal {G}\) if for any \(\nu \in \mathcal {A}\) either \(\nu \subseteq \mathcal {G}\) of \(\nu \cap \mathcal {G}=\emptyset \).
- 3.
Here \(\mathcal {C}_\mathcal {A}^{\inf }(\nu ,a,\nu ')= \underset{s\in \nu ,s'\in \nu '}{\inf }\mathcal {C}(s,a,s')\).
- 4.
Notice that Definition 1 is only applicable to integrable transition functions, voiding most statistical assumptions, including normality. We thus merely provide heuristics.
- 5.
- 6.
- 7.
References
Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to act using real-time dynamic programming. Artif. Intell. 72(1–2), 81–138 (1995). https://doi.org/10.1016/0004-3702(94)00011-O. ISSN 0004–3702
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees (1984)
D’Argenio, P.R., Jeannet, B., Jensen, H.E., Larsen, K.G.: Reduction and refinement strategies for probabilistic analysis. In: Hermanns, H., Segala, R. (eds.) PAPM-PROBMIV 2002. LNCS, vol. 2399, pp. 57–76. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45605-8_5
David, A., et al.: Statistical model checking for networks of priced timed automata. In: Fahrenberg, U., Tripakis, S. (eds.) FORMATS 2011. LNCS, vol. 6919, pp. 80–96. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24310-3_7
David, A., et al.: On time with minimal expected cost!. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 129–145. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_10
David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal Stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16
David, A., Larsen, K.G., Legay, A., Mikucionis, M., Poulsen, D.B.: Uppaal SMC tutorial. STTT 17(4), 397–415 (2015). https://doi.org/10.1007/s10009-014-0361-y
Henriques, D., Martins, J.G., Zuliani, P., Platzer, A., Clarke, E.M.: Statistical model checking for Markov decision processes. In: QEST 2012, pp. 84–93 (2012). https://doi.org/10.1109/QEST.2012.19
Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1
Kwiatkowska, M.Z., Norman, G., Parker, D.: Game-based abstraction for Markov decision processes. In: QEST 2006, pp. 157–166. IEEE Computer Society (2006). https://doi.org/10.1109/QEST.2006.19. ISBN 0-7695-2665-9
Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Safe and optimal adaptive cruise control. In: Meyer, R., Platzer, A., Wehrheim, H. (eds.) Correct System Design. LNCS, vol. 9360, pp. 260–277. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23506-6_17
Larsen, K.G., Mikučionis, M., Muñiz, M., Srba, J., Taankvist, J.H.: Online and compositional learning of controllers with application to floor heating. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 244–259. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_14
Larsen, K.G., Le Coënt, A., Mikučionis, M., Taankvist, J.H.: Guaranteed control synthesis for continuous systems in Uppaal Tiga. In: Chamberlain, R., Taha, W., Törngren, M. (eds.) CyPhy/WESE-2018. LNCS, vol. 11615, pp. 113–133. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23703-5_6. ISBN 978-3-030-23703-5
Lun, Y.Z., Wheatley, J., D’Innocenzo, A., Abate, A.: Approximate abstractions of Markov chains with interval decision processes. ADHS 2018, pp. 91–96 (2018). https://doi.org/10.1016/j.ifacol.2018.08.016
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
Strehl, L. Li, A.L., Littman, M.L.: Incremental model-based learners with formal learning-time guarantees. CoRR (2012)
Sun, L., Guo, Y., Barbu, A.: A novel framework for online supervised learning with feature selection. arXiv e-prints, art. arXiv:1803.11521 (2018)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018)
Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, King’s College, Cambridge (1989)
Welford, B.P.: Note on a method for calculating corrected sums of squares and products. Technometrics 4(3), 419–420 (1962). https://doi.org/10.1080/00401706.1962.10490022
Acknowledgements
This work is partly supported by the Innovation Fund Denmark center DiCyPS, the ERC Advanced Grant LASSO, and the JST ERATO project: HASUO Metamathematics for Systems Design (JPMJER1603).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jaeger, M., Jensen, P.G., Guldstrand Larsen, K., Legay, A., Sedwards, S., Taankvist, J.H. (2019). Teaching Stratego to Play Ball: Optimal Synthesis for Continuous Space MDPs. In: Chen, YF., Cheng, CH., Esparza, J. (eds) Automated Technology for Verification and Analysis. ATVA 2019. Lecture Notes in Computer Science(), vol 11781. Springer, Cham. https://doi.org/10.1007/978-3-030-31784-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-31784-3_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31783-6
Online ISBN: 978-3-030-31784-3
eBook Packages: Computer ScienceComputer Science (R0)