
Planning using hierarchical constrained Markov decision processes

Abstract

Constrained Markov decision processes offer a principled method to determine policies for sequential stochastic decision problems where multiple costs are concurrently considered. Although they could be very valuable in numerous robotic applications, to date their use has been quite limited. Among the reasons for their limited adoption is their computational complexity, since policy computation requires the solution of constrained linear programs with an extremely large number of variables. To overcome this limitation, we propose a hierarchical method to solve large problem instances. States are clustered into macro states and the parameters defining the dynamic behavior and the costs of the clustered model are determined using a Monte Carlo approach. We show that the algorithm we propose to create clustered states maintains valuable properties of the original model, like the existence of a solution for the problem. Our algorithm is validated in various planning problems in simulation and on a mobile robot platform, and we experimentally show that the clustered approach significantly outperforms the non-hierarchical solution while experiencing only moderate losses in terms of objective functions.
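The policies mentioned above are obtained by solving a constrained linear program over occupation measures (Eq. (5) in the paper). For illustration only, the following is a minimal sketch of such a program for a small finite CMDP with a goal set, writing \(\rho(x,u)\) for the occupation measure of the state/action pair (x, u); it uses a generic occupation-measure formulation, and the function name, argument layout, and exact constraints are illustrative assumptions that need not match Eq. (5).

```python
# Minimal sketch of an occupation-measure LP for a small constrained MDP
# (generic total-cost formulation; illustrative only, not the paper's Eq. (5)).
import numpy as np
from scipy.optimize import linprog

def solve_cmdp(P, c, d, D, beta, goal):
    """P[u][x, y]: transition probability from x to y under action u,
    c[x, u]: primary cost, d[i][x, u]: i-th secondary cost, D[i]: bound,
    beta[x]: initial distribution, goal: set of absorbing goal states."""
    n_x, n_u = c.shape
    trans = [x for x in range(n_x) if x not in goal]      # transient (non-goal) states
    pairs = [(x, u) for x in trans for u in range(n_u)]   # decision variables rho(x, u)

    # Objective: expected cumulative primary cost.
    obj = np.array([c[x, u] for (x, u) in pairs])

    # Flow conservation: out-flow of y minus expected in-flow equals beta[y].
    A_eq = np.zeros((len(trans), len(pairs)))
    b_eq = np.array([beta[y] for y in trans])
    for r, y in enumerate(trans):
        for k, (x, u) in enumerate(pairs):
            A_eq[r, k] = (1.0 if x == y else 0.0) - P[u][x, y]

    # Secondary-cost constraints: sum_{x,u} d_i(x,u) * rho(x,u) <= D_i.
    A_ub = np.array([[d[i][x, u] for (x, u) in pairs] for i in range(len(D))])
    b_ub = np.array(D)

    return linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, None)] * len(pairs))
```

From a feasible solution, a (possibly randomized) policy can be recovered as \(\pi(u\mid x) = \rho(x,u)/\sum_{u'}\rho(x,u')\) wherever the denominator is positive.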


Notes

  1. Note that even if the policy is deterministic, the action at time t is a random variable, since it is a function of the random variable \(X_t\).

  2. To the best of our knowledge no method has been proposed to analytically estimate costs and probabilities.

  3. The set of paths defines a policy because for each vertex it identifies an edge to traverse along the shortest path, and by construction this edge is associated with an action.

  4. For states close to the boundary or to an obstacle, the action set is adjusted by removing actions that would violate these constraints.

  5. This means that if \(S_i = S_{i+1}\) we remove the latter from the sequence and we reiterate this step until \(S_i\ne S_{i+1}\) for all symbols left in the sequence.
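For concreteness, the collapse step described in footnote 5 amounts to removing repeated consecutive symbols; a minimal sketch (illustrative helper name, not from the paper):

```python
# Drop repeated consecutive macrostates so no two adjacent symbols are equal.
from itertools import groupby

def collapse_repeats(macro_sequence):
    return [s for s, _ in groupby(macro_sequence)]

# e.g. collapse_repeats(['S1', 'S1', 'S2', 'S2', 'S3']) -> ['S1', 'S2', 'S3']
```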

References

  • Altman, E. (1999). Constrained Markov decision processes. Boca Raton: CRC Press.

  • Bai, A., Wu, F., & Chen, X. (2012). Online planning for large MDPs with MAXQ decomposition. In Proceedings of the 11th international conference on autonomous agents and multiagent systems (Vol. 3, pp. 1215–1216).

  • Barry, J., Kaelbling, L. P., & Lozano-Pérez, T. (2010). Hierarchical solution of large Markov decision processes. Technical report, MIT.

  • Barry, J. L., Kaelbling, L. P., & Lozano-Pérez, T. (2011). DetH*: Approximate hierarchical solution of large Markov decision processes. In International joint conference on artificial intelligence (IJCAI).

  • Bertsekas, D. P. (2005). Dynamic programming and optimal control (Vol. 1, 2). Belmont, MA: Athena Scientific.

  • Bouvrie, J., & Maggioni, M. (2012). Efficient solution of Markov decision problems with multiscale representations. In 2012 50th annual Allerton conference on communication, control, and computing (Allerton) (pp. 474–481). IEEE.

  • Carpin, S., Pavone, M., & Sadler, B. M. (2014). Rapid multirobot deployment with time constraints. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1147–1154).

  • Chow, Y.-L., Pavone, M., Sadler, B. M., & Carpin, S. (2015). Trading safety versus performance: rapid deployment of robotic swarms with robust performance constraints. ASME Journal of Dynamic Systems, Measurement and Control, 137(3), 031005-1–031005-11.

  • Dai, P., & Goldsmith, J. (2007). Topological value iteration algorithm for Markov decision processes. In Proceedings of the international joint conference on artificial intelligence (pp. 1860–1865).

  • Dai, P., Mausam, M., & Weld, D. S. (2009). Focused topological value iteration. In International conference on automated planning and scheduling.

  • Dai, P., Mausam, M., Weld, D. S., & Goldsmith, J. (2011). Topological value iteration algorithms. Journal of Artificial Intelligence Research, 42(1), 181–209.

  • Ding, X. C., Englot, B., Pinto, A., Speranzon, A., & Surana, A. (2014). Hierarchical multi-objective planning: From mission specifications to contingency management. In 2014 IEEE international conference on robotics and automation (ICRA) (pp. 3735–3742). IEEE.

  • Ding, X. C., Pinto, A., & Surana, A. (2013). Strategic planning under uncertainties via constrained Markov decision processes. In Proceedings of the IEEE international conference on robotics and automation (pp. 4568–4575).

  • El Chamie, M., & Açikmeşe, B. (2016). Convex synthesis of optimal policies for Markov decision processes with sequentially-observed transitions. In Proceedings of the American control conference (pp. 3862–3867).

  • Feyzabadi, S., & Carpin, S. (2014). Risk aware path planning using hierarchical constrained Markov decision processes. In Proceedings of the IEEE international conference on automation science and engineering (pp. 297–303).

  • Feyzabadi, S., & Carpin, S. (2015). HCMDP: A hierarchical solution to constrained Markov decision processes. In Proceedings of the IEEE international conference on robotics and automation (pp. 3791–3798).

  • Grisetti, G., Stachniss, C., & Burgard, W. (2007). Improved techniques for grid mapping with Rao-Blackwellized particle filters. IEEE Transactions on Robotics, 23(1), 36–46.

  • Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence (pp. 220–229). Morgan Kaufmann Publishers.

  • Hoey, J., St-Aubin, R., Hu, A. J., & Boutilier, C. (1999). SPUDD: Stochastic planning using decision diagrams. In Proceedings of uncertainty in artificial intelligence (pp. 279–288).

  • Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. International Journal of Robotics Research, 30(7), 846–894.

  • Kavraki, L. E., Švestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), 566–580.

  • Kochenderfer, M. J. (2015). Decision making under uncertainty: Theory and application. Cambridge: MIT Press.

  • LaValle, S. M. (2006). Planning algorithms. Cambridge: Cambridge University Press.

  • LaValle, S. M., & Kuffner, J. J. (2001). Randomized kinodynamic planning. International Journal of Robotics Research, 20(5), 378–400.

  • Moldovan, T. M., & Abbeel, P. (2012). Risk aversion in Markov decision processes via near optimal Chernoff bounds. In NIPS (pp. 3140–3148).

  • Pineau, J., Roy, N., & Thrun, S. (2001). A hierarchical approach to POMDP planning and execution. In Workshop on hierarchy and memory in reinforcement learning (ICML) (Vol. 65, p. 51).

  • Puterman, M. L. (2005). Markov decision processes: Discrete stochastic dynamic programming. Hoboken: Wiley-Interscience.

  • Thrun, S., Burgard, W., & Fox, D. (2006). Probabilistic robotics. Cambridge: MIT Press.

  • Vien, N. A., & Toussaint, M. (2015). Hierarchical Monte-Carlo planning. In AAAI (pp. 3613–3619).

Acknowledgements

This paper extends preliminary results presented in Feyzabadi and Carpin (2015). This work is supported by the National Institute of Standards and Technology under cooperative agreement 70NANB12H143. Any opinions, findings, and conclusions or recommendations expressed in these materials are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the funding agencies of the U.S. Government.

Author information

Corresponding author

Correspondence to Stefano Carpin.

Appendix

Proof of Theorem 1

Definition 2 establishes two conditions for saying that an HCMDP preserves connectivity. The first requires that \(\mathcal {X}_H\) is a partition of \(\mathcal {X}\). Algorithm 1 never considers a state twice, i.e., once a state has been assigned to a cluster it will not be considered again for assignment (line 4). Moreover, the main loop ensures that all states in \(\mathcal {X}\) are assigned to a cluster. Therefore, \(\mathcal {X}_H\) is a partition of \(\mathcal {X}\).
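As an illustration of this first condition, a minimal check that a candidate clustering is a partition of \(\mathcal {X}\) could look as follows (names are illustrative and not taken from Algorithm 1):

```python
# Check that the clusters are pairwise disjoint and jointly cover all of X.
def is_partition(X, X_H):
    """X: iterable of states; X_H: iterable of clusters (collections of states)."""
    seen = set()
    for cluster in X_H:
        cluster = set(cluster)
        if seen & cluster:          # clusters must be pairwise disjoint
            return False
        seen |= cluster
    return seen == set(X)           # clusters must cover every state in X
```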

We next turn to the second condition. Let \(z \in \mathcal {X}'\) and \(y\in M\) be two states such that \(z\leadsto y\). By definition this means that there exists a sequence of states \(\mathcal {S}=s_1,s_2,\dots ,s_n\) such that \(s_1=z\), \(s_n = y\), and for each \(1\le i\le n-1\) we have \(P_{s_i,s_{i+1}}^{u_i}>0\) for some \(u_i\in U(s_i)\). Since \(\mathcal {X}_H\) is a partition of \(\mathcal {X}\), this sequence of states is associated with a sequence of macrostates \(Z_H=S_1\dots S_n=Y_H\) such that \(s_i \in S_i\) for each i. Note that in general there could be some repeated elements in the sequence of macrostates. Let \(S_1,\dots S_k\) \((k\le n)\) be the sequence obtained by removing subsequences of repeated macrostates.Footnote 5 First note that this sequence includes at least two elements. This is true because we started assuming \(z\notin M\) while \(y\in M\). According to Algorithm 1 all and only the states in M are mapped to an individual macrostate (line 1), so y cannot be in the same macrostate as z. Next, consider two successive elements in the sequence of macrostates, say \(S_i\) and \(S_{i+1}\). By construction, there exist two successive states in \(\mathcal {S}\), say \(s_j\) and \(s_{j+1}\), such that \(s_j \in S_i\) and \(s_{j+1} \in S_{i+1}\). Since these two states are part of \(\mathcal {S}\), there exists one input \(u_j\in U(s_j)\) such that \(P_{s_j,s_{j+1}}^{u_j}>0\). As per Eq. (7), this implies that an action \(S_{i+1}\) is added to the set of actions \(U(S_i)\). Next, consider the method described in Sect. 4.3, and in particular the definition of the boundary B between two macro states. It follows that \(s_{j+1} \in B_{S_i,S_{i+1}}\). The algorithm then computes the shortest path between each state in \(S_i\) and B, where the shortest path is computed over the induced graph G. For \(s_j\) the path trivially consists of a single edge to \(s_{j+1}\) (or to some other vertex in B that is also one hop away from \(s_j\)). Next, the algorithm randomly selects one vertex from \(S_i\) using a uniform distribution and executes the policy to reach B. Let m be the total number of Monte Carlo samples generated. Then the probability that the estimate of \(P_{S_i,S_{i+1}}^{S_{i+1}}\) is 0 is bounded from above by

$$\begin{aligned} (1-\gamma )^{k_1}\left( 1-P_{s_j,s_{j+1}}^{u_j}\right) ^{k_2} \end{aligned}$$

where \(\gamma = \frac{1}{|S_i|}\), \(k_1\) is the number of times \(s_j\) was not sampled and \(k_2\) is the number of times \(s_j\) was sampled (\(k_1+k_2 =m\), \(k_{1,2}\ge 0\)). This proves that as the total number of samples m grows, the estimate for \(P_{S_i,S_{i+1}}^{S_{i+1}}\) will eventually be positive. This reasoning can be repeated for each pair of successive macro states, thus showing that \(Z_H\leadsto Y_H\), and this concludes the proof. \(\square \)
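The Monte Carlo estimate reasoned about in this proof can be sketched as follows; `policy`, `sample_next_state`, and `macro_of` stand in for the shortest-path policy over the induced graph G, a one-step simulator of the original CMDP, and the cluster lookup, respectively, and are illustrative assumptions rather than the paper's code.

```python
# Estimate macro-level transition probabilities out of macrostate S_i by
# rolling out the shortest-path policy from uniformly sampled start states.
import random
from collections import Counter

def estimate_macro_transitions(S_i, policy, sample_next_state, macro_of,
                               m=1000, max_steps=10000):
    counts = Counter()
    for _ in range(m):
        s = random.choice(list(S_i))        # uniform start state in S_i
        for _ in range(max_steps):
            u = policy(s)                   # action along the shortest path to the boundary
            s = sample_next_state(s, u)     # one stochastic transition of the original CMDP
            if s not in S_i:                # boundary crossed: record the macrostate entered
                counts[macro_of(s)] += 1
                break
    total = sum(counts.values())
    # Empirical estimate of P_{S_i, S'} for each neighbouring macrostate S'.
    return {S_next: k / total for S_next, k in counts.items()} if total else {}
```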

Proof of Theorem 2

We start by observing that Algorithm 3 builds and solves a sequence of HCMDPs. Each is a CMDP with a suitable set of parameters, and at every iteration the constrained linear program given in Eq. (5) is solved. Theorem 1 guarantees that state \(M_H\) is accessible from every macrostate, and therefore there exists at least one policy \(\pi '\) for which \(c(\pi ')\) is finite. Let us next consider the inequality constraints in Eq. (5). If the linear program is not feasible, then each bound \(D_{i,H}\) is increased by \(\varDelta D_{i,H}\) (line 6). By construction, all costs \(d_{i,H}(x,u)\ge 0\) for each state/action pair (x, u). Let \(n_s = |\mathcal {K}_H'|\) be the number of state/action pairs in the HCMDP, \(d_{max} = \max _{(x,u)\in \mathcal {K}_H'} \{d_{i,H}(x,u)\}\) the largest among the additional costs, and \(D_{min} = \min \{\varDelta D_{i,H}\}\) the smallest among the increments in line 6. Therefore, after at most \(\lceil \frac{n_sd_{max}}{D_{min}}\rceil \) iterations all inequality constraints become feasible. \(\square \)
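The relaxation argument corresponds to a simple loop around the LP solver. The sketch below is an illustration of that argument, not the paper's Algorithm 3; it reuses the hypothetical `solve_cmdp` sketched after the abstract.

```python
# Solve the constrained LP; while infeasible, enlarge every secondary-cost
# bound D_i by its increment Delta D_i and retry (cf. line 6 of Algorithm 3).
def solve_with_relaxation(P, c, d, D, delta_D, beta, goal, max_iters=1000):
    D = list(D)
    for _ in range(max_iters):
        res = solve_cmdp(P, c, d, D, beta, goal)        # hypothetical LP solver from the earlier sketch
        if res.success:                                 # feasible: occupation measures found
            return res, D
        D = [Di + dDi for Di, dDi in zip(D, delta_D)]   # relax every bound
    raise RuntimeError("constraints still infeasible after max_iters relaxations")
```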

About this article

Cite this article

Feyzabadi, S., Carpin, S. Planning using hierarchical constrained Markov decision processes. Auton Robot 41, 1589–1607 (2017). https://doi.org/10.1007/s10514-017-9630-4
