Abstract
Many state-of-the-art approaches to trajectory optimization and optimal control are intimately related to standard Newton methods. For researchers working at the intersection of machine learning, robotics, control, and optimization, such relations are highly relevant but sometimes hard to see across disciplines, not least because of the different notations and conventions used in each field. The aim of this tutorial is to introduce constrained trajectory optimization in a manner that allows us to establish these relations. We consider a basic but general formalization of the problem and discuss the structure of Newton steps in this setting. The computation of Newton steps can then be related to dynamic programming, establishing relations to DDP, iLQG, and AICO. We also clarify how inverting a banded symmetric matrix relates to dynamic programming as well as to message passing in Markov chains and factor graphs. Further, to a machine learner, path optimization and Gaussian Processes seem intuitively related problems. We establish such a relation and show how to solve a Gaussian Process-regularized path optimization problem efficiently. Further topics include how to derive an optimal controller around the path, model predictive control in constrained k-order control processes, and the pullback metric interpretation of the Gauss–Newton approximation.
Notes
- 1.
We use the words path and trajectory interchangeably: we always think of a path as a mapping \([0,T] \rightarrow {\mathbb {R}}^n\), including its temporal profile.
- 2.
\(\partial i\) denotes the neighborhood of feature i in the bipartite graph of features and variables; and thereby indexes the tuple of variables on which the ith feature depends.
- 3.
\([\textit{expr}]\) is the indicator function of a boolean expression.
- 4.
- 5.
The time steps can, e.g., be chosen “uniformly” within [0, T] as \(t_k = T \begin{cases} 0 & k \le p \\ 1 & k \ge K{+}1 \\ \frac{k-p}{K+1-p} & \text {otherwise} \end{cases}\), which also assigns \(t_{0:p}=0\) and \(t_{K+1:K+p+1}=T\), ensuring that \(x_0=z_0\) and \(x_T=z_K\).
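This knot placement can be sketched in a few lines of Python (the function name and list-based return are our own illustration, not from the chapter): \(K+p+2\) knots in \([0,T]\), with \(p{+}1\) repeated knots clamped at each end so that the spline interpolates the first and last control points.

```python
def knot_vector(K, p, T):
    """Clamped 'uniform' knots t_0..t_{K+p+1} on [0, T], per the note:
    t_k = 0 for k <= p, t_k = T for k >= K+1, linear in between."""
    t = []
    for k in range(K + p + 2):
        if k <= p:
            t.append(0.0)
        elif k >= K + 1:
            t.append(T)
        else:
            t.append(T * (k - p) / (K + 1 - p))
    return t

# e.g. K=5, p=2, T=1 gives [0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1]
```

The repeated boundary knots are what enforce \(x_0=z_0\) and \(x_T=z_K\).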
- 6.
The coefficients can be computed recursively. We initialize \(b_k^0(t) = [t_k \le t < t_{k+1}]\) and then compute, for \(d=1,\ldots,p\),
$$\begin{aligned} b_k^d(t) = \frac{t-t_k}{t_{k+d}-t_k}\, b^{d-1}_k(t) + \frac{t_{k+d+1}-t}{t_{k+d+1}-t_{k+1}}\, b^{d-1}_{k+1}(t) ~, \end{aligned}$$ (25)
up to the desired degree p, to get \(b(t) \equiv b_{0:K}^p(t)\).
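Equation (25) is the standard Cox–de Boor recursion. A minimal Python sketch (the function name and the convention of treating 0/0 terms as zero at repeated knots are our assumptions, not from the chapter):

```python
def bspline_basis(k, d, t, knots):
    """B-spline basis b_k^d(t) via the Cox-de Boor recursion, Eq. (25).
    Divisions by zero at repeated knots are taken as 0 (usual convention)."""
    if d == 0:
        # b_k^0(t) = [t_k <= t < t_{k+1}]
        return 1.0 if knots[k] <= t < knots[k + 1] else 0.0
    left = right = 0.0
    if knots[k + d] != knots[k]:
        left = (t - knots[k]) / (knots[k + d] - knots[k]) \
            * bspline_basis(k, d - 1, t, knots)
    if knots[k + d + 1] != knots[k + 1]:
        right = (knots[k + d + 1] - t) / (knots[k + d + 1] - knots[k + 1]) \
            * bspline_basis(k + 1, d - 1, t, knots)
    return left + right
```

As a sanity check, the degree-p basis functions form a partition of unity: for any \(t \in [0, T)\), \(\sum_{k=0}^{K} b_k^p(t) = 1\) with the clamped knots of the previous note.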
- 7.
As x is a matrix, \(J_x\) is, strictly speaking, a tensor and the above equations are tensor equations in which the t index of B binds to only one index of \(J_x\) and \(H_x\).
- 8.
- 9.
We use the word separator as in Junction Trees: a separator renders the sub-trees conditionally independent. In the Markov context, the future becomes independent of the past given the separator.
- 10.
Note the relation to Levenberg–Marquardt regularization.
- 11.
Why is this a natural relation? Assume we are given p(x) and want to define a cost quantity f(x) as some function of p(x). We require that if a certain value \(x_1\) is more likely than another, \(p(x_1) > p(x_2)\), then picking \(x_1\) should imply less cost, \(f(x_1) < f(x_2)\) (Axiom 1). Further, for two independent random variables x and y, probabilities are multiplicative, \(p(x,y) = p(x)\, p(y)\); we require that, for independent variables, cost is additive, \(f(x,y) = f(x) + f(y)\) (Axiom 2). From both it follows that f must be a negative logarithm of p (up to a positive scale factor).
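The two axioms can be checked numerically for \(f(x) = -\log p(x)\); a tiny illustration with arbitrary probability values of our choosing:

```python
import math

# Cost as negative log-probability.
f = lambda p: -math.log(p)

# Axiom 1: higher probability implies lower cost.
p1, p2 = 0.8, 0.2
assert p1 > p2 and f(p1) < f(p2)

# Axiom 2: for independent x, y we have p(x,y) = p(x) p(y),
# and the cost of the joint outcome is additive.
px, py = 0.5, 0.25
assert math.isclose(f(px * py), f(px) + f(py))
```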
References
R. Bellman, Dynamic programming and Lagrange multipliers. Proc. Natl. Acad. Sci. 42(10), 767–769 (1956)
A. Bemporad, M. Morari, V. Dua, E.N. Pistikopoulos, The explicit linear quadratic regulator for constrained systems. Automatica 38(1), 3–20 (2002)
J.T. Betts, Survey of numerical methods for trajectory optimization. J. Guid Control Dyn. 21(2), 193–207 (1998)
A.R. Conn, N.I. Gould, P. Toint, A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28(2), 545–572 (1991)
F. Dellaert, Factor graphs and GTSAM: A hands-on introduction. Technical Report Technical Report GT-RIM-CP&R-2012-002, Georgia Tech (2012)
M. Diehl, H.J. Ferreau, N. Haverbeke, Efficient numerical methods for nonlinear MPC and moving horizon estimation, in Nonlinear Model Predictive Control (Springer, 2009), pp. 391–417
C.R. Dohrmann, R.D. Robinett, Dynamic programming method for constrained discrete-time optimal control. J. Optim. Theory Appl. 101(2), 259–283 (1999)
J. Dong, M. Mukadam, F. Dellaert, B. Boots, Motion planning as probabilistic inference using Gaussian processes and factor graphs, in Proceedings of Robotics: Science and Systems (RSS-2016) (2016)
P. Englert, M. Toussaint, Inverse KKT–learning cost functions of manipulation tasks from demonstrations, in Proceedings of the International Symposium of Robotics Research (2015)
J. Folkesson, H. Christensen, Graphical SLAM-a self-correcting map, in 2004 IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04, vol. 1 (IEEE, 2004), pp. 383–390
G.H. Golub, C.F. Van Loan, Matrix Computations, vol. 3 (JHU Press, Baltimore, 2012)
S.J. Julier, J.K. Uhlmann, New extension of the Kalman filter to nonlinear systems, in AeroSense’97 (International Society for Optics and Photonics, 1997), pp. 182–193
M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, S. Schaal, STOMP: stochastic trajectory optimization for motion planning, in 2011 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2011), pp. 4569–4574
H.J. Kappen, V. Gómez, M. Opper, Optimal control as a graphical model inference problem. Mach. Learn. 87(2), 159–182 (2012)
S. Kolev, E. Todorov, Physically consistent state estimation and system identification for contacts, in 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids) (IEEE, 2015), pp. 1036–1043
F.R. Kschischang, B.J. Frey, H.-A. Loeliger, Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47(2), 498–519 (2001)
R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, W. Burgard, g2o: a general framework for graph optimization, in 2011 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2011), pp. 3607–3613
J. Lafferty, A. McCallum, F.C. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in Proceedings of 18th International Conference on Machine Learning (ICML) (2001), pp. 282–289
L.-Z. Liao, C.A. Shoemaker, Advantages of differential dynamic programming over Newton’s method for discrete-time optimal control problems. Technical report, Cornell University (1992)
D. Mayne, A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems. Int. J. Control 3(1), 85–95 (1966)
T.P. Minka, Expectation propagation for approximate Bayesian inference, in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (Morgan Kaufmann Publishers Inc., 2001), pp. 362–369
J. Nocedal, S. Wright, Numerical Optimization (Springer Science & Business Media, New York, 2006)
J. Peters, S. Schaal, Natural actor-critic. Neurocomputing 71(7), 1180–1190 (2008)
N. Ratliff, M. Zucker, J.A. Bagnell, S. Srinivasa, CHOMP: gradient optimization techniques for efficient motion planning, in IEEE International Conference on Robotics and Automation, 2009. ICRA’09 (IEEE, 2009), pp. 489–494
N. Ratliff, M. Toussaint, S. Schaal, Understanding the geometry of workspace obstacles in motion optimization, in 2015 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2015), pp. 4202–4209
K. Rawlik, M. Toussaint, S. Vijayakumar, On stochastic optimal control and reinforcement learning by approximate inference, in Proceedings of Robotics: Science and Systems (R:SS 2012) (2012). Runner Up Best Paper Award
J. Schulman, J. Ho, A.X. Lee, I. Awwal, H. Bradlow, P. Abbeel, Finding locally optimal, collision-free trajectories with sequential convex optimization, in Robotics: Science and Systems, vol. 9 (2013), pp. 1–10. Citeseer
Y. Tassa, N. Mansard, E. Todorov, Control-limited differential dynamic programming, in 2014 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2014), pp. 1168–1175
S. Thrun, M. Montemerlo, The graph SLAM algorithm with applications to large-scale mapping of urban structures. Int. J. Robot. Res. 25(5–6), 403–429 (2006)
E. Todorov, W. Li, A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, in American Control Conference, 2005. Proceedings of the 2005 (IEEE, 2005), pp. 300–306
M. Toussaint, Robot trajectory optimization using approximate inference, in Proceedings of the International Conference on Machine Learning (ICML 2009) (ACM, 2009), pp. 1049–1056. ISBN 978-1-60558-516-1
M. Toussaint, Pros and cons of truncated Gaussian EP in the context of approximate inference control, in NIPS Workshop on Probabilistic Approaches for Robotics and Control (2009)
M. Toussaint, A novel augmented Lagrangian approach for inequalities and convergent any-time non-central updates. e-Print arXiv:1412.4329 (2014)
M. Toussaint, KOMO: Newton methods for k-order Markov constrained motion problems. e-Print arXiv:1407.0414 (2014)
N. Vlassis, M. Toussaint, Model-free reinforcement learning as mixture learning, in Proceedings of the International Conference on Machine Learning (ICML 2009) (2009), pp. 1081–1088. ISBN 978-1-60558-516-1
O. Von Stryk, R. Bulirsch, Direct and indirect methods for trajectory optimization. Ann. Oper. Res. 37(1), 357–373 (1992)
Acknowledgements
This work was supported by the DFG under grants TO 409/9-1 and the 3rdHand EU-Project FP7-ICT-2013-10610878.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Toussaint, M. (2017). A Tutorial on Newton Methods for Constrained Trajectory Optimization and Relations to SLAM, Gaussian Process Smoothing, Optimal Control, and Probabilistic Inference. In: Laumond, J.-P., Mansard, N., Lasserre, J.-B. (eds) Geometric and Numerical Foundations of Movements. Springer Tracts in Advanced Robotics, vol 117. Springer, Cham. https://doi.org/10.1007/978-3-319-51547-2_15
DOI: https://doi.org/10.1007/978-3-319-51547-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51546-5
Online ISBN: 978-3-319-51547-2
eBook Packages: Engineering (R0)