Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing

Ulmer, Marlin W.

doi:10.1007/s10100-018-0588-x

Original Paper
Published: 04 October 2018

Volume 28, pages 279–308, (2020)
Central European Journal of Operations Research

Marlin W. Ulmer ORCID: orcid.org/0000-0003-2499-6570¹

Abstract

Stochastic and dynamic vehicle routing problems are gaining increasing attention in the research community. In these problems, routing plans are dynamically updated based on realizations of stochastic information. Due to the complexity of the corresponding Markov decision processes (MDPs), the calculation of optimal policies for these problems is usually not possible and researchers draw on heuristic methods of approximate dynamic programming (ADP). These methods use simulation to approximate the value of a state and decision in the MDP. The simulations are conducted either offline or online. Offline methods such as value function approximations (VFAs) generally neglect the full detail of the state space due to aggregation. Online methods such as rollout algorithms (RAs) are often not able to capture decision and transition space sufficiently due to runtime limitations. In this paper, we alleviate this tradeoff by combining two ADP methods, an online RA and an offline VFA, in two ways. In addition to integrating the VFA as a base policy into the online RA to strengthen the RA's simulations, we also limit the RA's simulation horizon and estimate the remaining reward-to-go again via the VFA. For two stochastic dynamic routing problems from the literature, we show how this combination outperforms state-of-the-art solutions while simultaneously reducing the required time for online calculations.

Author information

Authors and Affiliations

Technische Universität Braunschweig, Braunschweig, Germany
Marlin W. Ulmer

Corresponding author

Correspondence to Marlin W. Ulmer.

Additional information

The author is grateful to Justin C. Goodson, Dirk C. Mattfeld, Warren B. Powell, Ninja Soeffker, and Barrett W. Thomas for their valuable advice. The author further thanks the Editor and Reviewers for their detailed and thorough reviews. They were significantly helpful for the revision of content and presentation of this paper.

Appendix

In the Appendix, we describe the routing heuristic and the detailed algorithm for the V-LHRA. We further describe how the customer distributions are generated and finally provide the results of the tuning and the runtimes in detail.

1.1 Routing heuristic

The routing is conducted with a cheapest insertion procedure (CI). The tour is initialized and updated according to the following procedure: In \(\delta =1\) and \(t=0\), CI starts with a pre-decision tour \(\theta _0=(D,D)\) consisting only of the depot. In every decision point k, a pre-decision tour \(\theta ^{\delta }_k\) and a set of candidate subsets \(\hat{{\mathcal {C}}}^a_k \subseteq {\mathcal {C}}^r_k, i=1,\dots ,2^{|{\mathcal {C}}^r_k|}\), i.e., the power set of new requests, are given. For every candidate subset, CI successively selects the request of the subset that is cheapest with respect to insertion time and adds it at the cheapest insertion position in the tour. The same procedure is applied for \(\theta ^{\delta +1}_k\) with the postponed requests \({\mathcal {C}}^r_k\backslash \hat{{\mathcal {C}}}^a_k\). The routing is induced by the accepted subset \({{\mathcal {C}}}^a_k\) and the corresponding post-decision tour \(\theta ^{\delta ,x}_k\). If the selected routing decision of x is not waiting, the first customer in the tour \(C_{\text {next}}=\theta _1\) is the next customer to visit. After the travel to the next customer, the visited customer is removed from \(\theta ^{\delta ,x}_k\), resulting in the pre-decision tour \(\theta ^{\delta }_{k+1}\). Waiting is applied if no customers are left to serve, i.e., \(\theta ^{\delta }_k=({\mathcal {P}}_k,{\mathcal {D}})\), but free time budget is left. In this case, the pre-decision tour \(\theta ^{\delta }_{k+1}\) is identical to \(\theta ^{\delta ,x}_k\). If no time is left, the vehicle returns to the depot. A candidate subset and the corresponding decision are considered feasible if the resulting candidate post-decision tour does not exceed the time limit. Because the decision space is reduced, CI may reject candidate subsets that are feasible under a different routing approach, e.g., an optimal traveling salesman solution.
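The insertion step and the feasibility check described above can be illustrated with a minimal Python sketch. It is not the author's implementation: it assumes Euclidean travel times at unit speed, a single period (the postponement to \(\delta +1\) is left out), and tours stored as lists of (x, y) points that start at the vehicle position and end at the depot. The names `tour_duration`, `cheapest_insertion`, and `feasible_decisions` are hypothetical.

```python
import math
from itertools import combinations


def tour_duration(tour):
    """Travel time along a tour of (x, y) points (assumed: Euclidean, unit speed)."""
    return sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))


def cheapest_insertion(tour, requests, time_limit=float("inf")):
    """Insert every request, always taking the cheapest request/position pair next.

    Returns the extended tour, or None if it exceeds time_limit (infeasible).
    """
    tour = list(tour)
    remaining = list(requests)
    while remaining:
        best = None  # (extra travel time, request, insertion position)
        for req in remaining:
            for pos in range(1, len(tour)):
                a, b = tour[pos - 1], tour[pos]
                extra = math.dist(a, req) + math.dist(req, b) - math.dist(a, b)
                if best is None or extra < best[0]:
                    best = (extra, req, pos)
        _, req, pos = best
        tour.insert(pos, req)
        remaining.remove(req)
    return tour if tour_duration(tour) <= time_limit else None


def feasible_decisions(tour, new_requests, time_limit):
    """Enumerate the power set of new requests; keep subsets CI routes within the limit."""
    decisions = []
    for size in range(len(new_requests) + 1):
        for subset in combinations(new_requests, size):
            candidate = cheapest_insertion(tour, subset, time_limit)
            if candidate is not None:
                decisions.append((subset, candidate))
    return decisions
```

Because feasibility is judged on the CI tour only, a subset can be rejected here even if a better routing of the same customers would fit into the time limit, which is the reduction of the decision space noted in the text.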

1.2 V-LHRA algorithm

Algorithm 1 describes the procedure of decision making for a VFA-based limited horizon post-decision RA. Let a state \(S_k\), decisions \(x_1,\dots ,x_n\), the reward function R, the post-decision states (PDSs) \({\mathcal {P}}_k=(S_k^{x_1},\dots , S_k^{x_n})\), a set of sampled realizations \(\{\omega _1,\dots ,\omega _m\}\), the VFA base policy \(\pi _{\text {VFA}}\), and the VFA-values \(\tilde{V}\) be given. Then, for every PDS \(S_k^x\), the RA simulates the next h decision points of every realization \(\omega _i\). In these decision points, the decision \(X^{\pi _{\text {VFA}}}(S_{k+j})\) induced by the VFA base policy \(\pi _{\text {VFA}}\) is applied and the rewards \(R(S_{k+j},X^{\pi _{\text {VFA}}}(S_{k+j}))\) are accumulated. After the h decision points, the value \(\tilde{V}(S^x_{k+h})\) is added to the accumulated rewards. The overall reward-to-go of a PDS, \(\hat{V}(S_k^{x})\), is the average of the accumulated rewards over all realizations. The RA selects the decision \(x^*\) leading to the maximum sum of immediate reward \(R(S_k,x^*)\) and expected future rewards \(\hat{V}(S_k^{x^*})\).
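The decision rule described above can be sketched generically in Python. This is an illustration under stated assumptions rather than a reproduction of Algorithm 1: the problem-specific parts are passed in as callables whose names and signatures (`reward`, `post_state`, `transition`, `base_policy`, `vfa_value`) are hypothetical, and `transition` is assumed to return `None` once a realization has no further decision point, in which case no VFA value is added.

```python
def v_lhra_decision(state, decisions, reward, post_state, transition,
                    base_policy, vfa_value, realizations, horizon):
    """Pick the decision maximizing immediate reward plus simulated reward-to-go.

    For every post-decision state, each realization is simulated for at most
    `horizon` decision points under the base policy; the VFA value of the last
    post-decision state is added as an estimate of the remaining reward.
    """
    best_x, best_score = None, float("-inf")
    for x in decisions:
        total = 0.0
        for omega in realizations:
            s_x = post_state(state, x)      # post-decision state S_k^x
            acc = 0.0
            for j in range(1, horizon + 1):
                s = transition(s_x, omega, j)
                if s is None:               # assumed: process ended early
                    s_x = None
                    break
                x_j = base_policy(s)        # decision induced by the VFA policy
                acc += reward(s, x_j)
                s_x = post_state(s, x_j)
            if s_x is not None:
                acc += vfa_value(s_x)       # truncated horizon: add VFA estimate
            total += acc
        v_hat = total / len(realizations)   # average over sampled realizations
        score = reward(state, x) + v_hat
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score
```

With `horizon` set to 0 the sketch reduces to a pure VFA decision, since only the immediate reward and the VFA value of the post-decision state are compared.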

1.3 Customer distributions

In this section, we describe the spatial customer distributions for the small service area. For the large service area, the quantities are multiplied by 1.5. For U, a spatial realization (x, y) is defined as \((x,y) \sim U[0,10] \times U[0,10]\). For 2C, the customers are distributed equally between the two clusters. The cluster centers are located at \(\mu _1=(2.5,2.5), \mu _2=(7.5,7.5)\). The standard deviation within the clusters is \(\sigma =0.5\). For 3C, the cluster centers are located at \(\mu _1=(2.5,2.5), \mu _2=(2.5,7.5), \mu _3=(7.5,5)\). \(50\%\) of the requests are assigned to cluster two and \(25\%\) to each of the other clusters. The standard deviations are set to \(\sigma =0.5\).
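A small Python sketch of how such locations could be sampled is given below. Several details are assumptions that the text does not state: the clusters are taken to be independent normal distributions in each coordinate, sampled points are not clipped to the service area, and the factor 1.5 for the large service area is applied to the sampled coordinates (which scales centers and standard deviation alike). The names `CLUSTERS`, `SIGMA`, and `sample_location` are hypothetical.

```python
import random

# Cluster centers and assignment probabilities as given in the text.
CLUSTERS = {
    "2C": ([(2.5, 2.5), (7.5, 7.5)], [0.5, 0.5]),
    "3C": ([(2.5, 2.5), (2.5, 7.5), (7.5, 5.0)], [0.25, 0.5, 0.25]),
}
SIGMA = 0.5  # standard deviation within a cluster


def sample_location(distribution, rng, scale=1.0):
    """Draw one customer location for 'U', '2C', or '3C'.

    scale=1.0 is the small service area; scale=1.5 is assumed for the large one.
    """
    if distribution == "U":
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
    else:
        centers, weights = CLUSTERS[distribution]
        cx, cy = rng.choices(centers, weights=weights)[0]
        # Assumption: normally distributed offsets around the cluster center.
        x, y = rng.gauss(cx, SIGMA), rng.gauss(cy, SIGMA)
    return scale * x, scale * y


rng = random.Random(42)  # fixed seed for reproducibility
customers = [sample_location("3C", rng) for _ in range(100)]
```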

1.4 Results

In this section, we present the best simulation horizons and the maximal runtimes per decision point per instance setting.

See Tables 4, 5, 6, 7 and 8.

Table 4 Results: improvement compared to Myopic (in %)

Table 5 Results: best horizon, \(\delta _{\text {max}}=1\)

Table 6 Results: best horizon, \(\delta _{\text {max}}=3\)

Table 7 Results: maximal runtime per decision point (in s), \(\delta _{\text {max}}=1\)

Table 8 Results: maximal runtime per decision point (in s), \(\delta _{\text {max}}=3\)

About this article

Cite this article

Ulmer, M.W. Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing. Cent Eur J Oper Res 28, 279–308 (2020). https://doi.org/10.1007/s10100-018-0588-x

Issue Date: March 2020