
Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing

  • Original Paper
Central European Journal of Operations Research

Abstract

Stochastic and dynamic vehicle routing problems are gaining increasing attention in the research community. In these problems, routing plans are dynamically updated based on realizations of stochastic information. Due to the complexity of the corresponding Markov decision processes (MDPs), the calculation of optimal policies for these problems is usually not possible, and researchers draw on heuristic methods of approximate dynamic programming (ADP). These methods use simulation to approximate the value of a state and decision in the MDP. The simulations are conducted either offline or online. Offline methods such as value function approximations (VFAs) generally neglect the full detail of the state space due to aggregation. Online methods such as rollout algorithms (RAs) are often not able to capture the decision and transition space sufficiently due to runtime limitations. In this paper, we alleviate this tradeoff by combining two methods of ADP, an online RA and an offline VFA, in two ways. In addition to integrating the VFA as a base policy into the online RA to strengthen the RA’s simulations, we also limit the RA’s simulation horizon and estimate the remaining reward-to-go via the VFA. For two stochastic dynamic routing problems from the literature, we show that this combination outperforms state-of-the-art solutions while simultaneously reducing the required time for online calculations.



Author information

Corresponding author

Correspondence to Marlin W. Ulmer.

Additional information

The author is grateful to Justin C. Goodson, Dirk C. Mattfeld, Warren B. Powell, Ninja Soeffker, and Barrett W. Thomas for their valuable advice. The author further thanks the Editor and Reviewers for their detailed and thorough reviews. They were significantly helpful for the revision of content and presentation of this paper.

Appendix


In the Appendix, we describe the routing heuristic and the detailed algorithm for the V-LHRA. We further describe how the customer distributions are generated and finally provide the results of the tuning and runtimes in detail.

1.1 Routing heuristic

The routing is conducted with a cheapest insertion procedure (CI). The tour is initialized and updated according to the following procedure: In \(\delta =1\) and \(t=0\), CI starts with a pre-decision tour \(\theta _0=(D,D)\) consisting only of the depot. In every decision point k, a pre-decision tour \(\theta ^{\delta }_k\) and a set of candidate subsets \(\hat{{\mathcal {C}}}^a_k \subseteq {\mathcal {C}}^r_k\) are given, one for each of the \(2^{|{\mathcal {C}}^r_k|}\) elements of the power set of new requests. For every candidate subset, CI successively selects the request of the subset that is cheapest with respect to insertion time and adds it at the cheapest insertion position in the tour. The same procedure is applied for \(\theta ^{\delta +1}_k\) with the postponed requests \({\mathcal {C}}^r_k\backslash \hat{{\mathcal {C}}}^a_k\). The routing is induced by the accepted subset \({{\mathcal {C}}}^a_k\) and the corresponding post-decision tour \(\theta ^{\delta ,x}_k\). If the selected routing decision of x is not waiting, the first customer in the tour \(C_{\text {next}}=\theta _1\) is the next customer to visit. After the travel to the next customer, the visited customer is removed from \(\theta ^{\delta ,x}_k\), resulting in the pre-decision tour \(\theta ^{\delta }_{k+1}\). Waiting is applied if no customers are left to serve, i.e., \(\theta ^{\delta }_k=({\mathcal {P}}_k,{\mathcal {D}})\), but free time budget is left. In this case, the pre-decision tour \(\theta ^{\delta }_{k+1}\) is identical to \(\theta ^{\delta ,x}_k\). If no time is left, the vehicle returns to the depot. A candidate subset and the corresponding decision are considered feasible if the resulting candidate post-decision tour does not exceed the time limit. Because the decision space is reduced, CI may reject candidate subsets that are feasible under a different routing approach, e.g., an optimal traveling salesman solution.
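The insertion step for a single candidate subset can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: it assumes Euclidean travel times at unit speed, represents locations as coordinate tuples, and covers only the insertion and the feasibility check, not the handling of postponed requests. The names `cheapest_insertion`, `tour_length`, and `is_feasible` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean travel time between two points (assumption: unit speed)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_length(tour):
    """Total travel time along the tour, visiting the stops in order."""
    return sum(dist(tour[i], tour[i + 1]) for i in range(len(tour) - 1))


def cheapest_insertion(tour, requests):
    """Insert all requests into a copy of the tour.

    The first and last stop (e.g. current position and depot) stay fixed.
    In each round, the request and position with the smallest increase in
    travel time are chosen.
    """
    tour = list(tour)
    remaining = list(requests)
    while remaining:
        best = None  # (extra travel time, request, insertion position)
        for r in remaining:
            for pos in range(1, len(tour)):
                prev, nxt = tour[pos - 1], tour[pos]
                extra = dist(prev, r) + dist(r, nxt) - dist(prev, nxt)
                if best is None or extra < best[0]:
                    best = (extra, r, pos)
        _, r, pos = best
        tour.insert(pos, r)
        remaining.remove(r)
    return tour


def is_feasible(tour, time_left):
    """A candidate tour is feasible if it does not exceed the time limit."""
    return tour_length(tour) <= time_left
```

Calling `cheapest_insertion` once per candidate subset and discarding the tours for which `is_feasible` returns `False` mirrors the feasibility rule described above. Because the insertion is greedy, a subset may be discarded although a better routing of the same customers would fit the time limit.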

1.2 V-LHRA algorithm

Algorithm 1 describes the procedure of decision making for a VFA-based limited horizon post-decision RA. Let a state \(S_k\), decisions \(x_1,\dots ,x_n\), the reward function R, the post-decision states (PDSs) \({\mathcal {P}}_k=(S_k^{x_1},\dots , S_k^{x_n})\), a set of sampled realizations \(\{\omega _1,\dots ,\omega _m\}\), the VFA base policy \(\pi _{\text {VFA}}\), and the VFA-values \(\tilde{V}\) be given. Then, for every PDS \(S_k^x\), the RA simulates the next h decision points of every realization \(\omega _i\). In the decision points, the decision \(X^{\pi _{\text {VFA}}}(S_{k+j})\) induced by the VFA base policy \(\pi _{\text {VFA}}\) is applied and the rewards \(R(S_{k+j},X^{\pi _{\text {VFA}}}(S_{k+j}))\) are accumulated. After the h decision points, the value \(\tilde{V}(S^x_{k+h})\) is added to the accumulated rewards. The overall reward-to-go of a PDS \(\hat{V}(S_k^{x})\) is the average of the accumulated rewards over all realizations. The RA selects the decision \(x^*\) leading to the maximum sum of immediate reward \(R(S_k,x^*)\) and expected future rewards \(\hat{V}(S_k^{x^*})\).
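The procedure described above can be sketched generically in Python. This is an illustration under stated assumptions rather than the paper's Algorithm 1: the problem-specific parts are passed in as callables with hypothetical names (`reward`, `post_decision`, `transition`, `base_policy`, `vfa_value`), each realization is assumed to be a sequence of exogenous information, one item per transition, and `vfa_value` is assumed to return 0 for terminal PDSs.

```python
def v_lhra_decision(state, decisions, reward, post_decision, transition,
                    base_policy, vfa_value, realizations, horizon):
    """Return (best decision, estimated value) for the given state.

    reward(s, x)        -> immediate reward R(s, x)
    post_decision(s, x) -> post-decision state (PDS) S^x
    transition(pds, w)  -> next pre-decision state given exogenous info w
    base_policy(s)      -> decision induced by the VFA base policy
    vfa_value(pds)      -> offline VFA estimate of the reward-to-go
    realizations        -> non-empty list of sequences of exogenous info
    horizon             -> number h of simulated decision points
    """
    best_x, best_score = None, float("-inf")
    for x in decisions:
        pds = post_decision(state, x)
        total = 0.0
        for omega in realizations:
            accumulated, current = 0.0, pds
            # Simulate at most h decision points with the base policy.
            for info in omega[:horizon]:
                s = transition(current, info)
                xb = base_policy(s)
                accumulated += reward(s, xb)
                current = post_decision(s, xb)
            # Beyond the horizon, fall back to the VFA value of the last PDS.
            accumulated += vfa_value(current)
            total += accumulated
        v_hat = total / len(realizations)  # average over realizations
        score = reward(state, x) + v_hat
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score
```

With `horizon=0`, no decision point is simulated and the sketch selects the decision that maximizes immediate reward plus VFA value, i.e., it reduces to the VFA policy. A longer horizon shifts weight from the aggregated offline estimate to the detailed online simulation at the cost of more runtime per decision point.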

1.3 Customer distributions

In this section, we describe the spatial customer distributions for the small service area. For the large service area, the quantities are multiplied by 1.5. Given U, a spatial realization \((x,y)\) is defined as \((x,y) \sim U[0,10] \times U[0,10]\). For 2C, the customers are distributed equally between the two clusters. The cluster centers are located at \(\mu _1=(2.5,2.5), \mu _2=(7.5,7.5)\). The standard deviation within the clusters is \(\sigma =0.5\). For 3C, the cluster centers are located at \(\mu _1=(2.5,2.5), \mu _2=(2.5,7.5), \mu _3=(7.5,5)\). \(50\%\) of the requests are assigned to cluster two and \(25\%\) to each of the other clusters. The standard deviations are set to \(\sigma =0.5\).
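A minimal Python sketch of how such locations could be sampled is given below. Several points are assumptions not stated in the text: the clusters are modeled as normal distributions with independent coordinates, samples are not clipped to the service area, and the factor 1.5 for the large service area is interpreted as scaling the area size, the cluster centers, and the standard deviation alike. The name `sample_location` and the `scale` parameter are hypothetical.

```python
import random

AREA = 10.0   # side length of the small service area
SIGMA = 0.5   # standard deviation within a cluster

# Cluster centers and assignment probabilities per distribution.
CLUSTERS = {
    "2C": ([(2.5, 2.5), (7.5, 7.5)], [0.5, 0.5]),
    "3C": ([(2.5, 2.5), (2.5, 7.5), (7.5, 5.0)], [0.25, 0.5, 0.25]),
}


def sample_location(distribution, rng, scale=1.0):
    """Draw one customer location for 'U', '2C', or '3C'.

    scale=1.5 is one possible reading of the large service area
    (assumption: all quantities are scaled by the same factor).
    """
    if distribution == "U":
        return (rng.uniform(0.0, AREA * scale), rng.uniform(0.0, AREA * scale))
    centers, weights = CLUSTERS[distribution]
    cx, cy = rng.choices(centers, weights=weights, k=1)[0]
    # Assumption: independent normal coordinates around the cluster center.
    return (rng.gauss(cx * scale, SIGMA * scale),
            rng.gauss(cy * scale, SIGMA * scale))


rng = random.Random(0)  # fixed seed for reproducibility
sample_3c = [sample_location("3C", rng) for _ in range(4000)]
```

In `sample_3c`, roughly half of the points should lie around \(\mu _2=(2.5,7.5)\), matching the \(50\%\) share assigned to cluster two.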

1.4 Results

In this section, we present the best simulation horizons and the maximal runtimes per decision point per instance setting.

See Tables 4, 5, 6, 7 and 8.

Table 4 Results: improvement compared to Myopic (in %)
Table 5 Results: best horizon, \(\delta _{\text {max}}=1\)
Table 6 Results: best horizon, \(\delta _{\text {max}}=3\)
Table 7 Results: maximal runtime per decision point (in s), \(\delta _{\text {max}}=1\)
Table 8 Results: maximal runtime per decision point (in s), \(\delta _{\text {max}}=3\)


Cite this article

Ulmer, M.W. Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing. Cent Eur J Oper Res 28, 279–308 (2020). https://doi.org/10.1007/s10100-018-0588-x


  • DOI: https://doi.org/10.1007/s10100-018-0588-x
