Abstract
Stochastic and dynamic vehicle routing problems are receiving increasing attention in the research community. In these problems, routing plans are dynamically updated based on realizations of stochastic information. Due to the complexity of the corresponding Markov decision processes (MDPs), calculating optimal policies for these problems is usually not possible, and researchers draw on heuristic methods of approximate dynamic programming (ADP). These methods use simulation to approximate the value of a state and decision in the MDP. The simulations are conducted either offline or online. Offline methods such as value function approximations (VFAs) generally neglect the full detail of the state space due to aggregation. Online methods such as rollout algorithms (RAs) are often unable to capture the decision and transition spaces sufficiently due to runtime limitations. In this paper, we alleviate this trade-off by combining an online RA and an offline VFA in two ways. First, we integrate the VFA as the base policy of the RA to strengthen the RA's simulations. Second, we limit the RA's simulation horizon and estimate the remaining reward-to-go via the VFA. For two stochastic dynamic routing problems from the literature, we show that this combination outperforms state-of-the-art solutions while reducing the time required for online calculations.
Additional information
The author is grateful to Justin C. Goodson, Dirk C. Mattfeld, Warren B. Powell, Ninja Soeffker, and Barrett W. Thomas for their valuable advice. The author further thanks the Editor and Reviewers for their detailed and thorough reviews, which significantly improved the content and presentation of this paper.
Appendix
In the Appendix, we describe the routing heuristic and the detailed algorithm for the V-LHRA. We further describe how the customer distributions are generated and finally report the tuning results and runtimes in detail.
1.1 Routing heuristic
The routing is conducted with a cheapest insertion procedure (CI). The tour is initialized and updated as follows. At \(\delta =1\) and \(t=0\), CI starts with a pre-decision tour \(\theta _0=(D,D)\) consisting only of the depot. In every decision point k, a pre-decision tour \(\theta ^{\delta }_k\) and the set of candidate subsets \(\hat{{\mathcal {C}}}^a_k \subseteq {\mathcal {C}}^r_k\) are given, i.e., the \(2^{|{\mathcal {C}}^r_k|}\) elements of the power set of the new requests. For every candidate subset, CI successively selects the request of the subset with the cheapest insertion time and inserts it at its cheapest position in the tour. The same procedure is applied to \(\theta ^{\delta +1}_k\) with the postponed requests \({\mathcal {C}}^r_k\backslash \hat{{\mathcal {C}}}^a_k\).

The routing is induced by the accepted subset \({{\mathcal {C}}}^a_k\) and the corresponding post-decision tour \(\theta ^{\delta ,x}_k\). If decision x does not prescribe waiting, the first customer in the tour, \(C_{\text {next}}=\theta _1\), is the next customer to visit. After the vehicle has traveled to this customer, the customer is removed from \(\theta ^{\delta ,x}_k\), resulting in the pre-decision tour \(\theta ^{\delta }_{k+1}\). Waiting is applied if no customers are left to serve, i.e., \(\theta ^{\delta }_k=({\mathcal {P}}_k,{\mathcal {D}})\), but free time budget remains. In this case, the pre-decision tour \(\theta ^{\delta }_{k+1}\) is identical to \(\theta ^{\delta ,x}_k\). If no time is left, the vehicle returns to the depot.

A candidate subset and the corresponding decision are considered feasible if the resulting candidate post-decision tour does not exceed the time limit. Because CI restricts the routing decisions, it may reject candidate subsets that would be feasible under a different routing approach, e.g., an optimal traveling salesman tour.
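A minimal Python sketch of the insertion step and the power-set feasibility check follows. It is an illustration under stated assumptions, not the paper's implementation: travel times are Euclidean at unit speed, service times are ignored, a tour is a list of points whose first element is the vehicle's current position and whose last element is the depot, and feasibility compares the tour duration with a remaining time budget. The function names are hypothetical, and the postponement of requests to the next period (\(\theta ^{\delta +1}_k\)) is omitted.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def travel_time(a: Point, b: Point) -> float:
    # Assumption: Euclidean distance traversed at unit speed.
    return math.dist(a, b)


def tour_duration(tour: Sequence[Point]) -> float:
    """Travel time from the first element (current position) to the last (depot)."""
    return sum(travel_time(tour[i], tour[i + 1]) for i in range(len(tour) - 1))


def cheapest_insertion(tour: Sequence[Point], requests: Sequence[Point]) -> List[Point]:
    """Repeatedly insert the request/position pair with the smallest added travel time.

    Assumes the tour has at least two elements (current position and depot).
    """
    tour = list(tour)
    remaining = list(requests)
    while remaining:
        best = None  # (added travel time, index in remaining, insertion position)
        for r_idx, c in enumerate(remaining):
            # Positions 1..len-1 keep the current position first and the depot last.
            for pos in range(1, len(tour)):
                added = (travel_time(tour[pos - 1], c) + travel_time(c, tour[pos])
                         - travel_time(tour[pos - 1], tour[pos]))
                if best is None or added < best[0]:
                    best = (added, r_idx, pos)
        _, r_idx, pos = best
        tour.insert(pos, remaining.pop(r_idx))
    return tour


def feasible_subsets(tour, new_requests, time_budget):
    """Enumerate the power set of new requests; keep subsets whose CI tour fits the budget."""
    feasible = []
    for size in range(len(new_requests) + 1):
        for subset in combinations(new_requests, size):
            candidate = cheapest_insertion(tour, subset)
            if tour_duration(candidate) <= time_budget:
                feasible.append((subset, candidate))
    return feasible
```

For example, starting from the depot-only tour `[(0.0, 0.0), (0.0, 0.0)]` with new requests at `(0.0, 3.0)` and `(4.0, 0.0)` and a budget of 10, the empty subset and both singletons are feasible, while the full subset yields a tour of duration 12 and is rejected. Because the power set has \(2^{|{\mathcal {C}}^r_k|}\) elements, this enumeration is only practical for small numbers of new requests per decision point.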
1.2 V-LHRA algorithm
Algorithm 1 describes the decision-making procedure of the VFA-based limited-horizon post-decision RA. Let a state \(S_k\), decisions \(x_1,\dots ,x_n\), the reward function R, the post-decision states (PDSs) \({\mathcal {P}}_k=(S_k^{x_1},\dots , S_k^{x_n})\), a set of sampled realizations \(\{\omega _1,\dots ,\omega _m\}\), the VFA base policy \(\pi _{\text {VFA}}\), and the VFA values \(\tilde{V}\) be given. Then, for every PDS \(S_k^x\) and every realization \(\omega _i\), the RA simulates the next h decision points. In each of these decision points, the decision \(X^{\pi _{\text {VFA}}}(S_{k+j})\) induced by the VFA base policy \(\pi _{\text {VFA}}\) is applied and the rewards \(R(S_{k+j},X^{\pi _{\text {VFA}}}(S_{k+j}))\) are accumulated. After the h decision points, the value \(\tilde{V}(S^x_{k+h})\) is added to the accumulated rewards. The reward-to-go \(\hat{V}(S_k^{x})\) of a PDS is the average of these per-realization values. The RA selects the decision \(x^*\) that maximizes the sum of the immediate reward \(R(S_k,x^*)\) and the expected future rewards \(\hat{V}(S_k^{x^*})\).
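Written out, the procedure above estimates \(\hat{V}(S_k^{x}) = \frac{1}{m}\sum _{i=1}^{m}\big (\sum _{j=1}^{h} R(S_{k+j},X^{\pi _{\text {VFA}}}(S_{k+j})) + \tilde{V}(S^x_{k+h})\big )\), where the states in the i-th summand follow realization \(\omega _i\), and selects \(x^* = \arg \max _{x}\{R(S_k,x) + \hat{V}(S_k^{x})\}\). The Python sketch below mirrors this loop on a deliberately tiny, hypothetical accept/reject problem. The function names, the state encoding, the capacity model, and the linear VFA `3 * capacity` are illustrative assumptions, not the paper's implementation; the base policy is assumed to be greedy with respect to the VFA.

```python
from statistics import mean


def vfa_greedy_policy(decisions, reward, post_decision, vfa):
    """Base policy pi_VFA (assumed greedy): pick a maximizing R(s, a) + V_tilde(s^a)."""
    def policy(state):
        return max(decisions(state),
                   key=lambda a: reward(state, a) + vfa(post_decision(state, a)))
    return policy


def vlhra_decide(state, decisions, reward, post_decision, transition,
                 vfa, scenarios, h):
    """Limited-horizon post-decision rollout with a VFA base policy and VFA tail.

    Returns the best decision and a dict mapping each decision x to
    R(S_k, x) + V_hat(S_k^x).
    """
    base_policy = vfa_greedy_policy(decisions, reward, post_decision, vfa)
    values = {}
    for x in decisions(state):
        pds = post_decision(state, x)              # S_k^x
        per_realization = []
        for omega in scenarios:                    # omega_1, ..., omega_m
            s_post, acc = pds, 0.0
            for _ in range(h):                     # next h decision points
                s = transition(s_post, omega)      # pre-decision state S_{k+j}
                a = base_policy(s)                 # X^{pi_VFA}(S_{k+j})
                acc += reward(s, a)
                s_post = post_decision(s, a)
            per_realization.append(acc + vfa(s_post))  # add V_tilde(S^x_{k+h})
        values[x] = reward(state, x) + mean(per_realization)
    best = max(values, key=values.get)
    return best, values


# Hypothetical toy problem: accept (1) or reject (0) the current request.
# State: (time, remaining capacity, value of current request).
# A realization omega is a list of request values indexed by time.
def toy_decisions(s):
    return [0, 1] if s[1] > 0 else [0]


def toy_reward(s, x):
    return s[2] * x


def toy_post_decision(s, x):
    return (s[0], s[1] - x)


def toy_transition(p, omega):
    return (p[0] + 1, p[1], omega[p[0] + 1])


def toy_vfa(p):
    return 3.0 * p[1]  # assumed: each unit of remaining capacity is worth 3
```

With `state = (0, 1, 2.0)` and realizations `[[2.0, 5.0], [2.0, 1.0]]`, `h = 0` reduces to a pure VFA decision (reject: 3.0 versus accept: 2.0). With `h = 1`, the base policy accepts the value-5 request in the first realization and rejects the value-1 request in the second, so rejecting now is valued at 4.0 and accepting at 2.0. Larger `h` shifts weight from the VFA estimate to simulated rewards at the cost of more online computation.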
1.3 Customer distributions
In this section, we describe the spatial customer distributions for the small service area. For the large service area, the spatial quantities below are multiplied by 1.5. For the uniform distribution U, a customer location (x, y) is drawn as \((x,y) \sim U[0,10] \times U[0,10]\). For 2C, the customers are split equally between two clusters with centers \(\mu _1=(2.5,2.5)\) and \(\mu _2=(7.5,7.5)\). The standard deviation within the clusters is \(\sigma =0.5\). For 3C, the cluster centers are located at \(\mu _1=(2.5,2.5), \mu _2=(2.5,7.5), \mu _3=(7.5,5)\); \(50\%\) of the requests are assigned to cluster two and \(25\%\) to each of the other two clusters. The standard deviation is again \(\sigma =0.5\).
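The sampler below is a minimal Python sketch of the three settings. It assumes that clustered locations are Gaussian with standard deviation \(\sigma\) in each coordinate, that clustered points are not clipped to the service area, that 2C assigns each customer to a cluster with equal probability, and that the large area is obtained by scaling sampled coordinates by 1.5 (equivalent to scaling the area bounds, cluster centers, and \(\sigma\)). The names `SETTINGS` and `sample_locations` are hypothetical.

```python
import random

# Small-area cluster centers and assignment weights, as stated in the text.
SETTINGS = {
    "2C": ([(2.5, 2.5), (7.5, 7.5)], [0.5, 0.5]),
    "3C": ([(2.5, 2.5), (2.5, 7.5), (7.5, 5.0)], [0.25, 0.5, 0.25]),
}
SIGMA = 0.5   # standard deviation within each cluster
SIDE = 10.0   # side length of the small service area


def sample_locations(setting, n, large=False, rng=None):
    """Draw n customer locations for setting "U", "2C", or "3C"."""
    rng = rng or random.Random()
    scale = 1.5 if large else 1.0
    points = []
    for _ in range(n):
        if setting == "U":
            x, y = rng.uniform(0.0, SIDE), rng.uniform(0.0, SIDE)
        else:
            centers, weights = SETTINGS[setting]
            mx, my = rng.choices(centers, weights=weights)[0]
            # Assumption: Gaussian scatter around the center, no clipping.
            x, y = rng.gauss(mx, SIGMA), rng.gauss(my, SIGMA)
        points.append((scale * x, scale * y))
    return points
```

For example, `sample_locations("3C", 1000, rng=random.Random(0))` draws roughly half of the points around \((2.5, 7.5)\); passing a seeded `random.Random` makes instances reproducible.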
1.4 Results
In this section, we report the best simulation horizons and the maximal runtimes per decision point for each instance setting.
Cite this article
Ulmer, M.W. Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing. Cent Eur J Oper Res 28, 279–308 (2020). https://doi.org/10.1007/s10100-018-0588-x