1. Introduction
The vehicle routing problem (VRP) involves the assignment a fleet of capacitated vehicles to serve a set of customers in order to minimize the service cost by minimizing the number of vehicles and the total travel distance. The vast majority of VRP research addresses a static problem in which all of the relevant data is known in advance. Typically, service vehicles commence at some initial location and all customer demands and constraints are known such as the distance and travel times between each pair of customers and between each customer and the depot. The dynamic VRP (DVRP) can be described as follows : A daily service schedule is predetermined before the start of the working day and drivers start visiting customer with planned routes. During the service, if new customers call for service, routes must be revised. The revised planning must be done such that appropriate vehicles can visit the new customers in efficient ways. <Figure 1> shows an example of a DVRP with 7 known orders and 1 new order. As shown, some customer orders are already known and an initial route is generated to service these customers. On receipt of new customer order, the existing routes are reconfigured to accommodate the new order. DVRP was first introduced by Kilby et al. [21] and Montemanni et al. [23] introduced some benchmark instances for DVRP.
Practical applications of DVRP can be classified depending on the degree of dynamism proposed by [22]. Let T be the length of the planning horizon; the total number of dynamic orders n_{tot} ; Ʀ the set of dynamic orders, and t_{i} the disclosure time of request i∈Ʀ. Time window is the interval [start (t_{i} ), end (l_{i} )] within which a customer must be serviced. The effective degree of dynamism, ${\delta}_{TW}^{e}$ is defined as follows :
If all the orders are extremely urgent such that l_{i} = t_{i} , then the effective degree of dynamism becomes 1. On the other hand, if all the orders are not urgent such that t_{i} = 0 and l_{i} = T , ${\delta}_{TW}^{e}$ becomes 0. Larsen et al. [22] developed threecategory framework for DVRP classification namely weakly (low level), moderately (medium level), and strongly dynamic (high level) based on the value of the effective degree of dynamism. The values of ${\delta}_{TW}^{e}$ according to the levels above are as follows : < 0.3, between 0.3 and 0.8, and > 0.8 respectively. The cable/ telephone repair service or dial aride/ taxicab service that includes doortodoor services with a great number of dynamic customer requests are sometypical examples of low level dynamism problems. A second application is the appliance repair and courier services which involve single/multiple insertions into a planned day’s tour with time windows are examples of medium level dynamism problems. A third application is the ambulance service where a patient contacts the service center, followed by an evaluation of case severity, and a visit by a practitioner/ ambulance is scheduled accordingly. This we consider as the high level of dynamism problem.
A typical centralized algorithm for DVRP assumes the existence of a dispatcher e.g., headquarter of the company where the tours are recalculated. The dispatcher can communicate to vehicles in order to gather information and assigns the new visits to them. The decision problem faced by a dispatcher is more complicated than the static VRP which is NPhard. In addition, due to the uncertain and dynamically changing information at routes and vehicles, such a centralized problemsolving approach is not appropriate to quickly find a highquality solution for strongly dynamic DVRP. Another barrier for the application of the centralized approach for strongly dynamic DVRP is the applicability of information sharing among the dispatcher and vehicles. In the US, the ambulance service providers are private companies and they are reluctant to reveal the private information of their companies while generating a visiting tour plan with dispatcher (i.e., 911 call center in this case) for urgent patients.
Each entity in the DVRP instance is treated as a collaborator. Each vehicle collaborator knows the information on the customers in its route and exchange information with the depot. A depot collaborator (i.e., dispatcher) is in charge of all the vehicles and knows all the information of old and new customers it is going to serve. Each of these entities has its own objective. The objective of each vehicle collaborator is the traditional minimization of travel time. The objectives of vehicle collaborators are often in competition with each other since the operations of a vehicle collaborator tend to be local based on its local view of the current state. Therefore, there is a need for the depot collaborator who has the global picture of the current state to be able to coordinate the operations of the vehicles. In addition, the objective of the depot collaborator is to minimize the total distance traveled in all the routes.
This paper focuses on developing decentralized cooperation mechanism among vehicle collaborators and the depot collaborator such that the total traveling distance is minimized while the information sharing among collaborators are minimal in a strongly dynamic DVRP environment. The work is organized as follows : in Section 2, we survey previous studies on DVRP. The problem considered in this paper is described in Section 3. The proposed decentralized architecture is described in Section 4. We validate the proposed model via comparison with existing centralized algorithms in Section 5. Finally, some concluding remarks are given in Section 6.
2. Literature Review
In this section, we survey the traditional centralized approaches for DVRP. The model for the dynamic and deterministic VRP can be categorized into two : periodic reoptimization and continuous reoptimization.
The periodic reoptimization models consider initial routes at start of a working day with subsequent reoptimization, either on update of the available data, or at fixed time intervals referred to as time slices [21] or decision epochs [15]. Kilby et al. [21] introduced the specific DVRP model investigated in this research discussing some of the issues created by dynamic problems. They derived benchmark DVRP problems from wellknown classic VRP benchmarks. Montemanni et al. [23] refined the benchmarks introduced by [21] and solved the DVRP using a centralized Ant Colony System (ACS). They considered a DVRP as a periodic standard VRP by decomposing a DVRP into a sequence of static VRPs over time slices. Chen and Xu [5] developed a dynamic column generation (DYCOL) approach for the DVRP with time window (DVRPTW) and introduced the concept of decision epochs over a planning period, i.e. dates to run the optimization process. Hanshar and OmbukiBerman [15] addressed the DVRP using genetic algorithm (GA) and further compared it to a Tabu Search (TS) algorithm. Like [21, 23], they adopted the concept of time slice and evaluated their algorithm based on the benchmark data suggested by [21, 23]. The results showed that the GA outperforms the ACS and TS.
Most recent studies based on the periodic optimization (time slices) scheme that also considered the benchmark data provided by [21. 23] include : Khouadjia et al. [20] applied a dynamic adaptive Particle Swarm Optimization (DAPSO) to the DVRP and further compared the performance of DAPSO to a Variable Neighborhood Search (VNS) algorithm they implemented. Elhassania et al. [10] developed a hybrid algorithm by combining an Ant Colony Optimization (ACO) with a Large Neighborhood Search (LNS) algorithm to solve the DVRP. In their study, a pure ACO was initially adopted as a “control experiment” to the hybrid algorithm (ACOLNS). Demirtas et al. [9] proposed a Particle Swarm Optimization (PSO) which also involved the utilization of local search procedures and a new pathinsertion based heuristic to repair infeasible solutions. Euchi et al. [11] proposed an Artificial Ant Colony based on the 2_Opt local search (AAC_2_Opt) for the DVRP.
Continuous reoptimization models consider optimization throughout the working day while keeping information on good solutions in memory [24]. For any change in available data, a decision procedure collects the information from the memory to update the existing scheduled routes. Note that because the current routing is tentative; vehicles are unaware of their next destination until completion of service of a request [13]. Gendreau et al. [13] applied the parallel Tabu Search framework proposed by [16] to a DVRPTW associated with long distance express courier services. The approach maintains a pool of good routes (in adaptive memory) which are used to generate initial solutions for parallel TS. This framework was also implemented for the DVRP [18, 31]. GAs have also been used for DVRP recently [17]. GAs in dynamic contexts only differ from those designed for static problems in the constant adaptation of solutions to the changes made to the input.
Different variants of DVRP studies exist in the literature. Schilde et al. [25] studied the integration of stochastic timedependent travel speed in solution methods for the dynamic dialaride problem (DARP). A dynamic nearest neighbor policy for the pickup and delivery problem with multiple vehicles was introduced by [26]. The policy is for the operation of a fleet of vehicles used to serve customers within a Euclidean service area according to a Poisson process. Yang et al. [32] also studied a multiobjective distribution model that considers the customers’ satisfaction degree as well as cost for online shopping express logistics and proposed a modified PSO which enhances the particle evolution quality. Dan et al. [8] considered the Emergency Materials Dispatch (EMD) as a typical DVRP and developed a model as well as an ACO algorithm to solve the problem. Barkaoui et al. [2] introduced an adaptive evolutionary approach that uses genetic algorithm for realtime vehicle routing and dispatching. Azi et al. [1] proposed a method based on an adaptive large neighborhood search heuristic to solve a DVRP with multiple delivery routes. Kergosien et al. [19] proposed a tabu search heuristic for the transportation of patients between care units dynamically and Beaudry et al. [3] introduced a twophase heuristic for the dynamic transportation of patients in hospitals. Bent et al. [4] developed a LNS strategy for the DVRP and also considered a multiple scenario approach, which uses past decisions to generate plans. Ichoua et al. [17] studied a similar problem to that in [4] by exploiting information on future events to improve decision making. Zhu et al. [33] considered a realtime DVRP and applied open constraint programming. Coslovich et al. [7] considered an urban dialaride problem and proposed a twophase insertion algorithm. See Gendreau et al. [13] for other neighborhood search heuristics for dialaride problem.
In this research, we study the periodic reoptimization model of DVRP with a decentralized coordination using decentralized system hierarchy.
3. Problem Description
As shown in <Figure 2>, the periodic reoptimization model of DVRP discussed in this research, also studied by [21, 23], can be described as follows : T_{co} is a cutoff time like real world scenarios after which requests arriving are postponed to the next working day. Thus the dynamic orders that arrived in the time window  T_{co, 0}] is called pending orders. At time t = 0, a static VRP is solved. The static VRP can be solved with an assumption that all the relevant information of orders and vehicles are gathered and centered to the depot. This assumption can be applied to both the centralized and decentralized coordination environments. It is reasonable to assume that the relevant information on vehicles are available to the depot since all the vehicles are located at the depot at the beginning of the day. Then the starting time of the service of the pending orders are determined and the orders are transformed to planned order. Among the planned orders that can be serviced within $\left[\text{0,}\hspace{0.17em}\frac{\text{T}}{{\text{n}}_{\text{ts}}}\hspace{0.17em}\text{+}\hspace{0.17em}{\text{T}}_{\text{ac}}\right]$ is called committed order. n_{ts} is the number of time slices. Thus $\frac{\text{T}}{{\text{n}}_{\text{ts}}}$ represents the length of a time slice. T_{ac} is an advanced commitment time. In practice, an order must be committed to a vehicle at least T_{ac} seconds before departure from the last visited location prior to start of service of the committed order. This advanced commitment time gives an appropriate time to react to committed new orders. The vehicle to which a committed order is assigned must leave from its previous customer (last service location). At $\text{t=}\frac{\text{T}}{{\text{n}}_{\text{ts}}}$, there is a planned order, that is pending orders\committed orders (i.e., pending orders except committed orders) and in addition we have newly arrived dynamic orders during the period $\left[\text{0,}\frac{\text{T}}{{\text{n}}_{\text{ts}}}\right]$. Therefore, at the beginning of this new time slice, we have a new static VRP again. In a centralized coordination environment, we assume that vehicles serve customers along with the planned route, send all the relevant information of its own (i.e., current location of vehicle, current available capacity and so on). However, in this research, we assume only a decentralized coordination is applicable (See section 4 for more details) to solve the static VRP where minimal information is shared among collaborators and decision authority is dispersed among collaborators. That is, there is no decision maker like a dispatcher who has a global view of the entire static VRP. At this point, the decentralized algorithm is applied again for orders, pending order\committed order+dynamic order. Among the newly planned orders, serviceable orders during $\left[\frac{\text{T}}{{\text{n}}_{\text{ts}}}{\text{+T}}_{\text{ac}}\text{,}\frac{\text{2T}}{{\text{n}}_{\text{ts}}}{\text{+T}}_{\text{ac}}\right]$ can be committed and this procedure repeats until the current time reaches T_{co} of today.
4. Decentralized Architecture
<Figure 3> shows the proposed decentralized architecture among depot collaborator and vehicle collaborators. Collaborators are independent and intelligent entities that are loaded with some programs e.g., heuristic algorithm or optimization software such as CPLEX, and can therefore react to events and perform some jobs.
The decentralized interaction is characterized by the optimization stage and the commitment stage. In the optimization stage, the depot collaborator and vehicle collaborators cooperatively solve the reoptimization problem of DVRP. Throughout the time horizon, relevant information regarding received customer orders in a time slice such as customer’s geographic location and traveling distances are verified with GIS by the depot collaborator. The depot collaborator sends information on customers to each vehicle collaborator. The vehicle collaborator on the other hand sends its state information such as current location of vehicle and planned route to depot collaborator. With this information, depot collaborator reoptimizes the vehicle routes by solving its own optimization problem considering customers and vehicle collaborators’ information. Next, customervehicle assignments are announced to vehicle collaborators. Once a vehicle collaborator has received information on new customers, it determines a routing plan by solving its optimization problem and sends the information back to the depot collaborator. This iterative process continues while there is ample time until the best objective value is reached.
Once the optimization terminates, orders are committed to specific vehicles at commitment stage by the depot collaborator. Note that orders must be committed to a driver at least an advanced commitment time, T_{ac} prior to departure from the last location visited. The advanced commitment time gives the drivers appropriate reaction time to new orders before commencement of processing the order. The proposed architecture is purely decentralized since neither the depot or vehicle collaborator has a global view of the entire state of the system. However, this can be compensated by the information exchange among the collaborators.
The proposed decentralized procedure can be formalized as follows :

Step 0 : Set the current time t = 0

Step 1 (Optimization stage) : Let P be the set of pending orders at t = 0. The depot collaborator solves the static VRP with nodes, P . Let P_{k} be the subset of pending orders assigned to vehicle k and R_{k} be the set of pair of orders which represent the corresponding visiting sequence of orders. Let V be the set of vehicles that are assigned orders.

Step 2 (Commitment stage) : Among P_{k}, commit the set of orders, C_{k} that can be served within the time window, $\left[\text{0,}\hspace{0.17em}\frac{\text{T}}{{\text{n}}_{\text{ts}}}{\text{+T}}_{\text{ac}}\right]$ along with the traveling sequence R_{k}.

Step 3 : Let the orders that have not been served within the time window $\left[\text{0,}\hspace{0.17em}\frac{\text{T}}{{\text{n}}_{\text{ts}}}{\text{+T}}_{\text{ac}}\right]$ be unplanned orders H_{k}, which are the set of pending orders\committed orders assigned to vehicle collaborator k, that is P_{k}\C_{k}. Also let D be the set of dynamic orders received during the time slice $\left[\text{0,}\frac{\text{T}}{{\text{n}}_{\text{ts}}}\right]$. Set the current time $\text{t=}\frac{T}{{n}_{ts}}$.
Phase II
Step 4 (Optimization stage)

Step 4.1 (Initial Solution) : The depot collaborator solves the Linear Assignment Problem (LAP) for $\leftD\right$ objects (i.e., dynamic orders) and $\leftV\right$ persons (i.e., vehicles) with the assignment cost ${c}_{i}^{k}={\text{min}}_{j\in {H}_{k}}\left[{d}_{ij}\right]$ for all i∈D when a dynamic order i is assigned to a vehicle k.

Step 4.2 (Vehicle collaborator) : The vehicle collaborator k solves the Vehicle collaborator Problem VAP (${C}_{k},{R}_{k},{\upsilon}_{i}^{k},{H}_{k}$). As a result, the vehicle collaborator k updates R_{k} such that the order ${\upsilon}_{i}^{k}$ is newly inserted. In addition, the existing R_{k} for H_{k} is newly optimized given that orders in C_{k} are already served. The vehicle collaborator k sends the local objective function value $f\left({\nu}_{i}^{k}\right)$ and unplanned orders, H_{k} to the depot collaborator.

Step 4.3 (Improvement stage) : The depot collaborator gathers local objective function values from vehicle collaborators and updates the upper bound, ${\sum}_{k=1}^{m}f\left({\upsilon}_{i}^{k}\right)$. Set α = 1

Step 4.3.1(Depot collaborator) : Randomly select orders from D∪H and generate a sequence of orders. Let i_{[j]} be the j^{th} element in the sequence. Set j = 1. For i_{[j]}, let ${\nu}_{{i}_{\left[j\right]}}^{k}=1$. The depot collaborator sends the assignment to the vehicle collaborator k.

Step 4.3.2(Vehicle collaborator) : The vehicle collaborator k solves VAP(${C}_{k},{R}_{k},{\upsilon}_{{i}_{\left[j\right]}}^{k},{H}_{k}$) and send the corresponding objective function value, $f\left({\upsilon}_{{i}_{\left[j\right]}}^{k}\right)$ to the depot collaborator.

Step 4.3.3(Depot collaborator) : Select the vehicle collaborator whose $f\left({\upsilon}_{{i}_{\left[j\right]}}^{k}\right)$ is minimal among all vehicle collaborators and assign the order i_{[j]} to the corresponding vehicle collaborator.

Step 4.3.4 : If ${i}_{\left[j\right]}\ne \alpha \leftD\cup H\right$ increase j = j + 1 and go to Step 4.3.1.

Step 4.3.5 : If the new objective function value ${\sum}_{k=1}^{m}f\left({\nu}_{{i}_{\left[j\right]}}^{k}\right)$ has been improved from the current upper bound, update the current upper bound as ${\sum}_{k=1}^{m}f\left({\nu}_{{i}_{\left[j\right]}}^{k}\right)$, set α = 0.3 and go to Step 4.3.1. Otherwise terminate the algorithm.

Define LAP as follows :
LAP
s.t.
Step 5 (Commitment stage)

Step 5.1 : For each vehicle collaborator k, commit orders in D_{k} and H_{k} along with the route in R_{k} where the service can be completed within the next time slice plus T_{co}.

Step 5.2 : Increase the current time by $\frac{\text{T}}{{\text{n}}_{\text{ts}}}$. If the time slice reaches to the cutoff time T_{co}, terminate the algorithm otherwise, go to Step 1.
Phase I deals with the optimization and commitment stage at t = 0 and Phase II applies at the beginning of each time slices such as $\frac{T}{{n}_{ts}},\hspace{0.17em}\frac{2T}{{n}_{ts}}.\frac{3T}{{n}_{ts}}$ and so on. The static VRP in Step 1 is heuristically solved using the Ruin and Recreate (R&R) algorithm proposed by Schrimpf et al. [27]. The R&R algorithm has been successfully applied to VRPTW and outperformed existing meta heuristics (see [27] for more details). Once the algorithm at t = 0 terminates, the committed orders are served based on the route determined from the static VRP at Step 1.
In order to solve LAP at step 4.2, $\leftD\right$ must be the same as $\leftV\right$. When $\leftD\right$ > $\leftV\right$, we add $\leftD\right$  $\leftV\right$ new vehicles to the solution in order to handle new orders. In this case, ${c}_{i}^{k}$ are the same for all new vehicles since all the vehicles are located at the depot. If $\leftD\right$ < $\leftV\right$, we add $\leftV\right$  $\leftD\right$ artificial orders with ${c}_{i}^{k}$ = 0 for all vehicles so that we have the equal number of orders and vehicles. The LAP is solved optimally using the wellknown Hungarian algorithm. The vehicle collaborator problem, VAP(${C}_{k},{R}_{k},{\upsilon}_{k}^{k},{H}_{k}$) is formalized in the following section.
Step 4.3 deals with the improvement of the initial solution by the decentralized cooperation among the depot collaborator and vehicle collaborators. Upon receipt of an upper bound of solution, the depot collaborator resets the assignment of the unplanned orders and dynamic orders to vehicle collaborators which was determined at Step 3 and Step 4.1 respectively. The depot collaborator tries to improve the initial solution by solving the General Assignment Problem (GAP) of D∪H objects to K persons where K is the set of uniform fleet of vehicles with same capacity at Step 4.3. Different to LAP, we allow the assignment of multiple orders to one vehicle (i.e. relaxing constraint 2)in GAP which is NPhard problem. Actually, Step 4.3 shows the heuristic and decentralized algorithm to solve the GAP of the depot collaborator. The information flows among the depot collaborator and the vehicle collaborators for Step 4.3 are shown in <Figure 4> At Step 4.3.1, the depot collaborator asks the vehicle collaborator k to send the local objective function value when the vehicle covers the order, ${\upsilon}_{{i}_{\left[j\right]}}^{k}$ in its current route of visiting customers.
Upon receipt of all the local objective function value, the depot collaborator select the most efficient vehicle (i.e., the lowest objective function value) and assign ${\upsilon}_{{i}_{\left[j\right]}}^{k}$ to the corresponding vehicle. The procedure continues until all the orders in D∪H are assigned to vehicles. Step 4.3 is repeated as long as the computation time of algorithm does not exceed $\frac{T}{{n}_{ts}}$ and the best solution is implemented for the next time slice.
The optimization problem of vehicle collaborators VAP (${C}_{k},{R}_{k},{\upsilon}_{{i}_{\left[j\right]}}^{k},{H}_{k}$) can be formalized as follows :
Parameters
Decision variables
Subject to
The objective function of VAP is the minimization of total traveling distance in order to serve orders that are assigned by the depot collaborator and unplanned order that was pending order from the previous day. The constraint (5) and (6) imply that the order must be served once and must be connected as a route. The subtours must be eliminated by the constraint (7). The constraint (8) represents the vehicle capacity. The constraint (9) implies that the visiting sequence of already committed orders must be maintained. Note that the existing route of the vehicle R_{k} is updated only for ${\upsilon}_{{i}_{\left[j\right]}}^{k}$ ∪H_{k}.
The proposed VAP is a constrained VRP where the visiting sequence of some customers is predetermined. We heuristically solve the VAP using the R&R algorithm.
5. Experimental Results
This section presents the experiment conducted and the corresponding results. The decentralized DVRP system was coded in Eclipse 4.4 (Luna) and ran on Intel(R) Core(TM) i5 CPU (760 @ 2.80GHz (4 CPUs), ~2.8GHz) with 4096MB memory. A comparison of the decentralized system’s performance with other algorithms is also presented.
5.1 Benchmark Dataset
As mentioned earlier, the experimental results are based on the instances proposed by [21, 23]. They were derived from three separate already known VRP benchmark data by Taillard [29] (12 instances), Christophides and Beasley [6] (7 instances) and Fisher [12] (2 instances). These instances were extended by [21] and organized into two group of instances, pickup and delivery. This research considered only pickup problems (total of 21 instances) which include the following information :

length of the working day. Referred to as T.

appearance time of each order. This denotes the time when the order becomes known to the dispatcher.

duration of each order. It represents the service time for each order.

number of vehicles. It represents the number of vehicles available for serving the customers (set to 50).
Hanshar and OmbukiBerman [15] further analyzed the topology of the service areas for the benchmark datasets shown in <Table 1>. Generally, the service areas are either uniformly distributed, clustered or a mixture of the two.
In order to compare our results with the other mentioned algorithms, the same experimental settings of [15] has been adopted. The number of time slices in the optimization, n_{ts} = 25. Also, the advanced commitment time T_{ac} was set at 0.01T , where T is the entire length of the working day. Also, the cutoff time T_{co} was set at 0.5T . 30sec is allowed for the running of the algorithm in each time slice. If the algorithm did not complete within 30sec, the best solution found is saved and the subsequent time step begins. Thus a running time of the entire algorithm for a single problem instance is 30×25 = 750s or 12.5 min for simulating an 8 hour day.
5.2 Comparison With Other Algorithms
<Table 2> presents the numerical results for the decentralized compared to other centralized DVRP algorithms such as dynamic adaptive Particle Swarm Optimization (DAPSO) and Variable Neighborhood Search (VNS) by [7], a hybrid Ant Colony OptimizationLarge Neighborhood Search (ACOLNS) and ACO by [8] and Artificial Ant Colony with 2_Opt (AAC_2_Opt) by [10].
For each of the centralized algorithms, the objective function values reported is the average value reported by the corresponding author. The results show that the decentralized system comparatively achieved the best solution for 12 out of 21 instances. The AAC_2_Opt recorded 9 best solutions out of the 21 instances.
It seems that the proposed decentralized approach performs especially well when the number of customers is large (i.e., more than 100 customers).
6. Conclusion
In this research, DVRP has been studied. Each entity in the DVRP instance is treated as a collaborator having its own local objective that partially represents the global system objective. The collaborators can execute a set of operations to change their status to align with the overall system objective.
We proposed a decentralized architecture and coordination algorithm for DVRP. Also the proposed method has been tested using problem instances derived from publicly available benchmark data. The results show that the proposed method performed better than AAC_2_Opt based method which utilizes the centralized coordination in 12 out of 21 benchmark problems respectively.
In near future, we will consider the stability of VRP plan as the local objective of a vehicle collaborator. With the revised route, we may ignore the current VRP plan and accept a new plan which is completely different route from the old one. However, driver is usually nervous about the change in the route plan. A driver may be reluctant to have changes in lunch break due to an inserted customer in his/her route. Therefore, from the optimization point of view, it is reasonable to accept significant changes in the route. However, from the execution point of view, the stability of route may be more important than the optimization.This way, one can observe some learning capabilities of collaborators and further design artificial intelligence approaches.