1 Introduction

Virtualization of physical servers has gained prominence in enterprise data centers because it enables cloud services that offer virtually unlimited resources, no upfront capital investment, and a simple pay-as-you-go charging model. The long-term viability of this model depends, among other factors, on cost and performance. To attain performance guarantees, application providers can specify the number of virtual machines they need, bandwidth/latency requirements between virtual machines, and latency requirements between the users of the service and the virtual machines. A service provider that can honor these guarantees can offer an optimized service to user applications. To do so, however, the provider has to match the requirements of different applications to the placement of virtual machines, given the limited-bandwidth links between geographically separated data centers, while minimizing its own cost.

Unfortunately, today’s public cloud platforms such as Amazon EC2 do not provide any performance guarantee, which in turn affects tenant cost. Specifically, the resource reservation model in today’s clouds provisions only CPU and memory resources and ignores networking completely. Because of the largely oversubscribed nature of data center networks (e.g., Greenberg et al. (2009)), network bandwidth is a scarce resource shared across many tenants. Moreover, to meet reliability and demand requirements, data centers have to be located all around the world: a teleconference call, for instance, connects people from all over the world, and each end user needs a data center within a reasonable distance.

For distributed data centers, networking is one of the major costs. Given the limited-bandwidth links between data centers, communication-intensive phases of applications collide and compete for the scarce network resources, which leads to unpredictable running times. The uncertainty in execution time further translates into unpredictable cost, as tenants pay for the reserved virtual machines (VMs) for the entire duration of their jobs.

Placement of virtual machines within a data center has been widely explored (Guo et al. 2010; Piao and Yan 2010; Ballani et al. 2011; Xie and Hu 2012; Alicherry and Lakshman 2012; Biran et al. 2012; Guo et al. 2017). These papers account for the networking needs, in addition to the CPU and memory needs, within a data center. For example, Guo et al. (2010) propose bandwidth reservation between every pair of VMs. Ballani et al. (2011) propose a simpler virtual cluster (VC) model in which all virtual machines are connected to a virtual switch with links of bandwidth B. Xie and Hu (2012) extend these approaches to consider time-varying network requirements. Alicherry and Lakshman (2012) develop resource allocation algorithms for distributed cloud systems whose primary objective is to minimize the maximum communication latency between the virtual machines allocated for a user request. In comparison with our work, however, all these papers consider only a single data center, do not ensure minimum network requirements, or pursue different resource allocation objectives. This paper, instead, deals with the placement of virtual machines across geo-separated data centers, ensuring network resource guarantees on the interconnections while minimizing communication cost. To the best of our knowledge, this problem with these requirements has not been addressed in prior work.

In this paper, we consider multiple data centers connected by limited-bandwidth links; the latency between every pair of data centers is known. Each data center provides a finite number of resource units to users, called virtual machines (VMs). A virtual machine is the part of the data center resources requested by users to execute user-defined tasks, and these tasks can exchange data with tasks placed in other VMs. We assume that all VMs consume identical data center resources. To meet the application's quality-of-service guarantees, a minimum bandwidth and a maximum latency are imposed between each pair of virtual machines. We assume that multiple users access these services, each connected to some data center. To meet the overall application performance, there is an additional maximum-latency requirement between users and virtual machines. Intuitively, if a set of VMs is needed by a single user and has no requirement involving any other user or VM, it can be placed in a single data center. In general, however, a VM interacts with multiple VMs that may be needed by other users, which increases the set of placement options. There is a cost of transferring data between data centers, and the placement minimizes this cost, thus preferring to place all VMs in a single data center, which may not be feasible due to the quality-of-service requirements. Note that in this paper we consider a simplified version of the real problem: characteristics such as per-VM CPU and memory requirements, virtual machine relocation, and many others are not considered here.

This problem is similar to the Virtual Network Embedding (VNE) problem, since the set of virtual machines and the relations between them can be seen as the Virtual Network (VN), while the set of data centers and their links can be mapped to the Substrate Network (SN). Fischer et al. (2013) describe a general framework for VNE. The VMPlacement problem can be described in this framework, but it differs from other works in that we consider a complete graph connecting the data centers, a quadratic communication cost function, and latency constraints, which are usually ignored in other works.

This problem is a generalization of the NP-hard Generalized Quadratic Assignment Problem (GQAP) given in Lee and Ma (2004). Solving it optimally is possible only for very small instances, which may not represent the sizes found in real-world applications. Thus, we propose two heuristic approaches to solve the Virtual Machine Placement Problem (VMPlacement). We extend the BRKGA (Gonçalves and Resende 2011) proposed in Stefanello et al. (2015a) by including new neighborhood structures in the local search, and propose a new GRASP algorithm, both coupled with path-relinking (Glover 1989) and an extensive local search procedure. We test the performance of both algorithms on an artificially generated benchmark dataset comprising instances of sizes ranging from small to large. Both algorithms perform similarly, although BRKGA shows a slight advantage on small instances and GRASP on larger ones. We show that the algorithms quickly find feasible solutions and reach high-quality final solutions for the VMPlacement problem, especially when the path-relinking procedure is used. Furthermore, the results obtained with the proposed algorithms outperform the previous state-of-the-art results for the GQAP.

This paper extends Stefanello et al. (2015a, b) with several additional contributions not found in the previous papers: (i) we propose a path-relinking strategy as an intensification method, and a local search method based on a large exploration of the neighborhood; (ii) we extend the BRKGA proposed in Stefanello et al. (2015a) to include new neighborhood structures in the local search, and propose a new GRASP algorithm; (iii) we integrate the path-relinking strategy into BRKGA; (iv) we provide experimental results for CPLEX, BRKGA, and GRASP applied to sets of instances of VMPlacement and GQAP.

The remainder of the paper is organized as follows. In Sect. 2, we present mathematical models for the Virtual Machine Placement Problem in multiple data centers. Two heuristic approaches are described in Sect. 3. Computational results are presented in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Virtual machine placement problem

In the Virtual Machine Placement Problem (VMPlacement), the objective is to place a set K of virtual machines (VM) in a set N of data centers (DC) in order to minimize the communication cost among virtual machines over the network.

In this problem, each data center has a capacity \(a_i\), which represents the number of virtual machines that can be placed in DC i. Between two data centers i and j there are a bandwidth capacity (\(B_{ij}\)), a latency (\(L_{ij}\)), and a cost \(C_{ij}\) to transfer a data unit between them. To meet the reliability and demand requirements of the applications, certain bandwidth and latency requirements can be imposed on the VMs placed on the data centers. Each pair of virtual machines v and w has a required bandwidth (\(b_{vw}\)), and the sum of required bandwidths over all VM pairs placed in DCs i and j cannot exceed \(B_{ij}\). Furthermore, VMs v and w cannot be placed in DCs i and j if their required latency \(l_{vw}\) is less than the latency between those DCs. Finally, there is a set U of users who access the system. Each user u is located at a data center d(u) and has a maximum latency requirement \(t_{vu}\) for each VM v.

Figure 1 from Stefanello et al. (2015a) shows a representation of the input data components: the data centers (Fig. 1a), and the virtual machines (Fig. 1b). The first component is composed of three data centers (rounded rectangles). Each data center has a number of users and a capacity (represented as a number of spots where VMs can be placed). The connection between each pair of DCs represents the bandwidth capacity, latency, and cost. The second component is composed of eight virtual machines, where each link represents the required bandwidth and the required latency.

Fig. 1 Input data representation

In the next subsections, we present three mathematical formulations for the VMPlacement problem: first a quadratic formulation, followed by two mixed-integer linear formulations. Since VMPlacement is a generalization of the NP-hard Generalized Quadratic Assignment Problem, we extend to VMPlacement the linear mathematical models proposed for the GQAP in Lee and Ma (2004). The models in Lee and Ma (2004) were themselves extensions to the GQAP of the mixed-integer linear programming formulations of Kaufman and Broeckx (1978) and Frieze and Yadegar (1983) for the Quadratic Assignment Problem (QAP), all of them based on the QAP formulation described in Koopmans and Beckmann (1957).

2.1 Quadratic mathematical model

A natural formulation for VMPlacement is based on a quadratic formulation as a generalization of GQAP. In what follows we summarize the parameters and present a quadratic mathematical model for VMPlacement (QMVMP) introduced in Stefanello et al. (2015a).

Parameters

  • N: set of data centers;

  • K: set of virtual machines;

  • U: set of users;

  • \(a_i\): capacity in number of VMs that DC i can host;

  • \(B_{ij}\): bandwidth between DCs i and j;

  • \(L_{ij}\): latency between DCs i and j;

  • \(C_{ij}\): cost of transferring a data unit between DCs i and j;

  • \(b_{vw}\): required bandwidth between VMs v and w;

  • \(l_{vw}\): required latency between VMs v and w;

  • d(u): DC that hosts user u;

  • \(t_{vu}\): required latency between user u and VM v;

  • \(c_{iv}\): cost of placing a VM v in a DC i.

Equations (1)–(7) present the quadratic model for the VMPlacement (QMVMP), where the binary decision variable \(x_{iv}=1\) if VM v is located in DC i, and \(x_{iv}=0\) otherwise.

$$\begin{aligned}&\displaystyle \min \sum _{i\in N} \sum _{v\in K} c_{iv} x_{iv}+ \sum _{i\in N} \sum _{j\in N} \sum _{v\in K} \sum _{w\in K} x_{iv}x_{jw} C_{ij}b_{vw} \end{aligned}$$
(1)
$$\begin{aligned}&\displaystyle \text {subject to: } \nonumber \\&\displaystyle \sum _{v\in K} x_{iv} \le a_i \quad \forall \ i\in N, \end{aligned}$$
(2)
$$\begin{aligned}&\displaystyle \sum _{i\in N} x_{iv} = 1 \quad \forall \ v\in K, \end{aligned}$$
(3)
$$\begin{aligned}&\displaystyle \sum _{v\in K}\sum _{w\in K} x_{iv}x_{jw}b_{vw} \le B_{ij} \quad \forall \ i,j\in N, \end{aligned}$$
(4)
$$\begin{aligned}&\displaystyle \sum _{i\in N}\sum _{j\in N} x_{iv}x_{jw}L_{ij} \le l_{vw} \quad \forall \ v,w \in K, \end{aligned}$$
(5)
$$\begin{aligned}&\displaystyle \sum _{i\in N}x_{iv}L_{i,d(u)} \le t_{vu} \quad \forall \ u\in U, \ \forall \ v\in K, \end{aligned}$$
(6)
$$\begin{aligned}&\displaystyle x_{iv} \in \{0,1\} \quad \forall \ i\in N, \ \forall \ v\in K. \end{aligned}$$
(7)

Objective function (1) minimizes the cost of placing each pair of virtual machines v and w at DCs i and j; this cost is composed of an installation cost plus a data-transfer cost between virtual machines. Constraints (2) require that the number of VMs in each DC not exceed the DC capacity. Constraints (3) require that each VM be assigned to exactly one DC. Constraints (4) require that the given bandwidth between each pair of DCs i and j not be exceeded by the total bandwidth required among the virtual machines placed in these DCs. Constraints (5) ensure that the required latency between each pair of VMs is respected, i.e., if VMs v and w are placed at DCs i and j, respectively, then the latency between DCs i and j must not exceed the maximum required latency between VMs v and w. Constraints (6) require that the latency between a VM v and the DC where user u is located be respected, i.e., a VM v can be placed at DC i only if the latency between i and d(u) is less than or equal to the given latency requirement between VM v and user u. Finally, constraints (7) define the domain of the decision variables.

The performance of mixed-integer linear solvers has improved considerably over the last few years. CPLEX is a general-purpose black-box solver based on the simplex method and branch-and-bound, incorporating state-of-the-art exact algorithms for integer programming, and it has been successfully applied to many combinatorial optimization problems. To analyze the performance of CPLEX and provide baseline results for the comparison of heuristic methods, the following subsections present two linear mathematical models for the VMPlacement problem.

2.2 Linear mathematical model I: LMVMP

Based on the model from Lee and Ma (2004) for the GQAP, itself derived from Frieze and Yadegar (1983) for the QAP, we present a mixed-integer linear model for VMPlacement, introduced in Stefanello et al. (2015b). Letting \(y_{ivjw}=x_{iv}x_{jw}\) for all \(i,j\in N\) and \(v,w\in K\), the mathematical model referred to as LMVMP can be formulated as:

$$\begin{aligned}&\displaystyle \min \sum _{i\in N} \sum _{v\in K} c_{iv} x_{iv}+ \sum _{i\in N} \sum _{j\in N} \sum _{v\in K} \sum _{w\in K} y_{ivjw} C_{ij}b_{vw} \end{aligned}$$
(8)
$$\begin{aligned}&\displaystyle \text {subject to: } \nonumber \\&\displaystyle \sum _{v\in K} x_{iv} \le a_i \quad \forall \ i\in N, \end{aligned}$$
(9)
$$\begin{aligned}&\displaystyle \sum _{i\in N} x_{iv} = 1 \quad \forall \ v\in K, \end{aligned}$$
(10)
$$\begin{aligned}&\displaystyle \sum _{i\in N} y_{ivjw} = x_{jw} \quad \forall \ v,w\in K, \ \forall \ j\in N, \end{aligned}$$
(11)
$$\begin{aligned}&\displaystyle y_{ivjw} = y_{jwiv} \quad \ \forall \ v,w\in K, \ \forall \ i,j\in N, \end{aligned}$$
(12)
$$\begin{aligned}&\displaystyle \sum _{v\in K}\sum _{w\in K} y_{ivjw}b_{vw} \le B_{ij} \quad \forall \ i,j\in N, \end{aligned}$$
(13)
$$\begin{aligned}&\displaystyle \sum _{i\in N}\sum _{j\in N} y_{ivjw}L_{ij} \le l_{vw} \quad \forall \ v,w\in K, \end{aligned}$$
(14)
$$\begin{aligned}&\displaystyle \sum _{i\in N}x_{iv}L_{i,d(u)} \le t_{vu} \quad \forall \ u\in U, \ \forall \ v\in K, \end{aligned}$$
(15)
$$\begin{aligned}&\displaystyle x_{iv} \in \{0,1\} \quad \forall \ i\in N, \ \forall \ v\in K, \end{aligned}$$
(16)
$$\begin{aligned}&\displaystyle 0\le y_{ivjw} \le 1 \quad \forall \ i,j\in N,\ \forall \ v,w\in K. \end{aligned}$$
(17)

Model LMVMP is obtained from QMVMP by replacing the product \(x_{iv}x_{jw}\) by \(y_{ivjw}\). In addition, three extra sets of constraints are considered. Constraints (11) define the relation between variables x and y. Constraints (12) impose the symmetry relation on variables y. Finally, constraints (17) define the domain of variables y.

The model QMVMP has quadratic constraints, while in LMVMP all constraints are linear, as is the objective function. However, the mixed-integer linear program LMVMP has a considerably larger number of variables, since it adds the variables \(y_{ivjw}\) to the previous variables \(x_{iv}\): the number of variables grows from \(O(|N||K|)\) in QMVMP to \(O(|N|^2|K|^2)\) in LMVMP. We note that if \((x^*_{iv},y^*_{ivjw})\) is an optimal solution of LMVMP, then \((x^*_{iv})\) is an optimal solution of QMVMP. The proof that both models are equivalent can be obtained by extending the proof for the QAP provided in Lee and Ma (2004).

2.3 Linear mathematical model II: LMVMPII

Next we present a second linearization of the VMPlacement problem (LMVMPII), introduced in Stefanello et al. (2015b). This linear model is derived from the formulation of Kaufman and Broeckx (1978) for the QAP, which is the linearization of the QAP with the smallest number of variables and constraints; in Lee and Ma (2004) the authors extend that formulation to the GQAP. In this model, each binary decision variable \(x_{iv}\) is set to one when VM v is located at DC i, and to zero otherwise. The auxiliary variables \(y_{iv}\) aggregate the cost of placing VM v in DC i, and \(n_{ijv}\) aggregates the bandwidth used by VM v between data centers i and j.

$$\begin{aligned}&\min \sum _{i\in N} \sum _{v\in K} c_{iv} x_{iv}+ \sum _{i\in N} \sum _{v\in K} y_{iv} \end{aligned}$$
(18)
$$\begin{aligned}&\text {subject to: } \nonumber \\&\sum _{v\in K} x_{iv} \le a_i \quad \forall \ i\in N, \end{aligned}$$
(19)
$$\begin{aligned}&\sum _{i\in N} x_{iv} = 1 \quad \forall \ v\in K, \end{aligned}$$
(20)
$$\begin{aligned}&\sum _{j \in N}\sum _{w \in K} C_{ij}b_{vw} x_{jw} - y_{iv} \le m_{iv}(1-x_{iv}) \quad \forall \ v\in K, \ \forall \ i\in N, \end{aligned}$$
(21)
$$\begin{aligned}&\sum _{w\in K} b_{vw} x_{jw} - n_{ijv} \le M(1-x_{iv}) \quad \forall \ v\in K, \ \forall \ i,j\in N, \end{aligned}$$
(22)
$$\begin{aligned}&\sum _{v\in K} n_{ijv} \le B_{ij} \quad \forall \ i,j\in N, \end{aligned}$$
(23)
$$\begin{aligned}&L_{ij}x_{iv}\le l_{vw}+L_{ij}(2-x_{iv}-x_{jw}) \quad \forall \ v,w\in K, \forall \ i,j\in N, \end{aligned}$$
(24)
$$\begin{aligned}&\sum _{i\in N}x_{iv}L_{i,d(u)} \le t_{vu} \quad \forall \ u\in U, \ \forall \ v\in K, \end{aligned}$$
(25)
$$\begin{aligned}&x_{iv} \in \{0,1\} \quad \forall \ i\in N, \ \forall \ v\in K, \end{aligned}$$
(26)
$$\begin{aligned}&y_{iv} \ge 0 \quad \forall \ i\in N,\ \forall \ v\in K, \end{aligned}$$
(27)
$$\begin{aligned}&n_{ijv} \ge 0 \quad \forall \ v\in K,\ \forall \ i,j\in N. \end{aligned}$$
(28)

where

$$\begin{aligned} m_{iv} \ge \sum _{j\in N} \sum _{w\in K} C_{ij}b_{vw}, \quad \forall \ i\in N, \ \forall \ v\in K. \end{aligned}$$

Constraints (21) bind the communication cost incurred by placing VM v in DC i to the variables \(y_{iv}\). Constraints (22) and (23) impose the bandwidth constraints, while constraints (24) impose the latency constraints. Constraints (24) can be replaced by

$$\begin{aligned} x_{iv}+x_{jw}\le 1 \quad \forall \ i,j \in N, \ \forall \ v,w \in K \text { if } L_{ij}>l_{vw}. \end{aligned}$$
(29)

We observe that CPLEX converts (24) into (29) in the pre-processing phase. Finally, constraints (26), (27), and (28) define the domains of the variables.

As stated, this model is weak in practice since it relies on the big-M constraints (22), and its root-node bound is always zero. However, a stronger formulation can be obtained by adding the following cuts

$$\begin{aligned} y_{iv} \ge Y_{iv} x_{iv} \quad \forall \ i\in N, \ \forall \ v\in K, \ \end{aligned}$$
(30)

where \(Y_{iv}\) is defined as the optimal value of the following generalized assignment model:

$$\begin{aligned}&Y_{iv}=\min \sum _{j\in N} \sum _{w\in K} C_{ij}b_{vw} x_{jw} \end{aligned}$$
(31)
$$\begin{aligned}&\text {subject to: } \nonumber \\&\sum _{w\in K} x_{jw} \le a_j \quad \forall \ j\in N, \end{aligned}$$
(32)
$$\begin{aligned}&\sum _{j\in N} x_{jw} = 1 \qquad \forall \ w\in K, \end{aligned}$$
(33)
$$\begin{aligned}&\sum _{w\in K} b_{vw}x_{jw} \le B_{ij} \qquad \forall \ j\in N, \end{aligned}$$
(34)
$$\begin{aligned}&\text { (24), (25) } \end{aligned}$$
(35)
$$\begin{aligned}&x_{iv} = 1 \end{aligned}$$
(36)
$$\begin{aligned}&x_{jw} \in \{0,1\} \quad \forall \ j\in N, \ \forall \ w\in K. \end{aligned}$$
(37)

These additional cuts were applied to the 3-dimensional assignment problem in Mittelmann and Salvagnin (2015). Note that these cuts can also be applied to the GQAP by suppressing constraints (22)–(25) and (28) from LMVMPII, and (34)–(35) from the assignment model, which are the constraints specific to VMPlacement.

Section 4.2 investigates the performance of CPLEX applied to both linear mathematical models on small, medium, and large instances. The experiments show that this approach is limited to solving small instances; heuristic methods are thus required for large instances.

3 Heuristic approaches

In this section we propose two heuristic approaches to solve the VMPlacement problem. We first describe the local search procedures and two path-relinking strategies used in the proposed metaheuristics, and then describe the greedy randomized adaptive search procedure (GRASP), followed by the biased random-key genetic algorithm (BRKGA).

Placing a virtual machine in a data center can violate some constraints. To deal with this situation, we use a penalization strategy that drives the search toward minimizing the number of violated constraints: the cost of placing a VM v in DC i is the regular placement cost plus a sufficiently large number M for each violated constraint. This penalization is applied whenever a solution is evaluated; in our experiments we use \(M=10^{10}\). A solution S is represented as a vector of size |K|, \(S=\{n_1,\ldots ,n_{|K|}\}\), where \(n_v\) is the label (or index) of the data center where virtual machine v is placed.
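
To make the representation and the penalized evaluation concrete, the following C++ sketch shows how a solution vector could be evaluated. It is a minimal illustration under our own naming (Instance, evaluate, and PENALTY are not from the original implementation); only the capacity check is written out, the bandwidth and latency checks being analogous.

```cpp
#include <vector>

// Illustrative instance data following the notation of Sect. 2; names are ours.
struct Instance {
    int numDCs, numVMs;
    std::vector<std::vector<double>> C;  // C[i][j]: unit transfer cost between DCs i and j
    std::vector<std::vector<double>> b;  // b[v][w]: required bandwidth between VMs v and w
    std::vector<int> cap;                // cap[i] = a_i: capacity of DC i
};

const double PENALTY = 1e10;             // M = 10^10, added once per violated constraint

// S[v] = index of the DC hosting VM v (the vector representation above).
double evaluate(const Instance& in, const std::vector<int>& S) {
    double cost = 0.0;
    // Communication term of objective (1): sum over ordered VM pairs.
    for (int v = 0; v < in.numVMs; ++v)
        for (int w = 0; w < in.numVMs; ++w)
            cost += in.C[S[v]][S[w]] * in.b[v][w];
    // Capacity constraints (2), penalized rather than forbidden.
    std::vector<int> used(in.numDCs, 0);
    for (int v = 0; v < in.numVMs; ++v) ++used[S[v]];
    for (int i = 0; i < in.numDCs; ++i)
        if (used[i] > in.cap[i]) cost += PENALTY;
    // Bandwidth (4), VM latency (5), and user latency (6) violations
    // would be penalized in the same way; omitted for brevity.
    return cost;
}
```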

3.1 Local search procedures

Local search is a general approach for finding and improving solutions to hard combinatorial optimization problems. The most basic strategy of local search algorithms is to start from an initial solution and iteratively replace the current solution by a better neighbor solution, until no improvement is possible. A neighbor solution can be obtained by applying moves defined by a neighborhood structure. In this paper we define four neighborhood structures to obtain neighbor solutions, namely shift, swap, chain2L, and chain3L.

Shift A shift operation moves a virtual machine from its current data center to a different one. In a shift search, we select virtual machines v in circular order of their indices (starting from index zero) and calculate the cost of moving each from its current data center to every other data center. Virtual machine v is moved to the data center that produces the greatest improvement in the objective function. This procedure is repeated until a local minimum is reached.

Swap A swap operation interchanges the positions of two virtual machines. In a swap search, we evaluate the cost of all swap moves between virtual machines v and w in circular order of their indices (starting from indices zero and one, respectively). When an improvement is found, the virtual machine positions are interchanged and the search continues with the next pair of virtual machines. The procedure ends when no swap move improves the solution; the evaluation of symmetric moves is avoided.

Chain2L A chain operation is a composition of two shift moves applied sequentially. In a chain2L search, the first shift move takes a virtual machine v from data center i to a data center j, and the second takes a virtual machine w from j to a data center k. We evaluate all compositions of two shift moves; when an improvement is found, the moves are applied to the solution and the process restarts from the new solution. Note that this neighborhood includes the swap search when k equals i.

Chain3L The last neighborhood extends the concept of chain moves to a composition of three shift moves applied sequentially. A chain3L search involves four virtual machines and up to four data centers. This search comprises either three shift moves between two data centers or a cyclic move among three data centers.
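
As an illustration of the simplest of these neighborhoods, the sketch below implements the shift search on top of the evaluate function sketched earlier. For clarity it recomputes the full objective for each candidate move; an efficient implementation would use the O(|K|) incremental evaluation discussed next.

```cpp
#include <vector>

// Best-improvement shift search: scan VMs in circular index order and move
// each to the DC that most improves the (penalized) cost, until no move helps.
// Returns whether the solution was improved at least once.
bool shiftSearch(const Instance& in, std::vector<int>& S) {
    bool improvedAny = false;
    bool improved = true;
    while (improved) {
        improved = false;
        for (int v = 0; v < in.numVMs; ++v) {
            int bestDC = S[v];
            double bestCost = evaluate(in, S);   // incremental in a real implementation
            for (int i = 0; i < in.numDCs; ++i) {
                if (i == S[v]) continue;
                int old = S[v];
                S[v] = i;                        // tentatively move v to DC i
                double c = evaluate(in, S);
                S[v] = old;                      // undo the tentative move
                if (c < bestCost) { bestCost = c; bestDC = i; }
            }
            if (bestDC != S[v]) { S[v] = bestDC; improved = improvedAny = true; }
        }
    }
    return improvedAny;
}
```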

Given a solution S, the worst-case time complexity of evaluating the cost to insert or remove a virtual machine v in S is O(|K|). This complexity arises because it is necessary to check whether the latency requirement is satisfied between v and each \(w \in S\), and to evaluate the bandwidth cost between v and each \(w \in S\). Capacity constraints, uniqueness of placement, and user latency requirements can be evaluated in constant time. Therefore, the shift search evaluates |K| removals and \(|K|\cdot|N|\) insertions, resulting in a complexity of \(O(|K|^2|N|)\) (in the case of no improvement during the search). The swap search has time complexity \(O(|K|^3)\), the chain2L search \(O(|K|^3|N|)\), and the chain3L search \(O(|K|^4|N|)\).

Naturally, in all the searches described above, the processing time can be reduced by avoiding the evaluation of moves that lead to a worse solution. Moreover, heuristic strategies need not explore the whole neighborhood and may evaluate only a subset of the neighbor solutions. This is done in the chain3L search by prohibiting the insertion of a virtual machine in some data centers, thus reducing the search space.

We propose two heuristic strategies to reduce the number of neighbor solutions explored in the chain3L search. The first reduction strategy is based on the cost \(Y_{iv}\) described in Model (31)–(37) of Sect. 2.3. Given a parameter \(\alpha _{3L}\in [0,1]\), we allow inserting a virtual machine v in a data center i only if i belongs to the \(\lfloor |N|\cdot\alpha _{3L} \rfloor \) data centers with the lowest cost \(Y_{iv}\). A low value of \(\alpha _{3L}\) restricts the search to data centers with low values of \(Y_{iv}\), while a high value yields a broader search. An analysis of the instances described in Sect. 4.1 and their best known solutions corroborates this strategy: in these solutions, more than 50% of the virtual machines are placed in the 30% of data centers with the lowest cost \(Y_{iv}\), and approximately 75% are placed in the 50% of data centers with the lowest cost.

The second reduction strategy is based on the communication cost assigned to the data centers. Given a solution \(\mathcal {S}\), let \(C_i\) be the sum of the communication costs from i to all other data centers. We only evaluate compositions of moves that start from a virtual machine belonging to one of the \(\beta _{3L}\) data centers with the highest cost \(C_i\). In summary, the idea is to limit the search to moves that reduce the communication cost of the most loaded data centers. Combining both reduction heuristics makes the search space considerably smaller while still visiting solutions in a promising region. By default, we use \(\alpha _{3L}=0.3\) and \(\beta _{3L}=3\), based on empirical observation of preliminary results; these values provide a good balance between running time and solution quality.

The neighborhoods are integrated following the idea of Variable Neighborhood Descent (VND) (Hansen et al. 2010). The neighborhoods are applied sequentially from lowest to highest complexity (shift, swap, chain2L, and chain3L), but the search restarts with the first neighborhood whenever an improvement is found. We name each local search strategy with two characters: the first is a number indicating the most complex neighborhood applied, and the second, V, indicates that the VND strategy is used to integrate the neighborhoods. For example, 3V indicates that we use the shift, swap, and chain2L searches.
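
A minimal sketch of this VND-style integration is given below, assuming each neighborhood search is wrapped as a callable that returns whether it improved the solution, as the shiftSearch sketch above does.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Search = std::function<bool(const Instance&, std::vector<int>&)>;

// Apply neighborhoods from cheapest to most expensive, restarting from the
// first whenever any of them improves the solution (VND).
void vnd(const Instance& in, std::vector<int>& S,
         const std::vector<Search>& searchers) {
    std::size_t k = 0;
    while (k < searchers.size()) {
        if (searchers[k](in, S))
            k = 0;    // improvement: restart from the lowest-complexity neighborhood
        else
            ++k;      // no improvement: try the next neighborhood
    }
}
// Strategy 4V would be vnd(in, S, {shiftSearch, swapSearch, chain2L, chain3L}),
// the last three being hypothetical counterparts of the shiftSearch sketch.
```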

Note that, with the penalization strategy described at the beginning of this section, the local search is also applied to infeasible solutions; in this case it also works as a repair procedure.

3.2 Path-relinking

Path-relinking (PR) is an approach to integrate intensification and diversification in the search. It consists in exploring trajectories that connect high-quality solutions. Starting from an initial solution, the scheme moves from one solution to another until the target solution is reached. The objective consists in finding a solution that is better than both the initial and target solutions.

Path-relinking was first suggested for tabu search in Glover (1989) and then formalized in Glover and Laguna (1993). Since then, this strategy has been applied to a large number of combinatorial optimization problems; related studies include Glover (1997), Glover et al. (2000), Resende and Ribeiro (2005), Festa et al. (2006), Resende et al. (2010), Festa and Resende (2013), Glover (2014), and Ferone et al. (2016), to name a few.

Algorithm 1 shows pseudo-code for the path-relinking operator between solutions S and T, where S is the initial solution and T is the guiding solution. In line 2 the incumbent solution is initialized. The loop between lines 3 and 10 is repeated while the distance between the two solutions is greater than an input parameter \(\delta _{lim}\); the distance \(\varDelta \{S,T\}\) is the minimum number of moves needed to transform S into T or vice versa. We use \(\delta _{lim}=\varDelta \{S,T\}/2\), defined at the beginning of the procedure, since we observed that the probability of finding an improved solution is greater in the first steps of the path. In line 4 a move is applied to solution S in the direction of solution T: we analyze all changes in S toward T that reduce the distance \(\varDelta \{S,T\}\) by one, and apply to S the change with the least cost.

Several studies have experimentally found that it is convenient to add a local search exploration from some of the solutions generated along the path (Martí et al. 2006). Since two consecutive solutions obtained by a relinking step are often very similar, it is generally not efficient to apply the local search at every step of the relinking process. Thus, in line 7 the local search is applied every n iterations, where \(n=(|K|\cdot 0.5)/(4+|K|/50)\), rounded up to the nearest even number (for example, \(|K|=100\) gives \(50/6\approx 8.3\), so \(n=10\)). Using an even n ensures that the local search is applied to both S and T over time, since in line 10 the solutions are interchanged to implement a back-and-forward strategy (Festa and Resende 2013). Finally, in line 8 the incumbent solution is updated, and it is returned in line 11.

Algorithm 1 Path-relinking operator between solutions S and T
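
Since the published pseudo-code appears as an algorithm figure, the following hedged C++ sketch reconstructs the operator from the description above. It reuses the evaluate and shiftSearch sketches, uses prDistance for \(\varDelta \{\cdot,\cdot\}\), and simplifies line 7 by improving a copy of the current solution, which keeps termination of the walk immediate.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Distance: number of VMs assigned to different DCs in the two solutions.
static int prDistance(const std::vector<int>& A, const std::vector<int>& B) {
    int d = 0;
    for (std::size_t v = 0; v < A.size(); ++v) d += (A[v] != B[v]);
    return d;
}

std::vector<int> pathRelinking(const Instance& in,
                               std::vector<int> S, std::vector<int> T) {
    const int deltaLim = prDistance(S, T) / 2;   // stop at half the initial distance
    std::vector<int> best = (evaluate(in, S) <= evaluate(in, T)) ? S : T;
    // Local search every n steps, n rounded up to the nearest even number.
    int n = (int)std::ceil((in.numVMs * 0.5) / (4.0 + in.numVMs / 50.0));
    if (n % 2 == 1) ++n;
    int step = 0;
    while (prDistance(S, T) > deltaLim) {
        // Line 4: apply to S the least-cost move that copies one assignment from T.
        int bestV = -1;
        double bestCost = 0.0;
        for (int v = 0; v < in.numVMs; ++v) {
            if (S[v] == T[v]) continue;
            int old = S[v];
            S[v] = T[v];
            double c = evaluate(in, S);
            S[v] = old;
            if (bestV < 0 || c < bestCost) { bestCost = c; bestV = v; }
        }
        S[bestV] = T[bestV];
        if (++step % n == 0) {                   // line 7: periodic local search
            std::vector<int> R = S;              // improve a copy (simplification)
            shiftSearch(in, R);
            if (evaluate(in, R) < evaluate(in, best)) best = R;
        }
        if (evaluate(in, S) < evaluate(in, best)) best = S;  // line 8
        std::swap(S, T);                         // line 10: back-and-forward
    }
    return best;                                 // line 11
}
```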

Algorithms 2 and 3 describe two frameworks for managing the elite set \(\mathcal {E}\) and for selecting solutions for the path-relinking operator. In the first approach, the path-relinking operator is applied between a provided solution S and a solution T selected from the elite set. In the second approach, the path-relinking operator is applied multiple times, between a provided solution S and every solution from the elite set.

To simplify the notation, we use the term path-relinking for the overall process of managing the elite set and applying the path-relinking operator between two solutions. With respect to the specific elite-set management procedures, we call the first approach PRS and the second PRM: the first two letters indicate that the algorithm is related to path-relinking, and the last letter indicates the management framework, S for a single application of the path-relinking operator and M for multiple applications.

At the beginning of Algorithm 2, the elite set \(\mathcal {E}\) is empty. The maximum size \(\delta _{max}\) of the elite set is an input parameter. A solution S is added to \(\mathcal {E}\) if it improves the best solution, or if it is better than the worst elite solution and sufficiently different from every solution in \(\mathcal {E}\) (lines 3, 6, and 11). Two solutions are sufficiently different if \(\varDelta \{S,T\} > \delta _{dis}\); we denote \(S\not \approx \mathcal {E}\) if \(\varDelta \{S,T'\} > \delta _{dis}\ \forall \; T' \in \mathcal {E}\). In our case, we use \(\delta _{dis}=\lfloor |K|\cdot 0.1 \rfloor \). If the elite set has a minimum number of solutions (\(\delta _{min}=2\) in line 2) and S is sufficiently different, a solution T is selected and the path-relinking operator is applied between S and T.

In line 4, solutions are ordered by a linear rank \(r(T_i),\; \forall \; i=1,\ldots ,|\mathcal {E}|\), where the best solution has rank \(r(T_1)=|\mathcal {E}|\) and the worst has rank \(r(T_{|\mathcal {E}|})=1\). A solution \(T \in \mathcal {E}\) is selected by the roulette-wheel criterion, i.e., T is selected from the elite set with probability \(r(T)/\sum _{i=1}^{|\mathcal {E}|} r(T_i)\). With this criterion, better solutions have a higher chance of being chosen. After T is selected, the path-relinking operator is applied between S and T.
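
A small sketch of this rank-based roulette wheel, under the assumption that the elite set is kept sorted from best to worst, could look as follows.

```cpp
#include <random>

// Rank-based roulette wheel: position 0 holds the best solution (rank |E|),
// the last position the worst (rank 1). Returns the selected position.
int rouletteSelect(int eliteSize, std::mt19937& rng) {
    int total = eliteSize * (eliteSize + 1) / 2;     // sum of all ranks
    std::uniform_int_distribution<int> dist(0, total - 1);
    int ticket = dist(rng);
    for (int pos = 0; pos < eliteSize; ++pos) {
        int rank = eliteSize - pos;                  // rank of solution at pos
        if (ticket < rank) return pos;               // chosen with probability rank/total
        ticket -= rank;
    }
    return eliteSize - 1;                            // not reached; defensive default
}
```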

Lines 6 to 9 update the elite set after the path-relinking operator is applied. Line 6 evaluates whether S is sufficiently different or improves the best solution; if so, S is added to \(\mathcal {E}\). Line 8 maintains the cardinality of the elite set: \(T'\) is the solution most similar to S among those worse than S, i.e., \(T'\leftarrow \{T'' \in \mathcal {E}\; |\; f(T'')>f(S) \text { and } \varDelta \{T'',S\} \le \varDelta \{T'',S'\}, \; \forall \;S' \in \mathcal {E} \}\). Finally, in line 13, solution S is returned.

Algorithm 2 Path-relinking management with a single operator application (PRS)

Another approach is outlined in Algorithm 3. The main difference from the previous approach is that, when a solution S is added to \(\mathcal {E}\), the path-relinking operator is applied between all pairs of solutions from \(\mathcal {E}\) to which the operator has not previously been applied. This is done in the loop in lines 5 to 13. The flag hasPairsST, which can be maintained cheaply with memory structures, indicates whether there are solutions \(S'\) and \(T'\) in the elite set for which the path-relinking operator has not been applied. In line 6 two solutions are selected; pairs with better objective function values are selected first. The insertion operation (line 11) and the removal operation (line 13) follow the same rules as in the previous algorithm. In line 14 a solution is removed from the elite set when it is full. We again use a rank-based roulette-wheel criterion, where the best solution has rank zero and the worst has rank \(|\mathcal {E}|-1\); this ensures that the best solutions are kept in the pool, while the worst solution has the highest probability of being removed. Finally, in line 18 solution S is returned.

Algorithm 3 Path-relinking management with multiple operator applications (PRM)

Note that Algorithms 2 and 3 are designed as independent components, requiring only a few configuration parameters (such as the local search strategy and the elite set size) and a procedure that provides distinct solutions. For this reason, these path-relinking strategies can easily be included in different heuristic frameworks. In the next subsections, we describe the GRASP and BRKGA metaheuristics that use them.

3.3 Greedy randomized adaptive search procedure—GRASP

GRASP (greedy randomized adaptive search procedure) is a multi-start metaheuristic for combinatorial optimization problems (Feo and Resende 1989, 1995; Resende and Ribeiro 2010; Martí et al. 2013; Resende and Ribeiro 2014, 2016). The algorithm is composed of two phases: construction and local search. The construction phase builds a feasible solution following a greedy randomized criterion, whose neighborhood is then investigated until a local minimum is found in the local search phase. These phases are repeated over the iterations, and the best local optimum found is returned as the heuristic solution. The basic implementation is memoryless, since it does not use information collected in previous iterations; one way to add memory to GRASP is to hybridize it with path-relinking. An extensive review of this technique is presented in Festa and Resende (2013).

A high-level description of GRASP is presented in Algorithm 4. GRASP iterations are carried out in lines 3 to 8. In line 4, the procedure builds a greedy randomized solution (see Algorithm 5), which is used as the starting solution S for the local search in line 5. In line 6, path-relinking is applied if the hybrid approach is used. If the local optimum S is better than the incumbent solution, the incumbent is updated in line 8. In line 9, the best solution found over all GRASP iterations is returned as the GRASP solution.

Algorithm 4 GRASP
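
The sketch below mirrors this high-level loop. The names construct, localSearch, and prStep are placeholders for the constructive heuristics described next, the procedures of Sect. 3.1, and the PRS/PRM frameworks, respectively; they are assumptions of this sketch, not names from the original implementation.

```cpp
#include <vector>

// Placeholder declarations for components described elsewhere in Sect. 3.
std::vector<int> construct(const Instance& in);                         // Algorithm 5
void localSearch(const Instance& in, std::vector<int>& S);              // Sect. 3.1
std::vector<int> prStep(const Instance& in, const std::vector<int>& S); // PRS/PRM

std::vector<int> grasp(const Instance& in, int maxIterations) {
    std::vector<int> best;
    double bestCost = 1e300;
    for (int it = 0; it < maxIterations; ++it) {
        std::vector<int> S = construct(in);  // line 4: greedy randomized construction
        localSearch(in, S);                  // line 5: also repairs infeasibility
        S = prStep(in, S);                   // line 6: hybrid path-relinking step
        double c = evaluate(in, S);
        if (c < bestCost) { bestCost = c; best = S; }  // lines 7-8: update incumbent
    }
    return best;                             // line 9: best solution over all iterations
}
```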

Algorithm 5 shows a general pseudo-code for the greedy randomized constructive heuristic used in GRASP. The construction builds a solution S one element at a time. In line 2, solution S is initialized empty, and in line 3 the set of candidate elements is initialized. In line 4, the cost of placing each candidate element in a data center is calculated. The solution is built in the loop in lines 5 to 10: in line 6 a restricted candidate list (RCL) is built, in line 7 a candidate is selected, in line 8 this element is added to the partial solution, in line 9 the candidate is removed from the candidate set \(\mathcal {C}\), and in line 10 the placement costs of the remaining candidates are re-evaluated. Finally, in line 11, solution S is returned.

Algorithm 5 Greedy randomized constructive heuristic

Based on this framework, four constructive heuristics were developed to generate initial solutions in line 4 of Algorithm 4.

Random In the first constructive heuristic, at each placement step a yet unplaced virtual machine is randomly selected and placed in a random data center. Note that this constructive approach is not a greedy heuristic. We decided to evaluate this approach since it is one of the simplest constructive methods, and we use it as a baseline for comparing more sophisticated constructive methods.

Greedy In the second constructive heuristic, at each placement step a yet unplaced virtual machine is randomly selected and placed in the data center that produces the least increase in the objective function. In this case, the RCL is composed of only one candidate that is the best data center for a specific randomly selected virtual machine.

DC-Greedy In the third constructive heuristic, at each placement step a yet unplaced virtual machine is randomly selected and placed in one of the n data centers of the RCL, which contains the candidates with the lowest incremental placement cost. By default, we use \(n=\max \{3,|N|\cdot 0.2\}\). The effective value of n can vary at each iteration because of two additional conditions: first, candidates with the same cost as the first n candidates are also added to the RCL; second, if at least one candidate keeps the solution feasible, candidates that generate infeasibilities are ignored.

VM-Greedy In the last constructive heuristic, at each placement step all possible placements of virtual machines to data centers are evaluated, and one of the n best candidates is randomly selected. Here we use \(n=\max \{5,|K|\cdot 0.1\}\) for the RCL size, but the effective value of n can vary since we also apply the two additional conditions described for the previous heuristic.

In these constructive heuristics, every random selection is based on a uniform distribution. Note that these heuristics can generate infeasible solutions; repair is done by the local search procedure.
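
As an example, a possible sketch of the DC-Greedy construction is shown below. For brevity it omits the two refinements mentioned above (extending the RCL with cost ties and filtering infeasible candidates); insertionCost is an assumed helper returning the penalized incremental cost of a placement.

```cpp
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

// Assumed helper: penalized incremental cost of placing VM v at DC i,
// given the partial assignment S (-1 marks unplaced VMs).
double insertionCost(const Instance& in, const std::vector<int>& S, int v, int i);

std::vector<int> dcGreedy(const Instance& in, std::mt19937& rng) {
    std::vector<int> S(in.numVMs, -1);
    std::vector<int> order(in.numVMs);
    for (int v = 0; v < in.numVMs; ++v) order[v] = v;
    std::shuffle(order.begin(), order.end(), rng);   // random VM selection order
    int n = std::max(3, (int)(in.numDCs * 0.2));     // base RCL size
    for (int v : order) {
        std::vector<std::pair<double,int>> cand;     // (incremental cost, DC)
        for (int i = 0; i < in.numDCs; ++i)
            cand.push_back({insertionCost(in, S, v, i), i});
        std::sort(cand.begin(), cand.end());         // cheapest candidates first
        int rcl = std::min(n, (int)cand.size());
        std::uniform_int_distribution<int> pick(0, rcl - 1);
        S[v] = cand[pick(rng)].second;               // uniform choice from the RCL
    }
    return S;
}
```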

3.4 Biased random-key genetic algorithm—BRKGA

A biased random-key genetic algorithm (BRKGA) is a population-based metaheuristic that has recently been successfully applied to several hard combinatorial optimization problems (Gonçalves and Resende 2011; Festa 2013; Andrade et al. 2015; Stefanello et al. 2017). BRKGA is a class of genetic algorithms in which solutions are encoded as vectors of random keys, i.e., real numbers generated uniformly at random in the interval [0, 1). A vector of random keys is translated into a solution of the optimization problem by a decoder: a deterministic algorithm that takes a vector of random keys as input and returns a solution of the optimization problem as well as its cost (or fitness).

Algorithm 6 presents a general scheme of a BRKGA. The algorithm starts with a set of p random vectors of size n (population). Parameter n depends on the encoding while parameter p is user-defined. Starting from the initial population, the algorithm generates a series of populations. Each iteration of the algorithm is called a generation. The algorithm evolves the population over the generations by combining pairs of solutions from one generation to produce offspring solutions to the following generation.

Algorithm 6 BRKGA

At each generation g, the decoder is applied to all newly created random-key vectors, and the population is partitioned into a smaller set of \(p_e\) elite solutions, i.e., the \(p_e\) best-fit solutions in the population, and a larger set of \(p-p_e>p_e\) non-elite solutions (line 6). Population \(g+1\) is generated as follows. All \(p_e\) elite solutions of population g are copied unchanged to population \(g+1\) (line 7); this elitist strategy keeps the best solutions at hand. To ensure that mutation is present in the evolution, \(p_m\) mutants are added to population \(g+1\); a mutant is simply a vector of random keys generated in the same fashion as the initial solutions (line 8).

With \(p_e + p_m\) solutions accounted for in population \(g+1\), \(p-p_e-p_m\) additional solutions must be generated to complete the p solutions of population \(g+1\). This is done through mating or crossover (line 9). A parent-A is selected randomly from the elite solutions, and a parent-B is selected randomly from the non-elite solutions. A child C is produced by combining the parents using parameterized uniform crossover (Spears and DeJong 1991): let \(\rho _A > 1/2\) be the probability that the offspring inherits the key of parent-A and \(\rho _B=1-\rho _A\) the probability that it inherits the key of parent-B, i.e., \(c_i = a_i\) with probability \(\rho _A\) and \(c_i=b_i\) with probability \(\rho _B=1-\rho _A\), where \(a_i\) and \(b_i\) are, respectively, the i-th keys of parent-A and parent-B, for \(i=1,\ldots ,n\).
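
The parameterized uniform crossover is simple enough to show in full; the sketch below assumes key vectors stored as std::vector<double>.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Parameterized uniform crossover: each key of the child is inherited from
// the elite parent with probability rhoA (> 0.5), from the non-elite otherwise.
std::vector<double> crossover(const std::vector<double>& parentA,  // elite parent
                              const std::vector<double>& parentB,  // non-elite parent
                              double rhoA, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::vector<double> child(parentA.size());
    for (std::size_t i = 0; i < child.size(); ++i)
        child[i] = (coin(rng) < rhoA) ? parentA[i] : parentB[i];
    return child;
}
```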

Random-key genetic algorithms were first introduced in Bean (1994); BRKGA differs from the original approach in the way parents are chosen from the population and in how the child inherits keys (Gonçalves and Resende 2011). The latter work describes a BRKGA as having problem-independent and problem-dependent components: the independent components are applied without any knowledge of the problem, while the problem-dependent components are the only connection with the problem. Thus, to describe a BRKGA, one needs only to specify how solutions are encoded and decoded, the choice of the parameters p, \(p_e\), \(p_m\), and \(\rho _A\), and the stopping criterion. We next describe the encoding and decoding procedures for the proposed algorithm; parameter values and the stopping criterion are given in Sect. 4.

3.4.1 Decoders

Solutions of the optimization problem are encoded as a vector \(\mathcal {X}\) with \(n=|K|\) random keys. To translate this vector into the solution of the VMPlacement problem, we propose two decoders, as described next.

Greedy Ordered Decoder—D1 In this decoder, the keys provide the order of placement. Following this order, a greedy strategy is used, placing each VM at the DC that produces the least increase in the objective function.

The decoder starts with a list in which each element is composed of a random key and the index of a virtual machine. The list is sorted by the keys, and the resulting sequence of virtual machine indices gives the order in which virtual machines are placed. Following this order, each virtual machine v is placed at the DC i that produces the least increase in the objective function; the insertion cost of VM v in each DC takes into account the virtual machines already placed. When all VMs are placed, the decoder returns the fitness value of the random-key vector \(\mathcal {X}\).

Location Decoder—D2 In this decoder, each key encodes the data center in which the corresponding virtual machine should be placed: if \(k_i\) is the key of the VM of index i in \(\mathcal {X}\), the decoder places that VM at DC \(\lfloor k_i\cdot|N|\rfloor \). Figure 2 shows an example of decoding with D2 for three data centers, eight virtual machines, and a random-key vector \(\mathcal {X}\).

Fig. 2 Example of decoding for decoder D2

Since both decoders are deterministic, we always obtain the same solution \(S'\) from a vector of random keys \(\mathcal {X}\). When a deterministic local search strategy is applied to \(S'\), a solution \(S''\) is obtained by combining the decoder with the local search, and \(S''\) can thus be associated with \(\mathcal {X}\). Depending on the decoder, however, it is possible to recode \(\mathcal {X}\) so that \(S''\) is obtained by the decoder alone (without the local search). In decoder D1 recoding can be computationally expensive, so we keep the local search as part of the decoder. In decoder D2, recoding amounts to computing, for each VM, a key value that corresponds to its data center in \(S''\): letting \(i'\) be the index of DC i, \(lb=i'/|N|\) and \(ub=(i'+1)/|N|\), a key corresponding to DC i can be given by \(lb+(ub-lb)/2\) or any number in the interval \([lb,ub)\). Experiments including the recode process are presented in Sect. 4.4. To simplify the notation, we denote by D3 the decoder D2 with the recode process.
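
The following sketch shows decoder D2 together with its recode step, using the midpoint of the key interval as suggested above; any value in \([lb,ub)\) would decode back to the same DC.

```cpp
#include <cstddef>
#include <vector>

// Decoder D2: key k_v in [0,1) maps to DC floor(k_v * |N|).
std::vector<int> decodeD2(const std::vector<double>& keys, int numDCs) {
    std::vector<int> S(keys.size());
    for (std::size_t v = 0; v < keys.size(); ++v)
        S[v] = (int)(keys[v] * numDCs);          // truncation = floor for keys >= 0
    return S;
}

// Recode: overwrite each key with the midpoint of the interval of the DC
// chosen for that VM (e.g., by the local search), so decodeD2 reproduces S.
void recodeD2(std::vector<double>& keys, const std::vector<int>& S, int numDCs) {
    for (std::size_t v = 0; v < keys.size(); ++v) {
        double lb = (double)S[v] / numDCs;       // lower bound of the DC interval
        double ub = (double)(S[v] + 1) / numDCs; // upper bound (exclusive)
        keys[v] = lb + (ub - lb) / 2.0;          // any value in [lb, ub) works
    }
}
```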

3.4.2 Hybrid BRKGA and path-relinking

Path-relinking is an approach to integrate intensification and diversification in the search and can be incorporated into many metaheuristic frameworks. It is frequently used with GRASP algorithms (Laguna and Martí 1999; Oliveira et al. 2004; Mateus et al. 2010; Festa and Resende 2013), but references can be found for other metaheuristics such as scatter search (Glover et al. 2003), tabu search (Armentano et al. 2011), and VNS (Festa et al. 2002). Regarding genetic algorithms, several papers apply the path-relinking technique, for example Basseur et al. (2005) and Vallada and Ruiz (2010), to name a few.

In this paper, we propose a new approach that hybridizes path-relinking with BRKGA. With respect to Algorithm 6, every n iterations of the main loop we include the path-relinking management framework PRS or PRM (Algorithm 2 or 3) after the decoder step (line 10). By default we use \(n=1\), i.e., we apply path-relinking each time a new generation is produced. Since both path-relinking procedures require a solution S, we select S uniformly at random among the elite solutions of the BRKGA population. If path-relinking returns a solution \(S'\) better than S, the random-key vector \(\mathcal {X}\) corresponding to S is recoded to match \(S'\).

In Sect. 4, experiments with both path-relinking strategies are presented in detail and compared with the standard approach. Since path-relinking uses the recode process, these experiments are performed only with decoder D3.

4 Computational results

The experiments were conducted on a computer with an AMD FX-8150 eight-core 3.6 GHz CPU and at least 32 GB of RAM running GNU/Linux, except for the experiments with CPLEX, for which we used a computer with quad-core Intel Xeon E5530 2.4 GHz CPUs and at least 48 GB of RAM running GNU/Linux. The algorithms are implemented in C++ and compiled with optimization flag -O3. For BRKGA we use the API described in Toso and Resende (2015). The commercial solver IBM ILOG CPLEX Optimizer version 12.6.0.0 (C++ API) was used to evaluate the mathematical models. All experiments used a single thread, but multiple experiments were run in parallel.

Experiments were conducted with two main objectives. The first was to investigate the performance of CPLEX considering the different mathematical models described in Sect. 2. The second was to evaluate the performance of the heuristic approaches described in Sect. 3 and some hybridized variants, including heuristic approaches, local search strategies, and path-relinking procedures.

We first describe the method used to generate the dataset for the experiments. We then report results for CPLEX, GRASP, and BRKGA, followed by the best results of each heuristic method. Finally, we apply our best approaches to a set of GQAP instances, comparing our results with the state-of-the-art results for that problem described in Mateus et al. (2010).

4.1 Data set

In this subsection we present the problem-instance generator used to create the data set for the computational experiments reported in this paper. Since we do not have access to real data, we randomly generate a set of instances with empirically defined parameters so as to cover as many different scenarios as possible.

For each instance the generator receives four input parameters: |N|, |K|, |U|, and P, where P represents the percentage of overall data center occupation.

To ensure that the generator creates instances that admit feasible solutions, we generate the data for each instance based on n pre-placements of virtual machines to data centers (by default \(n=3\)). Given the capacity of each data center, each pre-placement \(s\in \mathcal {S}\) is generated by randomly assigning each virtual machine to a data center with probability proportional to the data center capacity, ensuring that no capacity is violated. Based on these pre-placements, we generate the remaining data respecting the constraints of the problem, which guarantees at least n feasible solutions for each instance.

Using a uniform random number generator, and letting \(M'\) be a sufficiently large number, we generate the parameter data for each instance in the following steps.

Data center capacity The total number of available virtual machine slots is \(n'=\max \left( |N|, \left\lceil \frac{|K|}{P} \right\rceil \right) \), with P expressed as a fraction. To define the values \(a_i\), we start with all \(a_i=0\) and, \(n'\) times, select a random data center i and increase \(a_i\) by one, ensuring that each data center ends with capacity \(a_i\) greater than or equal to one. At this step, the n pre-placements described above are generated.

Required virtual machine bandwidth For each pair of virtual machines \(v \in K\) and \(w \in K\), the bandwidth \(b_{vw}\) is a random number in the interval [0 : 9]. This matrix is symmetric, i.e., \(b_{vw}=b_{wv}\), with \(b_{vv}=0\).

Data center bandwidth Having defined the bandwidth between each pair of virtual machines in the previous step, we generate the data center bandwidths based on the pre-placements \(\mathcal {S}\). Let \(b_{ij}^s\) be the sum of the bandwidths between all virtual machines pre-placed at i and j in \(s \in \mathcal {S}\). For each pair of data centers i, j, we set \(B_{ij}=\max \{b_{ij}^s \},\ \forall s \in \mathcal {S}\). This matrix is also symmetric, i.e., \(B_{ij}=B_{ji}\), with \(B_{ii}=M'\).

Data center latency For each pair of data centers i, j, the latency \(L_{ij}\) is a random number in the interval [5 : 20]. This matrix is symmetric, i.e., \(L_{ij}=L_{ji}\), with \(L_{ii}=0\).

Required virtual machine latency Let \(l_{vw}^s\) be the latency \(L_{ij}\) between the data centers i, j where v is placed at i and w at j in the pre-placement \(s \in \mathcal {S}\). We randomly select \(n=2|K|\) distinct pairs v, w and associate with each the required latency \(l_{vw}=\max \{l_{vw}^s\},\ \forall s \in \mathcal {S}\). The remaining latencies \(l_{vw}\) are set to \(M'\), indicating that no latency is required. We also ensure that \(l_{vw}=l_{wv}\), with \(l_{vv}=0\).

Users in data centers Users are allocated at random to data centers chosen with probability proportional to their capacity. More than one user can be located at the same data center.

Required user latency For each user u, we randomly select a virtual machine v for which to define a required latency. Let i(s) be the data center where v is placed in \(s \in \mathcal {S}\); the required latency between u and v is then \(t_{vu}=\max \{L_{d(u),i(s)}\},\ \forall s \in \mathcal {S}\). The remaining user latency requirements are set to \(M'\).

Transferring cost between data centers For each pair of data centers i, j, the cost \(C_{ij}\) is a random number in the interval [10.00 : 100.00]. This matrix is symmetric, i.e., \(C_{ij}=C_{ji}\), with \(C_{ii}=0\).
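
As an example of one generation step, the sketch below draws a symmetric latency matrix according to the rule above; the other symmetric matrices (\(b_{vw}\), \(C_{ij}\)) would be generated analogously with their own intervals and diagonals.

```cpp
#include <random>
#include <vector>

// Symmetric DC-latency matrix with entries drawn uniformly from [5:20]
// and a zero diagonal, as specified in the "Data center latency" step.
std::vector<std::vector<int>> genLatency(int numDCs, std::mt19937& rng) {
    std::uniform_int_distribution<int> lat(5, 20);
    std::vector<std::vector<int>> L(numDCs, std::vector<int>(numDCs, 0));
    for (int i = 0; i < numDCs; ++i)
        for (int j = i + 1; j < numDCs; ++j)
            L[i][j] = L[j][i] = lat(rng);        // symmetric, L[i][i] = 0
    return L;
}
```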

The parameters |N| and |K| define the instance size, while |U| and P adjust how constrained the problem is. We generated a set of instances by combining values from \(N=\{5,10,25\}\), \(K=\{15,20,25,50,100,150,200\}\), \(U=\{0.5|K|,\,|K|,\,1.5|K|\}\), and \(P=\{70,90\}\). Tables 1 and 2 list all generated instances; the values of |N|, |K|, |U|, and P are encoded in the instance name, in this order. All instances and their best known solutions are available at www.inf.ufsm.br/~stefanello/instances/vmplacement.

The cost \(c_{iv}\) is a generic user-defined parameter that can be used to model placement-dependent costs, such as the cost of requesting a virtual machine or the setup time to start a virtual machine in a data center. We keep this parameter in the objective function for two reasons: (i) it keeps the problem as generic as possible by accommodating such additional costs; and (ii) since we define the VMPlacement problem as a generalization of the GQAP, this cost is inherited from the generic definition of the GQAP. In our experiments, however, and without loss of generality, we disregard this cost by simply setting it to zero.

Table 1 CPLEX detailed results for small instances
Table 2 CPLEX detailed results for medium and large instances

4.2 CPLEX results

In this section we investigate the performance of CPLEX on both linear mathematical models described in Sect. 2. We carried out an empirical study to determine up to which instance sizes the CPLEX solver can prove optimality, and which model provides better integer solutions and lower bounds when CPLEX terminates at a time limit. A further objective is to obtain baseline results for the comparison of heuristic methods. We first analyze the results for small instances, followed by the medium and large instances. We also ran CPLEX with the quadratic model QMVMP on small instances; for example, for instance 05_015_007_70, CPLEX spent more than 680 s to find a feasible solution, and its gap was 162%. Since the linear models performed better than the quadratic model, we report results only for the mixed-integer linear models.

4.2.1 Results for small size instances

In the first experiment, we evaluated the performance of CPLEX with the mathematical models described in Sect. 2. We used the standard CPLEX 12.6 solver for models LMVMP and LMVMPII. The time limit was set to 3 h per run (10,800 s) and the number of threads to one. The remaining parameters were kept to their default values.

In Fischetti and Monaci (2014) the authors exploited erraticism in search and showed how to take advantage of this behavior; they also show that CPLEX behavior can vary widely with randomized initial conditions. Thus, to better evaluate CPLEX performance, we performed ten runs for each instance, each with a different random seed set through the CPLEX parameter RandomSeed.

Table 1 shows the CPLEX results. The first column gives the instance name; instances are generated and named as described in Sect. 4.1. The second column (BKS) shows the objective function value of the best known solution for each instance (optimal values in boldface). The next two columns show the average running time of CPLEX to solve each instance under each mathematical model; numbers in parentheses denote the number of runs in which the solver did not prove optimality within the time limit. The following two columns show the average time CPLEX spent to find the BKS value. Observe that when BKS is an optimal solution, the difference between the times reported in columns 5 and 6 and those in columns 3 and 4 represents the time CPLEX spent proving optimality after having found the optimal solution. The average is over ten runs, or over the number of runs in which CPLEX found the BKS (indicated in parentheses); a '−' sign indicates that no run found a solution as good as BKS within the time limit. Finally, the last two columns show the best time, over all runs, that CPLEX spent to find the BKS.

We can draw three main observations from this experiment. First, for both models, the time the solver spends to prove optimality increases significantly for instances with 5 data centers and 25 virtual machines, as shown in the last six rows of Table 1. This indicates a baseline for the instance sizes that this version of CPLEX can handle and prove optimality with these models on this computer. The second observation is that the performance of CPLEX was relatively similar for both models. Comparing the numbers in parentheses in the third and fourth columns, CPLEX proved (or failed to prove) optimality for the same instances, except for instance 05_025_037_90, for which CPLEX proved optimality in three runs with LMVMP, while with LMVMPII it finished with an average gap of \(8\%\) to the lower bound. However, considering the time each model took to find a solution with objective function value equal to the BKS, in most cases CPLEX with LMVMPII was faster than with LMVMP. The final observation concerns the variance of the time to find the BKS. With randomized initial conditions, CPLEX presented a large variance in its behavior. For example, for instance 05_025_012_70 with LMVMPII, in the best case CPLEX found the BKS in 2.2 s, while in the worst case it took 3458.01 s to find the same solution. For this instance, the time to prove optimality also showed large variance, ranging between a minimum of 3685.67 s and a maximum of 4828.05 s.

We also evaluated the model LMVMPII without the additional set of cuts (30). Compared with the model including the cuts, CPLEX performed worse: for six instances it was unable to prove optimality in any of the ten runs, and for these cases the CPLEX gap is high. For example, for instance 05_025_012_90, the average gap was \(53\%\), confirming that the relaxation quality of this model is low. Furthermore, the time to find the BKS increased by a factor of about 8, and the number of nodes explored within the time limit, or to prove optimality, increased by a factor of about 7 in comparison with the model that includes the set of cuts (30).

4.2.2 Results for medium and large size instances

In this subsection we present results for CPLEX when applied to the set of medium- and large-size instances. In this experiment, we evaluate CPLEX for each model and each instance with a single run and a time limit of one day (86,400 s).

Table 2 shows, for each instance, the BKS obtained over all experiments, including the values reported in this paper for the heuristic methods and additional unreported experiments used to evaluate previous versions of the algorithms. Column FO shows, for each model, the objective function value of the best integer solution found by CPLEX. Column CPLEX GAP shows the gap returned by CPLEX, calculated as \(|bestbound-bestinteger|*100/bestinteger\). The last column shows the gap from the BKS to the best integer solution (FO) returned by CPLEX, calculated as \(|FO-BKS|*100/BKS\).
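For concreteness, the two gap measures can be computed as in the following minimal sketch (the function and variable names are ours, purely illustrative):

    def cplex_gap(best_bound, best_integer):
        # Gap reported by CPLEX: |bestbound - bestinteger| * 100 / bestinteger.
        return abs(best_bound - best_integer) * 100 / best_integer

    def gap_to_bks(fo, bks):
        # Gap from the best-known solution (BKS) to the best integer solution (FO).
        return abs(fo - bks) * 100 / bks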

The first observation from this experiment is that for instances with 10 data centers (first block in Table 2), CPLEX was able to begin solving the models, but the gap remained high even after 24 h of computation. However, both the gap returned by CPLEX and the gap to the BKS are lower for LMVMPII than for LMVMP: the average gap returned by CPLEX with LMVMPII was around half of that for LMVMP, and the average gap to the BKS was \(2.19\%\) for LMVMPII against \(4.96\%\) for LMVMP.

The second observation is that for instances with 25 data centers and the LMVMP model, CPLEX spent the entire time in the presolve phase without even solving the root relaxation. With LMVMPII, on the other hand, CPLEX was able to start node exploration and found feasible solutions. For instances with 25 data centers, CPLEX explored on average approximately 34,800 nodes for instances with 100 VMs, 5700 nodes for instances with 150 VMs, and 1000 nodes for instances with 200 VMs. Even though CPLEX maintained a large gap, the gap with respect to the BKS was relatively small, considering the size of the instances and the difficulty of finding feasible solutions (Stefanello et al. 2015a).

4.3 GRASP results

This subsection presents results and analyzes the feasibility attained by different constructive heuristics combined with different local search procedures embedded in the GRASP framework. We also report experiments with the path-relinking procedure.

The main goals of the following experiments are to show how difficult it is to find feasible solutions using only a constructive heuristic, and to explore the effect of embedding a local search procedure on obtaining feasible solutions. The final goal is to evaluate the impact of each component added to the framework, i.e., the constructive heuristic, the local search method, and the path-relinking strategy.

In the first part of our experiment, we evaluate the performance of each constructive heuristic described in Sect. 3.3, embedded in GRASP. We performed one run for each instance and for each combination of constructive heuristic and local search method. The stopping criterion of each run was a time limit of \(|K|*|N|*\theta \) seconds, with \(\theta =0.8\); for example, the largest instances, with 25 data centers and 200 VMs, are allotted \(25*200*0.8 = 4000\) s per run. We use this stopping criterion in all experiments in order to provide a fair comparison among all evaluated strategies.

Table 3 shows the average percentage of feasible solutions found in each run of the algorithm, over all instances, for each combination of local search (rows of the first column) and constructive heuristic (columns 2 to 5).

Table 3 Percentage of feasible solutions found by GRASP for different constructive heuristics and local search strategies

The first observation from Table 3 is that the probability of finding a feasible solution using only a constructive heuristic, without local search, is low (row NoLS): the percentage of feasible solutions was below 1% on average, and below 2% even for the best constructive heuristic, DC-Greedy. The main observation, however, is that the choice of constructive heuristic has a much lower impact on the percentage of feasible solutions than the choice of local search method. When a local search procedure is applied (also acting as a repair procedure), the percentage of feasible solutions increases considerably; for example, it exceeds 77% with 2V and 82% with 3V. This holds even with the random constructive heuristic, showing that the local search performance is not highly dependent on the initial solution. We also analyzed the quality of the feasible solutions found with each strategy. The conclusions mirror those for the percentage of feasible solutions: solution quality is similar across constructive heuristics and tends to improve with more complex neighborhood searches. Thus, in the remaining experiments we adopted DC-Greedy as the default constructive heuristic, since it yields the highest percentage of feasible solutions.

Table 4 shows the percentage of feasible solutions for each instance for the constructive heuristic DC-Greedy.

Table 4 Percentage of feasible solutions found by GRASP with the DC-Greedy constructive heuristic

Even with the significant increase in the percentage of feasible solutions when local search is applied, some instances still have a low percentage of feasible solutions. This shows that the set of instances used to evaluate the algorithms is diverse and not trivial to solve. We also observed that, as expected, instances with a higher percentage of data center occupation generally have a lower percentage of feasible solutions than the corresponding instances with lower occupation.

In the next set of experiments, the main objective is to evaluate the path-relinking component of the GRASP procedure. For each experiment, we performed five independent runs, totaling 180 samples per experiment. We again use \(\theta =0.8\) in the stopping criterion. The local search strategy used within path-relinking is always the same one used after the construction phase of the GRASP procedure. The experiments are chosen to analyze the performance of different combinations of algorithms and to show the impact of each component.

To compare the results with respect to the best solution found in each run, we first scale the objective function values to the range [0, 1], since values can differ widely across instances. The scaling is a simple transformation: for each instance, the largest cost over all analyzed experiments is mapped to 1 and the lowest to 0.
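As an illustration, the scaling can be implemented as in the following sketch (names are ours), where costs holds, for one instance, the best objective value of each analyzed run:

    def scale_costs(costs):
        # Min-max scaling: the lowest cost maps to 0, the largest to 1.
        lo, hi = min(costs), max(costs)
        if hi == lo:
            return [0.0 for _ in costs]  # all runs found the same cost
        return [(c - lo) / (hi - lo) for c in costs]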

Figure 3 shows the distribution of the scaled cost for each algorithm. The box plots show the minimum value, lower quartile, median, upper quartile, and maximum value of each experiment; the dots are outliers. Experiments are represented on the horizontal axis and are labelled as a composition of the main algorithm name, path-relinking type, local search strategy, and elite set size parameter \(\delta _{max}\). Experiments whose labels end with “\(*\)” use a full exploration of the neighborhood chain3L, i.e., \(\alpha _{3L}=1\) and \(\beta _{3L}=|N|\). Also, note that the first three experiments are plain GRASP heuristics and do not include a path-relinking component.

Fig. 3 Dispersion of scaled cost for each algorithm

The first observation regards the relation between intensifying the neighborhood exploration and restarting the search. On the one hand, from Tables 3 and 4 we observe that local searches with more complex neighborhoods outperform, in most cases, those with less complex neighborhoods in the percentage of feasible solutions. Since all strategies are executed for the same time, it is natural that a less complex search performs more iterations and more restarts. This indicates that, for this problem, it is preferable to invest in exploring the neighborhood of a high-quality solution rather than fully restarting from a new point of the search space. On the other hand, even though local search 4V has a higher ability to produce feasible solutions and explores a larger neighborhood, when we consider the best solution found in the experiments, local search 3V produces better results than 4V. This indicates that, for the proposed neighborhood strategies, there is a trade-off between the intensity of the neighborhood exploration and the time spent in each local search application, which can be used to guide the search and select the best strategies for the proposed method.

The second observation is that the path-relinking component improves the results significantly in comparison with the variants without it (GRASP-2V, GRASP-3V, and GRASP-4V). We also observe that both path-relinking strategies have similar performance. Furthermore, the results are not heavily influenced by the elite set size within the tested interval.

To confirm the results presented in Fig. 3, we use R to test the normality of these distributions with the Shapiro–Wilk test and to apply the Mann–Whitney–Wilcoxon U test. For all tests, we assume a confidence level of 99%. The Shapiro–Wilk tests indicate that no cost distribution fits a normal distribution, since the p values of all tests are below 0.01. Therefore, we applied the U test, whose null hypothesis is that the location statistics of both distributions are equal. We also apply a p value correction based on the false discovery rate (FDR) to limit the number of false positives.
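Although our analysis was carried out in R, an equivalent analysis can be sketched in Python with scipy and statsmodels (the data layout and names below are our assumptions, not the scripts actually used):

    from itertools import combinations
    from scipy.stats import shapiro, mannwhitneyu
    from statsmodels.stats.multitest import multipletests

    def compare_algorithms(scaled, alpha=0.01):
        # scaled: dict mapping algorithm name -> list of scaled costs.
        for name, costs in scaled.items():
            _, p = shapiro(costs)  # p < alpha: reject normality
            print(f"{name}: Shapiro-Wilk p = {p:.4f}")
        pairs = list(combinations(scaled, 2))
        pvals = [mannwhitneyu(scaled[a], scaled[b], alternative="less").pvalue
                 for a, b in pairs]
        # FDR (Benjamini-Hochberg) correction over all pairwise p values.
        reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
        for (a, b), p, r in zip(pairs, p_adj, reject):
            print(f"{a} < {b}: adjusted p = {p:.4f}, significant: {r}")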

Table 5 shows the U test results for each pair of algorithms. The structure of this table is as follows: each row and column is indexed by one algorithm. Each element on the diagonal (bold) is the median of the scaled cost of the corresponding algorithm. The elements above the diagonal are the differences in location statistics for each pair of algorithms: a negative difference indicates that the “row algorithm” has a lower (better) location statistic than the “column algorithm”, and a positive difference indicates the opposite. The elements below the diagonal are the p values of each test. Inequality signs indicate that \(p<0.01\) for the U test between the “row algorithm” and the “column algorithm” under the corresponding alternative hypothesis: “less” indicates that the “row algorithm” outperforms the “column algorithm”, and “greater” the opposite.

Table 5 Values of medians, p values, and difference in median location for cost distributions using a confidence interval of 99% for GRASP algorithm

Table 5 confirms that the differences between the results of the standard GRASP procedure and those of the procedure that includes the path-relinking component are statistically significant at the 99% confidence level. Tests between variants with local search 3V and different elite set sizes indicate that the differences are not significant, for either path-relinking strategy. However, analyzing the medians and median locations, we conclude that there is a slight advantage for the PRM strategy with elite set size equal to 8.

A report on the solution quality obtained with this strategy is given in Sect. 4.5. Next we report results for the BRKGA.

4.4 BRKGA results

This subsection presents results for the biased random-key genetic algorithm. The objective is to analyze the different decoders, the recode process, and the path-relinking procedure introduced into the algorithm. The experiments follow the rules and objectives explained in the previous subsection.

The BRKGA parameters were set to the same values as described in Stefanello et al. (2015a), which are similar to the values suggested by Gonçalves and Resende (2011): the elite set size \(p_e = 0.24p\), the mutant set size \(p_m=0.12p\), and the probability of inheriting \(\rho _A=0.6\). The restart mechanism was disabled and the population size was \(p=75\). The stopping criterion for all runs was a time limit of \(|K|*|N|*\theta \) seconds (with \(\theta =0.8\)).
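For reference, this parameter setting can be summarized as in the sketch below (the field names are ours, not those of the BRKGA API of Toso and Resende (2015)):

    brkga_params = {
        "population_size": 75,        # p
        "elite_fraction": 0.24,       # p_e = 0.24 * p
        "mutant_fraction": 0.12,      # p_m = 0.12 * p
        "inherit_probability": 0.60,  # rho_A: bias toward the elite parent
        "restart": False,             # restart mechanism disabled
    }

    def time_limit_seconds(num_data_centers, num_vms, theta=0.8):
        # Stopping criterion used in all runs: |K| * |N| * theta seconds.
        return num_data_centers * num_vms * theta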

We ran experiments that include different local searches, decoders, and path-relinking procedures. In each experiment, we performed five independent runs with different random seeds for each instance, giving a total of 180 runs. To compare the results, we scale the objective function values of the best solution found in each run to the range [0, 1], as described in the previous subsection.

Figure 4 shows the box plot of the distribution of the scaled cost for each algorithm. Algorithms are represented on the horizontal axis and are labelled as a composition of the main algorithm name, path-relinking strategy, decoder method, local search strategy, and elite set size parameter.

Fig. 4 Dispersion of scaled cost for each algorithm

The first observation is that the recode process helps to improve the results, since all experiments with decoder D3 have lower medians than the corresponding algorithms with decoders D1 and D2. Note, for example, that without the recode, a modification in a single random key can influence one or more placements (since the local search can apply many changes); with the recode, changing one key changes only one placement. This well-defined behavior helps to keep more accurate information about each individual. The second observation is that, for the local search strategies, the same conclusions drawn for GRASP also hold for BRKGA: experiments with local search 3V provide better results than the other local search strategies. Finally, strategies that include the path-relinking procedure are more effective at producing better results than their standard counterparts. However, unlike what was observed with GRASP, the difference between PRM and PRS is significant. We attribute the worse performance of PRS to the small number of path-relinking applications, since it is limited to only a few generations. Also, after some iterations of BRKGA, the solutions in the elite set of BRKGA tend to be similar and fail the similarity check against the solutions in the elite set of path-relinking, and thus the path-relinking operator is not performed.

Table 6 shows U test results for each pair of algorithms. We present the values of medians, p values, and the difference in median location for the scaled cost distributions using a confidence interval of 99% for all experiments. The organization of this table follows the description of Table 5.

Table 6 Values of medians, p values, and difference in median location for cost distributions using a confidence interval of 99% for BRKGA algorithm

Table 6 confirms the considerations described above, showing that the best approaches use the path-relinking component (PRM). Also, as mentioned for GRASP in the previous subsection, the experiments with different elite set sizes indicate that the differences are not statistically significant at the 99% confidence level. However, analyzing the medians and median locations, we conclude that the procedure tends to produce better results with an elite set of size 8.

This problem was first introduced in Stefanello et al. (2015a), where the first version of the BRKGA was presented. Three main improvements were made over the original approach. First, we added a new neighborhood search: the original approach described a local search with only shift and swap moves (namely LSW, which corresponds to 2V in this paper), and the results in Fig. 4 and Table 6 show that the inclusion of the new neighborhood strategy (chain search) significantly improves the results (compare BRKGA-D2-2V and BRKGA-D2-3V). Second, the inclusion of the recode procedure in decoder D2 (yielding decoder D3) also significantly improves the results (compare BRKGA-D2-2V with BRKGA-D3-2V, as well as BRKGA-D2-3V with BRKGA-D3-3V). Finally, we included a path-relinking component which, to the best of our knowledge, makes this the first experiment to consider a hybrid approach between BRKGA and path-relinking.

A final observation about the running times of BRKGA is that results are reported using single-thread processing to provide a fair comparison with GRASP and CPLEX. However, the BRKGA API of Toso and Resende (2015) provides efficient multi-thread decoding that could substantially reduce the running times when multiple processors are available.

4.5 Additional comparison

In this subsection we give an overview of the best strategies for BRKGA and GRASP, providing additional comparisons and information for both algorithms. Figure 5 summarizes the experiments with both strategies reported in the previous subsections.

Fig. 5 Dispersion of scaled cost for each algorithm

Table 7 Comparison of percentage gap and last improvement time for BRKGA-PR and GRASP-PR

As depicted in Fig. 5 and reported in Sects. 4.3 and 4.4, the best configuration for both algorithms uses local search 3V and includes the path-relinking component with elite set size equal to 8. Table 7 presents supplementary information for the experiments with both strategies. The second column shows the best-known solution value (BKS) for each instance. The next columns show the minimum, average, and maximum percentage gaps for both algorithms, where the percentage gap is calculated as \((FO-BKS)*100/BKS\) and FO is the objective function value obtained by the respective algorithm. Finally, the Time columns show the minimum and average times, in seconds, at which each algorithm obtained its last improvement in the objective function.

From this table, we observe that the percentage gap is relatively small for both algorithms, showing that the approaches are effective in finding good quality solutions. For instances with 10 data centers, both algorithms found the BKS in most cases. Also, considering that each run was stopped by the time limit (\(|N|*|K|*0.8\) seconds), the Time columns show that a significant portion of each run passed without improvements. This indicates that the algorithms converge quickly and would probably still provide a good solution if stopped before the time limit. Furthermore, we observe that BRKGA-PR is faster than GRASP-PR on the instance set with 10 data centers, while the opposite holds for the instances with 25 data centers.

Finally, the last experiment uses time-to-target (TTT) plots to display the distribution of the running time required by an algorithm to find a solution at least as good as a given target value. TTT plots were proposed by Feo et al. (1994) and Aiex et al. (2007), and have been advocated by Hoos and Stützle (1998b, 1998a) as a way to characterize the running times of stochastic optimization algorithms. The experiment consists of performing 200 runs of the BRKGA-PR and GRASP-PR algorithms on two instances until the target is reached. The target was set to the worst solution found by either strategy in the previous experiment, i.e., 1,468,973 for 010_100_100_70 and 9,077,672 for 025_200_100_90. We chose one instance with 10 and one with 25 data centers, both with the lowest difference in average percentage gap between the two approaches (excluding equal values).
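A TTT plot is built by sorting the measured times-to-target and plotting the i-th time against the empirical probability \(p_i=(i-1/2)/n\) (Aiex et al. 2007). A minimal sketch, assuming the 200 recorded times for each algorithm are available in a list:

    import matplotlib.pyplot as plt

    def ttt_plot(times, label):
        # Empirical runtime distribution: i-th sorted time vs. (i - 0.5)/n.
        times = sorted(times)
        n = len(times)
        probs = [(i - 0.5) / n for i in range(1, n + 1)]
        plt.plot(times, probs, marker=".", label=label)

    # Hypothetical usage with the recorded times per algorithm:
    # ttt_plot(brkga_times, "BRKGA-PR"); ttt_plot(grasp_times, "GRASP-PR")
    # plt.xlabel("time to target (s)"); plt.ylabel("cumulative probability")
    # plt.legend(); plt.show()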

Figure 6 shows the cumulative probability plots obtained with BRKGA-PR and GRASP-PR for the two instances. For instance 10_100_100_70, we observe that BRKGA-PR has a higher probability than GRASP-PR of finding a solution at least as good as the target within a given time. For instance 25_150_225_70, the opposite holds.

In summary, both approaches perform well in specific scenarios: BRKGA-PR performs best on small and medium-size instances, while GRASP-PR performs better on large instances. The path-relinking component improves both strategies. We attribute this to the rugged landscape of this problem and to the difficulty of finding feasible solutions: since path-relinking is generally performed between two feasible solutions, this component explores a region of the search space where solutions are feasible or have fewer infeasibilities. Thus, when local search is applied to a solution on the path, the search tends to be faster and leads to a new local minimum that can be better than the initial and guiding solutions.

Finally, we evaluated our algorithms on a set of instances of sizes not addressed before in the literature, testing instances with up to 25 data centers and 200 virtual machines. Even though 200 VMs can be considered small-scale in real-world applications, previous work on related problems addressed instances with up to 50 VMs; a major contribution of our work is thus to consider instances four times larger than those previously considered. Furthermore, we provide a new set of benchmark instances for future experimental analyses of algorithms.

Fig. 6 Cumulative probability distribution

In the next subsection we evaluate our proposed algorithms on a set of instances adapted from a similar problem and compare them with a method described in the literature.

4.6 Results for the generalized quadratic assignment problem

VMPlacement is a generalization of GQAP, and our proposed methods can be easily adapted to GQAP. Since we evaluate infeasibilities in the objective function, we only need to change the cost evaluation function. For convenience, however, we instead adapted the GQAP instances to make them VMPlacement instances: we assign a sufficiently large value to the bandwidth capacity between each pair of data centers, as well as to the allowed latency between each pair of virtual machines, and we define an empty set of users.
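A minimal sketch of this adaptation is given below; the instance container and its field names are our assumptions, not the format actually used in the experiments:

    from dataclasses import dataclass, field

    BIG = 10**9  # larger than any bandwidth or latency requirement

    @dataclass
    class VMPlacementInstance:
        data_centers: list
        vms: list
        bandwidth: dict = field(default_factory=dict)    # (k, l) -> capacity
        max_latency: dict = field(default_factory=dict)  # (i, j) -> allowed latency
        users: list = field(default_factory=list)

    def gqap_to_vmplacement(locations, facilities):
        # Locations become data centers; facilities become virtual machines.
        inst = VMPlacementInstance(data_centers=list(locations), vms=list(facilities))
        # Relax the network constraints so they never bind:
        inst.bandwidth = {(k, l): BIG for k in inst.data_centers
                          for l in inst.data_centers}
        inst.max_latency = {(i, j): BIG for i in inst.vms for j in inst.vms}
        inst.users = []  # empty user set: no user-to-VM latency requirements
        return inst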

The main objective of this experiment is to provide a brief comparison with a method described in the literature, showing that our approach has competitive performance. We first perform an experiment with CPLEX to provide a baseline comparison with exact methods. Then, we compare our algorithms with the GRASP with path-relinking proposed by Mateus et al. (2010).

Table 8 CPLEX results for GQAP instances

We use the set of instances proposed by Cordeau et al. (2006). These instances have 20 to 50 facilities (virtual machines) and 6 to 20 locations (data centers). Instances are listed in Table 8 and are labelled, respectively, with the number of facilities (virtual machines), the number of locations (data centers), and a parameter that controls the tightness of the capacity constraints: the higher its value, the tighter the capacity constraints. Since this set comprises the largest instances available for GQAP, we evaluate our strategies only on this set. These experiments were carried out on a computer with an Intel(R) Core(TM) i7 930 CPU at 2.80 GHz and 12 GB of main memory.

Cordeau et al. (2006) evaluated CPLEX (version 8.1), exploring the emphasis parameter of the branch-and-bound search, with a time limit of 2 h; CPLEX was able to prove optimality only for instance 30-08-55. Pessoa et al. (2010) proposed exact algorithms that combine branch-and-bound with a new Lagrangean decomposition and the Reformulation-Linearization Technique, proving optimality for another 13 instances of this set.

In the first experiment we evaluate the performance of CPLEX on models LMVMP and LMVMPII, described in Sect. 2. We ran CPLEX for each instance with a time limit of one day (86,400 s). We also set the tree memory parameter (TreLim) to 5000 to stop the execution when this memory limit is exceeded. The remaining parameters were kept at their default values.

Table 8 shows the results for each instance listed in the first column. The second column shows the best-known solution for each instance (optimal values in boldface). The next two sets of columns show the results for each linear mathematical model: column Nodes shows the number of search tree nodes visited by CPLEX; column Integer Sol shows the objective function value of the best solution found during the execution; column gap shows the percentage gap between the lower bound and the best integer solution found by CPLEX; and column Time shows the running time in seconds for CPLEX to prove optimality or to satisfy the stopping criterion. Instances for which CPLEX found the optimal solution or proved optimality are highlighted in boldface.

Comparing the results in Pessoa et al. (2010) with those in Table 8, we note that their branch-and-bound approach tends to be more efficient than CPLEX; in general, its running time tends to be smaller than that required by CPLEX. The two methods are not strictly comparable, however, since the experiments use different stopping criteria and were run on different computers. The main observation from these results is that, even with the large increase in computational power and the many algorithmic improvements in successive CPLEX versions, this set of instances remains hard to solve with exact methods.

Table 9 Comparison algorithms for GQAP

In a second experiment, we compare our strategies with the GRASP-PR proposed by Mateus et al. (2010), the state-of-the-art heuristic for GQAP. For each instance we performed 200 independent runs, each stopping when a solution as good as the best-known solution was found (column BKS in Table 8).

Two main differences distinguish our approaches from the GRASP-PR of Mateus et al. (2010). First, we perform a wider exploration with the local search: Mateus et al. (2010) use an approximate local search with shift and swap moves, while our approach explores all neighbors in three neighborhood structures. Second, we use a penalization strategy to deal with infeasible solutions, which allows visiting infeasible solutions in order to reach new feasible solutions that may be hard to reach by exploring only the feasible search space.
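The penalization idea can be summarized by the following sketch (the penalty weight and names are ours): infeasible solutions are kept in the search, but their cost is inflated in proportion to the amount of constraint violation.

    def penalized_cost(cost, violations, penalty=10**6):
        # cost: communication cost of the (possibly infeasible) placement.
        # violations: amounts by which capacity, bandwidth, or latency
        # constraints are exceeded; all zero for a feasible solution.
        return cost + penalty * sum(violations)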

Table 9 shows a comparison of the GRASP-PR of Mateus et al. (2010) with our two best approaches described in the previous subsection. Columns Min, Avg, and Max give the minimum, average, and maximum times to find a solution with value equal to the BKS, and column SD gives the standard deviation of these times. Finally, column 0.95 shows the time within which 95% of the runs found the BKS.

The results described in Mateus et al. (2010) were obtained on a Dell PE1950 computer with dual quad-core 2.66 GHz Intel Xeon processors. Unfortunately, there is no specific information about the processor model; based on the server model, number of cores, and clock frequency, we assume it is an Intel(R) Xeon(TM) X5355 at 2.66 GHz. We ran our algorithms for this set of experiments on a computer with an Intel(R) Core(TM) i7 930 CPU at 2.80 GHz, which is approximately 1.2 times faster than the machine used in Mateus et al. (2010) (based on Single Thread Rating data from www.cpubenchmark.net). The execution times of the two implementations are not strictly comparable, since the languages and compilers differ; at a high level, however, this comparison provides an idea of the behavior of the algorithms.

Even considering that the running times were measured on different hardware, the data in Table 9 show that our version of GRASP-PR, as well as the BRKGA-PR approach, have significantly lower running times than those reported in Mateus et al. (2010); the average minimum time is nearly two orders of magnitude smaller. This is evidence that our proposed strategies have competitive performance, since they find the BKS in all runs in times significantly shorter than those reported in Mateus et al. (2010). We also observe that our strategies are robust in the sense that, although the algorithms were developed for a specific problem, they also perform well on a more general problem without any modification. Finally, statistical tests similar to those described in the previous subsections, comparing our two approaches, show that GRASP-PR is slightly faster than BRKGA-PR, especially for instances with 35 or more facilities; for instances with up to 30 facilities, the behavior is reversed.

Figure 7 shows the time-to-target plots depicting the runtime distributions of BRKGA-PR and GRASP-PR, with the target solution defined as the BKS. We chose the same instances used to present the time-to-target plots for GRASP-PR in Mateus et al. (2010). We observe that the behavior is, in general, similar for BRKGA-PR and GRASP-PR, but the performance of one algorithm can be better than the other depending on the instance.

Fig. 7 Cumulative probability distribution for BRKGA-PR and GRASP-PR running times for instances 20-15-75, 30-20-35, 35-15-95, and 40-09-95

5 Concluding remarks

In this paper, we presented the problem of minimizing the cost of placing virtual machines across geo-separated data centers. A quadratic and two linear mathematical programming formulations were presented, as well as several heuristics for solving the problem. In the experiments, we evaluated the performance of CPLEX with the proposed mathematical programming formulations, along with the proposed heuristic methods.

The CPLEX results show that adding the set of cuts significantly improves the quality of the lower bounds for the LMVMPII model. We also observed that with this model CPLEX can handle larger instances than with the LMVMP model, obtaining better lower bounds and feasible solutions in less computational time. However, as an exact method, CPLEX remains limited to solving small instances, showing that heuristic methods are required for larger instances.

Two heuristic approaches were used to solve the problem, GRASP and BRKGA. In both methods, we use a path-relinking procedure as an intensification mechanism, together with a local search that explores several neighborhood strategies. Both heuristic approaches showed similar performance, with the best results achieved using local search 3V and path-relinking PRM. A slight advantage for BRKGA was observed on small instances, while GRASP had a slight advantage on larger instances. The same behavior was observed when our strategies were applied to GQAP instances, where our methods outperform the previous state-of-the-art algorithm.

Finally, considering the high costs involved in this kind of problem, and the difficulty of obtaining feasible solutions under several constraints and limited resources, the proposed algorithms are a good alternative for reducing costs while maintaining the reliability and demand requirements of data centers.