Introduction

The job shop scheduling problem (JSSP) is a complicated combinatorial optimization problem (Sharma et al., 2018) that involves scheduling a set of n jobs (\(n\ge 1\)) on a set of m machines (\(m\ge 1\)) under a set of constraints (Cruz-Chávez et al., 2019). Although there are many specific performance measures for the JSSP (e.g. throughput time, tardiness, earliness, makespan and due-date), makespan is the most commonly used (Zhang et al., 2019). The JSSP has been proven in the literature to be a challenging NP-hard problem (Ibrahim & Tawhid, 2022) with \((n!)^m\) possible solutions, reflecting its high computational complexity (Zhao et al., 2018).

An extensive amount of research has been conducted to find optimal or near-optimal solutions to this complex JSSP (Çaliş & Bulkan, 2015); nevertheless, a relatively small-sized 10\(\times \)10 JSSP instance remained unsolved for a quarter of a century (Mishra et al., 2017). The realization of this complexity, and the incapability of exact approaches to solve even medium-sized JSSPs, has shifted research towards approximation approaches (Zhao et al., 2016). Many classical and emerging meta-heuristics have been implemented to solve the JSSP, such as particle swarm optimization (PSO) and the genetic algorithm (GA) (Zhao et al., 2016), differential evolution (DE) (Wisittipanich & Kachitvichyanukul, 2012), the bacterial foraging algorithm (BFA) (Zhao et al., 2015), teaching-learning based optimization (TLBO) (Baykasoğlu et al., 2014), and grey wolf optimization (GWO) (Liu et al., 2020). However, these stand-alone algorithms could not find optimal solutions for many well-known JSSP instances (where optima are known). A potential reason is that applying this class of meta-heuristics to combinatorial optimization is challenging due to the lack of a fine-tuning property around optima and to premature convergence (Zarandi et al., 2020), which encourages researchers to develop hybrid approaches that concurrently exploit complementary features of two or more algorithms (Cruz-Chávez et al., 2019).

The DE algorithm is a population-based meta-heuristic that has successfully solved both continuous and discrete optimization problems. Its most significant benefit is that it has multiple mutation operators, each with individual characteristics (Liu et al., 2020). However, Ponsich and Coello (2013), who first implemented DE for the JSSP, claimed that the selection operator of DE is essentially greedy, resulting in premature convergence and a lost opportunity to reach global optima. Ren and Wang (2012) also reported that DE converges prematurely because similar individuals fill the population over the evolution process, causing a lack of diversity. Moreover, Ponsich et al. (2009) verified that DE alone is incapable of producing promising results compared with a simple GA, the greedy randomized adaptive search procedure (GRASP) or Tabu search (TS) in combinatorial optimization domains. On the other hand, PSO has been implemented successfully in different problem domains. However, Sha and Hsu (2006) conducted a comparative study for solving the JSSP and concluded that stand-alone PSO could not produce optimal results for large-sized problems, but hybrid PSO could; the authors employed TS within PSO, which resulted in better performance. Marichelvam et al. (2020) improved the initial solution of PSO by amalgamating a dispatching rule and a constructive heuristic, and added a variable neighbourhood search to attain optimal solutions in less computational time; the authors claimed that hybrid PSO performs much better than plain PSO.

The population updating strategies of DE and PSO are distinct. To illustrate, the new swarm in PSO does not depend on whether a particle has improved from the previous swarm (Wang et al., 2019); therefore, the worst particle may be carried into the new generation. This is the opposite of DE, where only an improved individual is allowed into the next generation, which is why its selection process is called greedy. This process can create a population with lower diversity, leading to premature convergence (Ponsich & Coello, 2013). These open problems are addressed by designing the DE with three popular mutation operators, in which “DE/rand/1” and “DE/best/1” are well known for their exploration and exploitation properties, respectively, and “DE/current-to-pbest/1” possesses both properties (Liu et al., 2020). In addition, this multi-operator DE (MODE) can be integrated with PSO, which may further increase population diversity and ensure an improved evolutionary process, increasing the chance of achieving an optimal solution. An evolutionary process may perform poorly as generations progress, and it is then inappropriate to allow the search method to continue the evolution (Elsayed et al., 2011). Thus, a population switching strategy (Wisittipanich & Kachitvichyanukul, 2012) based on the performance of an algorithm is employed in the hybrid evolutionary algorithms (HEAs) to emphasize higher-performing algorithms. The lack of a fine-tuning property in meta-heuristics encourages embedding TS into this hybrid scheme to improve the local search property. TS is a strong local search technique with a strong exploitation property that can avoid becoming trapped in local optima (Sha & Hsu, 2006).
Moreover, the initial population of this proposed algorithm is generated by implementing a mixed selection strategy (MSS) which considers both fitness value and the diversity of an individual while selecting schedules, since the initial population impacts the solution quality and the computational time (Cheng et al., 2016).

The major contributions of this research paper are:

  • Diversity of the initial population is ensured by implementing an MSS considering both solution quality and diversity of a schedule.

  • The proposed hybridization scheme of MODE and PSO produces more diverse populations with a better evolutionary process; to our knowledge, this is the first application of hybridizing MODE and PSO for these NP-hard problems.

  • A performance-driven switching mechanism (PDSM) is implemented in the HEAs so that the evolution process can escape from a poorly performing search algorithm.

  • Instead of ensuring diversity in the personal best (pbest) and global best (gbest) by considering only the makespan value, as proposed by Sha and Hsu (2006), diversity in the solution order of pbest and gbest is also ensured along with the makespan value to reduce unnecessary evaluations.

The performance of the predictive sequence HEA (sHEA), random sequence HEA (rHEA) and bi-population inspired HEA (bHEA) is evaluated along with stand-alone TS, DE and PSO in solving standard benchmark instances taken from http://jobshop.jjvh.nl/index.php. These are also assessed against 26 published algorithms, including the most popular DE and GA variants and other recently developed algorithms. Parametric analysis is performed to set the best combination of parameters for the HEAs, and statistical analysis is then conducted for better insights into the results. The comparative analysis shows the dominance of sHEA over all other algorithms.

The remainder of this article is organized as follows: “Literature review” section describes the relevant literature. “Constituent algorithms” section illustrates the fundamentals of the constituent algorithms, after which the problem description of the JSSP is given in “Problem description of a JSSP” section. The proposed algorithms and strategies developed in this study are explained in “Proposed algorithms” section, and “Experimental design and result analysis” section describes the experimental design and result analysis. Finally, concluding remarks and future directions of the research are presented in “Conclusion and recommendation” section.

Literature review

A range of influential studies can be found in the literature, concentrating on designing algorithms to solve the challenging NP-hard JSSPs. In early periods, classical techniques of operations research, which include, but are not limited to, the shifting bottleneck procedure (Adams et al., 1988), the branch-and-bound technique (Brucker et al., 1994) and the dispatching rule approach (Kannan & Ghosh, 1993), were employed to solve JSSPs. These techniques can effectively solve small-sized instances. However, complexity analysis of those techniques revealed that finding polynomial-time algorithms was challenging. Therefore, scholars and practitioners gradually began to pay attention to meta-heuristics, such as swarm intelligence (SI) and evolutionary algorithms (EAs). Recently, there has been a trend of exploiting and improving SI, EAs and their hybridizations to address JSSPs. This section aims to review the most relevant solution techniques for the JSSP.

Wisittipanich and Kachitvichyanukul (2012) developed two DE algorithms for solving the JSSP, considering two single objectives (i.e., makespan and total weighted tardiness). The effectiveness of these two algorithms was enhanced by dynamically balancing exploration and exploitation to avoid premature convergence. The first approach simultaneously employed different mutation strategies to compensate for the weaknesses of each individual strategy, whereas the second changed the search behaviour whenever the solutions did not improve. A local search was embedded to promote exploitation in the search space. These algorithms yielded promising results using less computational time and fewer function evaluations than the two-stage PSO (Pratchayaborirak & Kachitvichyanukul, 2011). Investigating other mutation mixes and their robustness remains open for a broader range of scheduling problems. Baykasoğlu et al. (2014) implemented the TLBO algorithm to test its search mechanism in solving combinatorial optimization problems, adopting a random-key based approach for obtaining a job permutation and the Giffler and Thompson (G&T) algorithm (Giffler & Thompson, 1960) for active schedules. The authors claimed that this algorithm produced better results for a few test problems than some algorithms, such as a hybrid GA (Ren & Wang, 2012) and two memetic algorithms (Gao et al., 2011; Hasan et al., 2009). Thus, further exploration is required to enhance its effectiveness in this domain. Qiu and Lau (2014) developed a hybrid artificial immune system (HAIS), where a modified PSO was employed to improve the antibody hypermutation process and accelerate the search procedure in solving 25 small-sized JSSPs. The algorithm's capability for solving complex JSSPs thus requires further investigation.
Wang and Duan (2014) claimed that classic bio-inspired computational methods, such as biogeography-based optimization (BBO), are incapable of solving the challenging NP-hard JSSPs, especially the large-sized instances. Thus, the authors developed a hybrid BBO (HBBO), employing chaos theory and a “searching around the optimum” strategy with the basic BBO. This hybridization expedited convergence towards globally optimal solutions. The hybridization's effectiveness was demonstrated against 14 popular algorithms, including a modified PSO (Lin et al., 2010), in solving 44 instances. This study was limited to comparatively simple JSSPs, and the authors thus advised considering novel techniques to enhance HBBO to handle more complicated problems. Lin et al. (2010) likewise noted that many algorithms based on heuristics, GAs and PSO had been implemented to solve JSSPs, but that their results were not yet satisfactory.

Asadzadeh (2015) proposed an agent-based local search GA (aLSGA) for JSSPs, claiming that hybridization can enhance the performance and effectiveness of GAs. A multi-agent system containing various agents, each with unique behaviours, was developed to implement the aLSGA. The experimental results showed its expedited convergence speed and improved solution quality. Zhao et al. (2015) developed a chemotaxis-enhanced bacterial foraging optimization (CEBFO) and added a DE operator, aiming to solve the tumble failure problem and accelerate the convergence speed of the original algorithm. An embedded local search boosted the exploitation capability. This algorithm proved its effectiveness in solving 38 instances against the popular hybrid PSO (Sha & Hsu, 2006), the TS-guided shifting bottleneck procedure (Pezzella & Merelli, 2000) and other classical algorithms. The authors noted that transformation from continuous to discrete space requires much time and affects the search procedure. Akram et al. (2016) hybridized fast simulated annealing (FSA) with quenching (HFSAQ), where the FSA performed the global search and the quenching performed the trajectory search. These dual roles prevented the algorithm from becoming trapped in local optima. This algorithm's effectiveness was established by solving 88 instances, among which 45 were solved optimally. It outperformed the 18 existing algorithms reviewed, including the GA and TS hybrid (GATA) (Meeran & Morshed, 2014), whose authors claimed that different features of many techniques are required to handle this challenging NP-hard problem. Kurdi (2016) developed an effective new island model GA (NIMGA) for JSSPs, minimizing the makespan. This study proposed a nature-inspired evolutionary model and a migration selection mechanism to improve search diversification and delay premature convergence. The evolutionary model employed different evolutionary methods while islands performed self-adaptation.
The migration selection mechanism migrated the worst individuals in the hope of giving them a better chance to survive in a more suitable environment. The author tested this algorithm's effectiveness in solving 52 instances; however, the global search capability could be further tested by solving complex JSSPs, such as the Taillard instances (Taillard, 1993). Zhao et al. (2016) proposed a hybrid of DE and the estimation of distribution algorithm (EDA) based on a neighbourhood search (NS-HDE/EDA). A chaotic strategy enhanced the searching ability, whereas the neighbourhood search improved the solution quality; the EDA contributed to enhancing the global exploration capability of the hybridization. Although this algorithm improved solution quality compared to standard GA and PSO, it was computationally expensive due to the local search. Thus, designing an efficient local search is challenging but meaningful. Moreover, the greedy selection process reduces the population diversity (Ponsich & Coello, 2013).

Dao et al. (2018) developed a parallel bat algorithm (PBA) employing a random-key encoding scheme and a communication strategy. The PBA aimed to correlate individuals in swarms and share the computational load. The authors claimed that the communication strategy maintained diversity among the bats in the split population's groups and sped up the search. The algorithm's effectiveness was tested against the classical bat algorithm and PSO (Ge et al., 2008) in solving 43 instances. Notably, Ge et al. (2008) improved PSO's search capability using an artificial immune system (AIS). Jiang and Zhang (2018) improved the GWO algorithm by designing a discrete crossover operator, embedding an adaptive mutation method to maintain population diversity and a local search for exploitation. The analysis showed that the discrete GWO outperformed GWO variants and many existing algorithms, including an agent-based parallel GA (Asadzadeh & Zamanifar, 2010). Zhao et al. (2018) proposed a hybrid differential-based harmony search (DHS) algorithm, since the population-based harmony search (HS) algorithm lacks local search capability. The authors employed the best individual in the pitch-adjustment process to expedite convergence and maintained population diversity using a differential-based enhancement mechanism. Moreover, a modified variable neighbourhood search was employed to find solutions around the current harmony vector. Although this algorithm outperformed many state-of-the-art algorithms, the effectiveness of DHS was less obvious when dealing with large-scale JSSPs; thus, the authors suggested improving the DHS in this regard. The proposed local search was also computationally expensive, and further investigation was recommended. Although the artificial bee colony (ABC) algorithm has efficiently solved many optimization problems, it required further improvement to solve the complex JSSP (Sharma et al., 2018).
The authors maintained a proper balance between the exploration and exploitation capabilities of the ABC by incorporating a position update inspired by the beer froth (BeF) phenomenon, calling the algorithm BeFABC. The effectiveness of this algorithm was established against a list of existing algorithms, including a parallel ABC (Asadzadeh, 2016); however, its performance could still be tested on complex machine scheduling problems.

Liu et al. (2020) claimed that the whale optimization algorithm (WOA) has already proved its effectiveness in solving a range of optimization problems. Its performance was further enhanced with Lévy flight (LF) and DE (WOA-LFDE) to solve the JSSP. The LF improved the global search ability and convergence over iterations, whereas the DE algorithm enhanced the local search and exploitation capabilities, maintaining the diversity of solutions to escape local optima. The experimental results showed superior performance against state-of-the-art algorithms, including HBBO (Wang & Duan, 2014), HAIS (Qiu & Lau, 2014), HFSAQ (Akram et al., 2016) and BeFABC (Sharma et al., 2018), in solving 88 instances. The authors suggested testing the algorithm's performance on other combinatorial optimization problems. Mahmud et al. (2021) developed a two-step communication-based EA, where MODE was employed at the beginning of the search to exploit the exploration capability of the population-based algorithm, and TS was employed later to utilize its local search capability. This study developed a decoding heuristic for an active schedule and enhanced the evolutionary process via mixed selection and communication strategies. This algorithm showed limited capability in solving complex problems due to a lack of global search capability.

The articles reviewed above treated the JSSP as a challenging NP-hard problem in the combinatorial optimization domain and solved it by developing a range of algorithms. Well-known and recently emerging algorithms were implemented since they have better exploration properties due to a multi-point searching capability (Zhao et al., 2018). However, their weak exploitation leads to higher computational expense with poor solution quality (Zarandi et al., 2020), and the rapid decline of diversity causes premature convergence (Ren & Wang, 2012). In contrast, single-point search methods offer better exploitation; however, poor exploration and computational complexity limit their applications (Zhao et al., 2018; Liu et al., 2020). Thus, many hybrid approaches were developed, combining the first-rate features of multiple algorithms to optimize the trade-off between global and local search capabilities (Mahmud et al., 2021), as reviewed above with their strengths and drawbacks. However, the integration of multiple meta-heuristics into a single algorithmic framework cannot be found in the production scheduling domain. Thus, such an integration to enhance evolutionary performance can open up a new direction in combinatorial optimization. To fill the gap, this research proposes a unified framework of multiple meta-heuristics with an integrated switching strategy that emphasizes the best-performing meta-heuristic to continue the search process. The algorithm also includes a mixed selection strategy for a better initial population, a local search for its exploitation property and a mechanism for ensuring population diversity over the evolutionary process. Finally, an extensive experiment is conducted by solving a wide range of standard JSSP instances.

Constituent algorithms

This section discusses the fundamentals of the constituent algorithms of our proposed approaches, i.e., DE and PSO, for the single-objective optimization problem under study.

Fundamentals of differential evolution (DE)

DE, one of the leading EAs in the literature, has already been applied in combinatorial optimization domains such as flow shop scheduling (Onwubolu & Davendra, 2006) and the JSSP (Mahmud et al., 2021; Liu et al., 2009). This algorithm is prominent in the research community because it has multiple variants, fast convergence, simplicity of implementation and, more importantly, the capability of solving various optimization problems using the same parameter values (Sallam et al., 2020). Moreover, in the literature, DE has performed better than many other EAs, such as GA, on different optimization problems (Elsayed et al., 2012), and its mutation and crossover operators mainly control its performance (Ponsich & Coello, 2013). Although DE and GA are both EAs, GA uses different solution updating mechanisms, including crossover, mutation and elitism-preserving techniques. Crossover generates new solutions by exchanging genes among chromosomes, mutation maintains diversity through small perturbations of the solutions to escape local optima, and elitism ensures the “survival of the fittest”. Although GA has been successfully implemented to handle many complex problems, it faces more difficulties than DE in handling multi-modal problems (Sallam et al., 2020).

Since DE is one of the main component algorithms in this study, this section explains DE's main steps. DE starts by initializing a population, and its evolution process comprises mutation, crossover and selection. A minimization problem is considered in Eq. 1 to explain DE, in which \(C_{max}\) denotes the makespan of solution vector \(\mathbf {x}\) = {\(x_1,x_2,\ldots ,x_D\)}, D is the number of decision variables, and \(x_{k}^{min}\) and \(x_{k}^{max}\) indicate the lower and upper bounds of the \(k^{th}\) decision variable.

$$\begin{aligned}&\min C_{max}=\{f(\mathbf {x})|x_{k}^{min}\le x_{k}\le x_{k}^{max}, \nonumber \\&\forall {k=1,2,\ldots ,D}\} \end{aligned}$$
(1)

Population Initialization This algorithm initializes a population of NP D-dimensional vectors at generation \(G=0\), i.e., \(Pop^{0}\) = \([\mathbf {x}_{l,1}^{0}, \mathbf {x}_{l,2}^{0},\ldots ,\mathbf {x}_{l,k}^{0},\ldots ,\mathbf {x}_{l,D}^{0}]|l=1,2,\ldots ,NP\). Each \(\mathbf {x}_{l}^{G}\) is known as a target vector, and the \(k^{th}\) position of the \(l^{th}\) vector is filled using Eq. 2, where \(rand(0,1)\in [0,1]\) is a uniformly distributed random number.

$$\begin{aligned}&x_{l,k}=x_{k}^{min}+rand(0,1)\times (x_{k}^{max}-x_{k}^{min}), \nonumber \\&\forall l=\{1,2,\ldots ,NP\}; \forall k=\{1,2,\ldots ,D\} \end{aligned}$$
(2)
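As an illustrative sketch of Eq. 2 (function and parameter names are ours, not from the paper), the initialization can be written as:

```python
import random

def initialize_population(NP, D, x_min, x_max, seed=None):
    """Generate NP target vectors of dimension D as in Eq. 2:
    x_{l,k} = x_k^min + rand(0,1) * (x_k^max - x_k^min)."""
    rng = random.Random(seed)
    return [
        [x_min[k] + rng.random() * (x_max[k] - x_min[k]) for k in range(D)]
        for _ in range(NP)
    ]
```

Every component is drawn independently, so each target vector lies inside the box defined by the per-variable bounds.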

Mutation In the evolution process, the first step is to generate a mutant vector (\(\mathbf {M}\)). In the variant DE/rand/1, for example, three mutually exclusive candidate solutions (\(\mathbf {x}_{r_{1}}^G \ne \mathbf {x}_{r_{2}}^G \ne \mathbf {x}_{r_{3}}^G\)) are randomly picked from the current generation G, and a mutant vector (\(\mathbf {M}_{l}^G\)) is generated by multiplying a scaling factor (F) with the difference of two target vectors (\(\mathbf {x}_{r_{2}}^G\) and \(\mathbf {x}_{r_{3}}^G\)) and adding the result to the third target vector (\(\mathbf {x}_{r_{1}}^G\)), as presented in Eq. 3.

$$\begin{aligned} \mathbf {M}_{l}^G&=\mathbf {x}_{r_{1}}^G+F\times (\mathbf {x}_{r_{2}}^G-\mathbf {x}_{r_{3}}^G)&\end{aligned}$$
(3)

The scale factor F controls the convergence speed and the population diversity (Liu et al., 2009).
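A minimal sketch of the DE/rand/1 mutation in Eq. 3 (names are illustrative, and indices are zero-based):

```python
import random

def mutate_rand_1(population, l, F, rng=random):
    """DE/rand/1 mutation (Eq. 3): pick three distinct vectors, all
    different from the target index l, and build
    M = x_r1 + F * (x_r2 - x_r3)."""
    candidates = [i for i in range(len(population)) if i != l]
    r1, r2, r3 = rng.sample(candidates, 3)
    return [
        a + F * (b - c)
        for a, b, c in zip(population[r1], population[r2], population[r3])
    ]
```

Note that when all selected vectors coincide, the difference term vanishes and the mutant equals the base vector, which is one way the population can stagnate.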

Crossover In general, crossover is applied once the mutant vector is generated. It produces an offspring solution known as a trial vector (\(\mathbf {T}\)) and is generally either binomial or exponential. In the binomial operator, the mutant value of decision variable k is taken if a generated random number is less than the crossover rate (\(C_r\)), as presented in Eq. 4.

$$\begin{aligned} \mathbf {T}_{l,k}^{G}&= {\left\{ \begin{array}{ll} \ \mathbf {M}_{l,k}^G &{}\quad \text {if } rand(k)\le C_r\ \text {or}\ k=k_{rand}\\ \ \mathbf {x}_{l,k}^G &{}\quad \text {otherwise} \end{array}\right. }&\end{aligned}$$
(4)

In the above equation, \(rand(k)\in [0,1]\) and \(k_{rand} \in \{1,2,\ldots ,D\}\).
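The binomial operator of Eq. 4 can be sketched as follows (a zero-based illustration; names are ours):

```python
import random

def binomial_crossover(target, mutant, Cr, rng=random):
    """Binomial crossover (Eq. 4): each position takes the mutant value
    when rand(k) <= Cr or when k equals a random index k_rand, so at
    least one component always comes from the mutant."""
    D = len(target)
    k_rand = rng.randrange(D)
    return [
        mutant[k] if (rng.random() <= Cr or k == k_rand) else target[k]
        for k in range(D)
    ]
```

The forced index k_rand guarantees the trial vector differs from the target in at least one position, even when \(C_r\) is very small.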

In the exponential operator, a random integer d, which defines the starting position in the target vector for crossover, is chosen from [1, D], and another integer b, which defines how many decision variables are taken from the donor solution, is chosen from [1, D]. Once d and b are chosen, the trial vector is generated by Eq. 5.

$$\begin{aligned} \mathbf {T}_{l,k}^{G}&= {\left\{ \begin{array}{ll} \ \mathbf {M}_{l,k}^G &{} \quad \text {for } k=\langle d \rangle _D, \langle d+1 \rangle _D,\ldots ,\langle d+b-1\rangle _D\\ \ \mathbf {x}_{l,k}^G &{}\quad \text {otherwise} \end{array}\right. }&\end{aligned}$$
(5)

In this equation, \(\langle \cdot \rangle _D\) denotes the modulo-D function, so the b copied positions wrap around starting from d (Sallam et al., 2020).
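A sketch of the exponential operator in Eq. 5 with zero-based indices (the function name is ours):

```python
def exponential_crossover(target, mutant, d, b):
    """Exponential crossover (Eq. 5): copy b consecutive mutant
    components starting at index d, wrapping around modulo D."""
    D = len(target)
    trial = list(target)
    for offset in range(b):
        # <.>_D : modulo-D indexing, so the copied segment wraps around
        trial[(d + offset) % D] = mutant[(d + offset) % D]
    return trial
```

For example, with D = 5, d = 3 and b = 3, positions 3, 4 and 0 are taken from the mutant.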

Selection The produced trial vector (\(\mathbf {T}_{l}^G\)) is evaluated and compared with the target vector (\(\mathbf {x}_{l}^G\)) based on the fitness value. For a minimization problem, the trial vector is assigned as the target vector (\(\mathbf {x}_{l}^{(G+1)}\)) for the upcoming generation if its fitness value is lower; otherwise, the current target vector is copied into the next generation, as presented in Eq. 6.

$$\begin{aligned} \mathbf {x}_{l}^{(G+1)}&= {\left\{ \begin{array}{ll} \ \mathbf {T}_{l}^G,&{} \quad \text {if } f(\mathbf {T}_{l}^G)< f(\mathbf {x}_{l}^G)\\ \mathbf {x}_{l}^G, &{} \quad \text {otherwise} \end{array}\right. }&\end{aligned}$$
(6)
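The greedy selection of Eq. 6 is a one-line decision; as an illustrative sketch (names are ours):

```python
def select(target, trial, fitness):
    """Greedy DE selection (Eq. 6): for minimization, the trial vector
    survives only if its fitness (here, makespan) is strictly lower."""
    return trial if fitness(trial) < fitness(target) else target
```

This one-to-one comparison is exactly the greediness discussed earlier: a trial vector can never replace any individual other than its own target, and ties keep the target.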

Fundamentals of particle swarm optimization (PSO)

PSO is a population-based algorithm developed based on the social behaviour of flocks of birds or schools of fish, and it comprises a set of particles collectively called a swarm. A particle (which here represents a job sequence) moves toward the personal best (\(\mathbf {pbest}\)) obtained so far by itself and the global best (\(\mathbf {gbest}\)) obtained so far by the swarm. The particle movement toward \(\mathbf {pbest}\) and \(\mathbf {gbest}\) depends on the velocity, which is calculated using Eq. 7, while the position of a particle is subsequently updated using Eq. 8. \(\mathbf {v}_{l}^{G}\) stands for the velocity of particle l in generation G and \(\mathbf {x}_{l}^{G}\) is its position. w is the inertia weight, and \(c_{1}\) and \(c_{2}\) control the movement of a particle towards \(\mathbf {pbest}\) and \(\mathbf {gbest}\), respectively. \(rand_{1}\) and \(rand_{2}\) are random numbers between 0 and 1.

$$\begin{aligned} \mathbf {v}_{l}^{(G+1)}= & {} w\times \mathbf {v}_{l}^{G}+c_{1} \times rand_{1} \times (\mathbf {pbest}_{l}^G-\mathbf {x}_{l}^G)\nonumber \\&+c_{2}\times rand_{2}\times (\mathbf {gbest}_{l}^G-\mathbf {x}_{l}^G) \end{aligned}$$
(7)
$$\begin{aligned} \mathbf {x}_{l}^{(G+1)}= & {} \mathbf {x}_{l}^{G}+\mathbf {v}_{l}^{(G+1)} \end{aligned}$$
(8)
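Eqs. 7 and 8 together form one swarm update step; a minimal sketch (names are ours, not the authors' code):

```python
import random

def pso_step(x, v, pbest, gbest, w, c1, c2, rng=random):
    """One PSO iteration (Eqs. 7-8): new velocity from inertia plus
    random attraction towards pbest and gbest, then position update."""
    r1, r2 = rng.random(), rng.random()
    v_new = [
        w * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)
        for xk, vk, pk, gk in zip(x, v, pbest, gbest)
    ]
    x_new = [xk + vk for xk, vk in zip(x, v_new)]
    return x_new, v_new
```

Note that, unlike DE selection, this update is applied unconditionally, so a particle may move to a worse position, which is the diversity-preserving behaviour discussed in the Introduction.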

Because of the simplicity and effectiveness of PSO, many variants have been proposed to solve a range of challenging optimization problems, in both continuous (Wang et al., 2018) and discrete optimization (Sha & Hsu, 2006). Moreover, PSO offers high-quality solution convergence (Qian & Li, 2018) and diversity (Islam et al., 2021), which is essential for the NP-hard problems under study.

Problem description of a JSSP

A JSSP comprises a finite set \(J=\{J_{1}, J_{2},J_{3},\ldots ,J_{n}\}\) of n non-homogeneous jobs indexed by j and a finite set \(M=\{M_{1}, M_{2},M_{3},\ldots ,M_{m}\}\) of m machines indexed by i. Each job j comprises a finite set \(O_{j}=\{O_{j1}, O_{j2},\ldots ,O_{jm}\}\) of m tasks/operations indexed by t and is processed by the m machines to complete the assigned work. Thus, sequencing and processing the n jobs on the m machines is the typical aim of any \(n\times m\) JSSP, considering different objective functions (Wang et al., 2018; Zhang et al., 2019). The sequence of operations within a job is predefined. Each operation \(O_{jt}\in O_j\) is processed by one of the m machines, and neither this operation nor the assigned machine is interrupted by any other job until the current operation is finished; that is, no preemption is allowed and the assigned machines are considered continuously available (Liu et al., 2020). The total number of operations is \(n\times m\), excluding two additional dummy operations at the start and end with zero processing time. \(C_j=max\{C_{j1}, C_{j2},\ldots ,C_{jm}\}\), \(\forall {j}\), indicates the completion time of job j, and the makespan is calculated as \(C_{max}=max\{C_{1}, C_{2},C_{3},\ldots ,C_{n}\}\).

The constraints which must be followed while building a JSSP are as follows:

  • A job cannot be processed more than once on the same machine.

  • A job must satisfy its precedence relations, if any.

  • Each machine can process only one job at a time.

  • A job cannot be processed on multiple machines at the same time.

  • Release time and due dates are not specified.

The generated solution of this problem is a schedule, i.e., an assignment of the sequence of operations to machines with deterministic processing times. This complex combinatorial NP-hard problem is optimized in this study by minimizing the makespan value.
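To make the objective concrete, the following sketch decodes an operation-based permutation (each job index appearing m times) into a semi-active schedule and computes \(C_{max}\). The decoding and all names are illustrative assumptions, not the authors' exact decoding heuristic:

```python
def makespan(op_sequence, machine_of, proc_time, n_jobs, n_machines):
    """Decode an operation-based permutation and return C_max.

    op_sequence: job indices, each job j appearing n_machines times;
    the t-th occurrence of j stands for operation O_{jt}.
    machine_of[j][t], proc_time[j][t]: machine and processing time of O_{jt}.
    """
    job_ready = [0] * n_jobs        # completion time of each job's last operation
    mach_ready = [0] * n_machines   # time at which each machine becomes free
    next_op = [0] * n_jobs          # next operation index per job
    for j in op_sequence:
        t = next_op[j]
        next_op[j] += 1
        i = machine_of[j][t]
        # an operation starts when both its job and its machine are free
        start = max(job_ready[j], mach_ready[i])
        job_ready[j] = mach_ready[i] = start + proc_time[j][t]
    return max(job_ready)
```

For a 2\(\times \)2 instance with J0 = (M0, 3), (M1, 2) and J1 = (M1, 2), (M0, 4), the sequence [0, 1, 0, 1] yields a makespan of 7.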

Proposed algorithms

As previously described, since no single algorithm and/or search strategy can solve a wide range of optimization problems, multi-operator and/or multi-method based approaches have been emerging to handle this difficulty. In this work, MODE and PSO algorithms are proposed and implemented in a single algorithmic framework with distinct multi-populated strategies to solve a wide range of JSSPs. This section presents the proposed algorithms (i.e., sHEA, rHEA and bHEA).

bHEA

The idea of simultaneous multi-search is implemented in the proposed hybrid optimization framework bHEA, in which equally divided sub-populations evolve in MODE and PSO and exchange information to strengthen the search power (Cruz-Chávez et al., 2019). The quality of the evolution process of both algorithms is measured, and if one algorithm performs better than the other, the better-performing algorithm continues the search process until both become under-performing. Then, the sub-populations are switched between the two algorithms after the quality of \(\mathbf {pbest}\) and \(\mathbf {gbest}\) is enhanced through a local search. Algorithm 1 presents the main steps of the proposed bHEA.

[Algorithm 1]

Initially, a population of \(4\times NP\) candidate solutions is generated using the Latin Hypercube Design (LHD), as it efficiently covers the search space (Sallam et al., 2017), as described in Sect. 5.3. The generated continuous random solution strings are converted to integer strings, as explained in Sect. 5.4, and then a population of size NP is chosen, giving priority to both diversity and fitness value in the selection procedure (known as the MSS) to ensure a better search experience and to avoid premature convergence (see Sect. 5.5). The population is then divided into two sub-populations, i.e., SP\(_{1}\) and SP\(_{2}\), to receive the benefits of the multi-method approach. SP\(_{1}\) is further grouped into three, i.e., SSP\(_{op|\{op=1,2,3\}}\), which search in parallel with the chosen DE operators. The choices are made meticulously to ensure diversity and convergence simultaneously, as illustrated in Sect. 5.6.1. To avoid premature convergence, regrouping is introduced in each generation (Tasgetiren & Suganthan, 2006). The performance of each evolution method is measured based on its capability of updating \(\mathbf {gbest}\) (lines 18 and 19 in Algorithm 1 and Eq. 20). If either MODE or PSO fails to improve \(\mathbf {gbest}\) a specified number of times (PreMax), the corresponding evolution process is halted until the same scenario arises in the other one. Then, a proportion of \(\mathbf {pbest}\) is sent to a local search to overcome the stagnation in MODE-PSO, since a local search is more capable of producing solutions around local optima than MODE and PSO; it works by creating neighbour solutions of the candidate pool. \(\mathbf {pbest}\) and \(\mathbf {gbest}\) are updated in every generation in such a way that each \(\mathbf {pbest}\) has a distinct makespan and solution order, assuring a better evolution of PSO, as described in Algorithm 5.
Whenever the local search terminates, the sub-populations of MODE and PSO are exchanged. Because this population switching depends solely on the performance of the search process, the mechanism is named PDSM. This HEA is named bHEA since it is designed on the bi-population concept. The bHEA terminates when the current number of fitness evaluations (FES) reaches the maximum number of fitness evaluations (\(MAX_{FES}\)).

rHEA and sHEA

The population is commonly believed to be one of the most crucial features of EAs, as it enables exploration of different parts of the search space via a set of individuals. Dividing the population into smaller sub-populations may therefore reduce search uniformity and exploration; however, a larger population size is not always helpful either (Chen et al., 2012). Thus, this study, as shown in Algorithm 2, integrates both scenarios in a single algorithmic framework through a population switching strategy. If any algorithm shows adequate performance over the generations, no other algorithm evolves to solve the problem. Otherwise, the population switches among the algorithms depending on the performance measure.

figure b

In Algorithm 2, the randomly generated population is filtered based on the fitness value and the diversity of a schedule by employing the developed MSS, as described in Sect. 5.5. The initial population is then divided into three smaller sub-populations, i.e., SP\(_{op|\{op=1,2,3\}}\), each of size \(\frac{NP}{3}\), before evolving in the MODE, in which each sub-population uses its own mutation operator and the defined crossover. Over the evolution process, an algorithm may perform poorly, making it inappropriate to continue the search with it (Elsayed et al., 2011). Thus, the population is switched from an under-performing algorithm to another, emphasizing the better-performing algorithm. A performance indicator that counts the inability of the current search process to improve the global best (\(\mathbf {gbest}\)) solution over the evolution process is designed to control the switching decision and is compared with PreMax. An algorithm with low performance quickly reaches PreMax, and the population then switches to another algorithm. Once the MODE satisfies the population switching condition, a certain proportion of \(\mathbf {pbest}\) is sent to a local search. Similarly, the local search performance is measured and compared with the Tabu termination criterion. When the local search under-performs, the entire population moves to the PSO, which evolves using its own operators and follows the same termination criterion as the MODE. The population switching between algorithms depends solely on the performance of the search process. This HEA is called sHEA as it follows a specific sequence. A variant that implements the PDSM with random algorithm selection, instead of switching the population between algorithms sequentially, is also developed; it is named rHEA and highlighted in Algorithm 2.

Population initialization

The LHD is employed to generate an initial distinct continuous-based population, as it scatters points across the solution space more efficiently (Sallam et al., 2017).

$$\begin{aligned}&x_{l,k}=x_{k}^{min}+(x_{k}^{max}-x_{k}^{min})\times lhd(1,NP) \nonumber \\&\forall {l}=\{1,2,\ldots ,NP\}, \forall {k}=\{1,2,\ldots ,D(=n\times m)\} \end{aligned}$$
(9)

In Eq. 9, lhd is the LHD function that produces random real numbers.
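For illustration, the stratified sampling behind Eq. 9 can be written as a minimal Python sketch (not the authors' MATLAB implementation): each dimension receives exactly one sample per \(\frac{1}{NP}\) interval, with the strata randomly permuted across individuals.

```python
import numpy as np

def lhd_population(NP, D, x_min, x_max, rng):
    """Latin Hypercube initialization: one sample per 1/NP stratum in each
    of the D dimensions, strata permuted independently per dimension."""
    u = np.empty((NP, D))
    for k in range(D):
        strata = rng.permutation(NP)              # which stratum each individual occupies
        u[:, k] = (strata + rng.random(NP)) / NP  # uniform draw inside the stratum
    return x_min + (x_max - x_min) * u            # scale to [x_min, x_max] as in Eq. 9

pop = lhd_population(NP=10, D=9, x_min=0.0, x_max=1.0,
                     rng=np.random.default_rng(1))
```

By construction, flooring each column times NP recovers a permutation of \(\{0,\ldots ,NP-1\}\), i.e., every stratum is covered exactly once per dimension.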

Solution representation, encoding and decoding schemes

Solution representation has a significant impact on designing algorithms and achieving the highest degree of performance (Cheng et al., 2016). Ponnambalam et al. (2001) compared several solution representations (e.g. operation-based, job-based, preference-based, and priority rule-based representations) and concluded that the preference-based representation provides the best solution quality; it is therefore implemented in this study. An example of an \(n \times m\) JSSP is considered to illustrate the representation, where a string consists of m sub-strings, one for each machine. Each sub-string has length n, and each number in a sub-string identifies an operation that has to be processed on the relevant machine. The sub-string prescribes only the preference list of the corresponding machine, not the actual operation sequence.

Fig. 1
figure 1

The mapping of the algorithm from continuous to discrete

To illustrate the preference-based representation with encoding, consider the random string [(0.92, 0.75, 0.25), (0.75, 0.82, 0.53), (0.44, 0.62, 0.55)], which consists of \(m=3\) sub-strings of dimension \(1\times n\) (\(n=3\)), as depicted in Fig. 1. The values in each sub-string are real numbers generated at random in the interval [0,1]. These real numbers are then transformed into an integer series based on the ranked-order-value (ROV) rule (Zhao et al., 2018), which results in the integer string [(1, 2, 3), (2, 1, 3), (2, 3, 1)], known as the preference list. The first sub-string (1, 2, 3) is the preference sequence of machine \(M_{1}\), the second sub-string (2, 1, 3) is for machine \(M_{2}\), and the third sub-string (2, 3, 1) is for machine \(M_{3}\). An active schedule, in which no operation can be started earlier without delaying any other operation (Ahmadian et al., 2021), is deduced from this preference list. Only active schedules need to be considered, since they contain an optimal and feasible solution (Pongchairerks, 2019). Therefore, the well-known G&T algorithm (Giffler & Thompson, 1960) is employed to decode a preference list of jobs into an active schedule. Algorithm 3 explains the procedure, and the corresponding notations are as follows:

(i, j): the operation of job j that needs to be processed on machine i,

S: the partial schedule that contains scheduled operations,

\(\Omega \): the set of operations without predecessors,

\(S_{ij}\): the earliest time at which \((i,j)\in \Omega \) could be started,

\(P_{ij}\): the processing time of (i, j),

\(C_{ij}\): the earliest time at which \((i,j)\in \Omega \) could be completed: \(C_{ij} = S_{ij}+P_{ij}\)

figure c
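Returning to the encoding step, the ROV transformation can be sketched in Python as follows. Listing the positions in descending order of their random keys is an assumption inferred from the worked example above (the authors' tie-breaking rule is not stated).

```python
def rov(sub_string):
    """Ranked-order-value encoding: return the machine positions
    (1-indexed) sorted by descending random key."""
    order = sorted(range(len(sub_string)), key=lambda i: -sub_string[i])
    return [i + 1 for i in order]

# the random string of the worked example in Sect. 5.4
string = [(0.92, 0.75, 0.25), (0.75, 0.82, 0.53), (0.44, 0.62, 0.55)]
preference_list = [rov(s) for s in string]
# preference_list == [[1, 2, 3], [2, 1, 3], [2, 3, 1]]
```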

The decoding algorithm is applied to the exemplified preference list with the JSSP data set of Table 1, resulting in an active schedule with makespan 22, as depicted in Fig. 2i (each step is explained in Fig. 2a–i).
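The G&T decoding above can be sketched as follows on a hypothetical \(2\times 2\) instance (the \(3\times 3\) data of Table 1 is not reproduced here, so the job routes, processing times and preference lists are illustrative assumptions):

```python
def gt_decode(routes, pref):
    """Giffler-Thompson active-schedule builder (sketch of Algorithm 3).
    routes[j] = [(machine, proc_time), ...] in technological order;
    pref[m]   = preference list of jobs on machine m."""
    job_ready = {j: 0 for j in routes}
    mach_ready = {m: 0 for m in pref}
    nxt = {j: 0 for j in routes}              # index of each job's next unscheduled op
    schedule, makespan = [], 0
    while any(nxt[j] < len(routes[j]) for j in routes):
        # earliest start S and completion C of every schedulable operation
        cand = {}
        for j in routes:
            if nxt[j] < len(routes[j]):
                m, p = routes[j][nxt[j]]
                s = max(job_ready[j], mach_ready[m])
                cand[j] = (m, s, s + p)
        # machine of the operation with minimum earliest completion C*
        j_star = min(cand, key=lambda j: cand[j][2])
        m_star, _, c_star = cand[j_star]
        # conflict set: operations on m* that could start before C*
        conflict = [j for j in cand if cand[j][0] == m_star and cand[j][1] < c_star]
        # resolve the conflict with the preference list of m*
        chosen = min(conflict, key=pref[m_star].index)
        m, s, c = cand[chosen]
        schedule.append((chosen, m, s, c))
        job_ready[chosen] = mach_ready[m] = c
        nxt[chosen] += 1
        makespan = max(makespan, c)
    return schedule, makespan

# hypothetical 2-job, 2-machine instance
routes = {1: [(1, 3), (2, 2)], 2: [(2, 4), (1, 1)]}
pref = {1: [1, 2], 2: [2, 1]}
sched, cmax = gt_decode(routes, pref)   # cmax == 6 for this toy instance
```

The preference list enters only through the conflict-resolution step, which is what makes the resulting schedule active rather than merely feasible.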

Table 1 Data set for 3\(\times \)3 JSSP
Fig. 2
figure 2

Decoding to an active schedule of 3\(\times \)3 JSSP

Mixed selection strategy (MSS)

The selection operator in each algorithm is responsible for the searching power of the next generation. Most algorithms prioritize the fitness value, i.e. the probability of selecting an individual is proportional to its fitness (Ren & Wang, 2012). This leads to premature convergence as a result of filling the population with similar individuals (Goldberg, 2006). The MSS, based on both fitness value and a similarity index, is proposed to avoid premature convergence. Consider two solution orders \(\mathbf {S}_{1}^G\) = [(1, 3, 2), (1, 2, 3), (2, 1, 3)] and \(\mathbf {S}_{2}^G\) = [(2, 3, 1), (1, 2, 3), (1, 3, 2)]. The first sub-strings of the two solution orders are (1, 3, 2) and (2, 3, 1), where only the second position holds the same operation and the remainder differ. This is captured by the positional similarity degree, and the sum of the positional similarity degrees (SPS) is 1 \((0+1+0)\). The SPS values for the remaining two sub-strings are 3 and 0, respectively. Therefore, the co-positional similarity degree between \(\mathbf {S}_{1}^G\) and \(\mathbf {S}_{2}^G\), denoted as \(S^G(\mathbf {S}_{1}^G,\mathbf {S}_{2}^G)\), is 0.44 (\(\frac{1+3+0}{3\times 3}\)), where the number of jobs (n) and the number of machines (m) are both 3. If the population comprises NP solution orders, the concentration value of \(\mathbf {S}_{1}^G\) is calculated by Eq. 10.

$$\begin{aligned} c(\mathbf {S}_{1}^G)&=\sum _{l=1}^{NP-1}\frac{S(\mathbf {S}_{1}^G,\mathbf {S}_{l}^G)}{(NP-1)} \end{aligned}$$
(10)

The goodness value \(G_s(\mathbf {S}_{1}^G)\), which combines the concentration value and the fitness value \(f(\mathbf {S}_{1}^G)\) of the corresponding solution order, is calculated using Eq. 11. The fitness value is the makespan, calculated using the G&T algorithm as explained in Sect. 5.4. The higher the value of \(G_s(\mathbf {S}_{1}^G)\), the better the diversification and fitness value that can be achieved. The weight W sets the priority between diversity and fitness value.

$$\begin{aligned} G_s(\mathbf {S}_{1}^G)= & {} W\times \frac{max(f(\mathbf {S}^G))-f(\mathbf {S}_{1}^G)}{max(f(\mathbf {S}^G))-min(f(\mathbf {S}^G))}\nonumber \\&+ (1-W)\times (1-c(\mathbf {S}_{1}^G)) \end{aligned}$$
(11)
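The similarity, concentration and goodness computations above can be sketched in Python as follows (a minimal sketch; the population and fitness values used here are hypothetical):

```python
def co_positional_similarity(s1, s2, n, m):
    """Fraction of positions holding the same operation (SPS summed
    over all sub-strings, divided by n*m)."""
    matches = sum(a == b for sub1, sub2 in zip(s1, s2)
                  for a, b in zip(sub1, sub2))
    return matches / (n * m)

def concentration(pop, i, n, m):
    """Eq. 10: average similarity of pop[i] to the other NP-1 members."""
    return sum(co_positional_similarity(pop[i], s, n, m)
               for j, s in enumerate(pop) if j != i) / (len(pop) - 1)

def goodness(fit, conc, f_max, f_min, W=0.5):
    """Eq. 11: weighted mix of normalized fitness and diversity."""
    return W * (f_max - fit) / (f_max - f_min) + (1 - W) * (1 - conc)

# the two solution orders of the worked example
S1 = [(1, 3, 2), (1, 2, 3), (2, 1, 3)]
S2 = [(2, 3, 1), (1, 2, 3), (1, 3, 2)]
sim = co_positional_similarity(S1, S2, n=3, m=3)   # (1 + 3 + 0) / 9
```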

Evolution process

The strategies employed in the HEAs (i.e., sHEA, rHEA and bHEA) are discussed in this section.

Multi-operator differential evolution (MODE)

DE is conventionally denoted DE/a/b/c, in which DE stands for differential evolution, a denotes the vector to be perturbed, b stands for the number of difference vectors used to perturb a, and c stands for the type of crossover, such as bin (binomial) or exp (exponential). The evolutionary process of this algorithm comprises mutation, crossover and selection. The algorithm has eight types of mutation operators, presented in Eqs. 12–19, each with individual traits (Liu et al., 2020). The three mutation operators selected for this MODE are “DE/rand/1”, “DE/best/1” and “DE/current-to-pbest/1”, shown in Eqs. 12, 13, and 19, respectively. “DE/rand/1” is recognized to have the best exploration property (Zhao et al., 2016), whereas “DE/best/1” has the best exploitation property (Wisittipanich & Kachitvichyanukul, 2012). The balance between these two characteristics in the evolutionary process is ensured by “DE/current-to-pbest/1” (Liu et al., 2020). These three mutation operators are combined in DE, which is referred to as MODE in this study.

$$\begin{aligned} \text {DE/rand/1:} \mathbf {M}_{l}^G= & {} \mathbf {x}_{r_{1}}^G+F\times (\mathbf {x}_{r_{2}}^G-\mathbf {x}_{r_{3}}^G) \end{aligned}$$
(12)
$$\begin{aligned} \text {DE/best/1:} \quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{best}^G+F(\mathbf {x}_{r_{1}}^G-\mathbf {x}_{r_{2}}^G) \end{aligned}$$
(13)
$$\begin{aligned} \text {DE/current/1:}\quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{l}^G+F(\mathbf {x}_{r_{1}}^G-\mathbf {x}_{r_{2}}^G) \end{aligned}$$
(14)
$$\begin{aligned} \text {DE/current-to-best/1:}\quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{l}^G+F(\mathbf {x}_{best}^G-\mathbf {x}_{l}^G)\nonumber \\&+F(\mathbf {x}_{r_{1}}^G-\mathbf {x}_{r_{2}}^G) \end{aligned}$$
(15)
$$\begin{aligned} \text {DE/rand/2:}\quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{r_{1}}^G+F(\mathbf {x}_{r_{2}}^G-\mathbf {x}_{r_{3}}^G)\nonumber \\&+F(\mathbf {x}_{r_{4}}^G-\mathbf {x}_{r_{5}}^G) \end{aligned}$$
(16)
$$\begin{aligned} \text {DE/best/2:}\quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{best}^G+F(\mathbf {x}_{r_{1}}^G-\mathbf {x}_{r_{2}}^G)\nonumber \\&+F(\mathbf {x}_{r_{3}}^G-\mathbf {x}_{r_{4}}^G) \end{aligned}$$
(17)
$$\begin{aligned} \text {DE/current-to-rand/1:}\quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{l}^G+F(\mathbf {x}_{r_{1}}^G-\mathbf {x}_{l}^G)\nonumber \\&+F(\mathbf {x}_{r_{2}}^G-\mathbf {x}_{r_{3}}^G) \end{aligned}$$
(18)
$$\begin{aligned} \text {DE/current-to-pbest/1:}\quad \mathbf {M}_{l}^G= & {} \mathbf {x}_{l}^G+F(\mathbf {x}^{G}_{best,p}-\mathbf {x}_{l}^G)\nonumber \\&+F(\mathbf {x}_{r_{1}}^G-\tilde{\mathbf {x}}_{r_{2}}^G) \end{aligned}$$
(19)

It is noted that in Eq. 12, \(\mathbf {x}_{r_{1}}\ne \mathbf {x}_{r_{2}}\ne \mathbf {x}_{r_{3}}\) are randomly selected from the current population, and these vectors also differ from the target vector \(\mathbf {x}_{l}^G\). \(\mathbf {x}^{G}_{best,p}\) in Eq. 19 is chosen from the top \(15\%\) of individuals in the current population, and \({\tilde{\mathbf {x}}}_{r_{2}}^G\) in Eq. 19 is taken from the union of the current population and the archive of pbest. F is the scale factor, which governs the convergence speed and the population diversity (Sallam et al., 2020). Moreover, the trial vector \(\mathbf {T}_{l}^G\) is obtained by recombining a mutant vector \(\mathbf {M}_{l}^G\) and a target vector \(\mathbf {x}_{l}^G\), as shown in Eq. 4. The produced trial vector is evaluated and compared with the target vector based on the fitness value. For the minimization problem, the trial vector replaces the target vector if its fitness value is lower, as depicted in Eq. 6.

The performance of this algorithm greatly depends on its search operators (mutation and crossover) and control parameters. It is claimed that a higher value of \(C_r\) is usually the most suitable, while setting the value of F appropriately for all instances is quite tricky (Ponsich & Coello, 2013). These parameters are therefore tuned in Sect. 6.2.
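Assuming the vectors are NumPy arrays, the three selected mutation operators and the binomial crossover can be sketched as follows (an illustrative sketch, not the authors' tuned implementation):

```python
import numpy as np

def de_rand_1(x, r1, r2, r3, F):
    """Eq. 12: perturb a random vector with one difference vector."""
    return x[r1] + F * (x[r2] - x[r3])

def de_best_1(x, best, r1, r2, F):
    """Eq. 13: perturb the best vector with one difference vector."""
    return x[best] + F * (x[r1] - x[r2])

def de_current_to_pbest_1(x, l, pbest, r1, x_r2_tilde, F):
    """Eq. 19: x_r2_tilde comes from the union of the current
    population and the pbest archive."""
    return x[l] + F * (x[pbest] - x[l]) + F * (x[r1] - x_r2_tilde)

def binomial_crossover(target, mutant, Cr, rng):
    """Trial vector: take mutant genes with probability Cr,
    guaranteeing at least one mutant gene."""
    mask = rng.random(target.size) < Cr
    mask[rng.integers(target.size)] = True
    return np.where(mask, mutant, target)

x = np.arange(12.0).reshape(4, 3)        # toy population of 4 vectors
mutant = de_rand_1(x, 0, 1, 2, F=0.5)    # x[0] + 0.5 * (x[1] - x[2])
```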

Particle swarm optimization (PSO)

Although the PSO algorithm is designed for a continuous search space, this paper employs it in a discrete combinatorial optimization domain by converting the continuous-based population to a discrete one. Particles are illustrated by the preference-based representation, as discussed in Sect. 5.4, while particle movement is described in terms of swap operations. For instance, if \(\mathbf {v}_{lik}^{G}=0\) for an operation at position k on machine (sub-string) i in a solution order \(\mathbf {S}_{lik}^G\) of a swarm, the operation can move towards the \(\mathbf {pbest}\) or \(\mathbf {gbest}\) depending on the values of \(c_{1}\) and \(c_{2}\), respectively. The steps of particle movement are illustrated in Algorithm 4.

figure d
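The swap-based movement of a sub-string towards a guide (\(\mathbf {pbest}\) or \(\mathbf {gbest}\)) can be sketched as below; applying the probability independently per position is a simplifying assumption, not the exact rule of Algorithm 4:

```python
import random

def move_toward(sub_string, guide, prob, rng=random):
    """With probability `prob` per position, swap so that the operation
    at that position matches the guide (pbest or gbest)."""
    s = list(sub_string)
    for k in range(len(s)):
        if s[k] != guide[k] and rng.random() < prob:
            j = s.index(guide[k])   # position currently holding the guide's operation
            s[k], s[j] = s[j], s[k]
    return s
```

With probability 1 the sub-string becomes identical to the guide after a single pass, while intermediate probabilities leave the particle somewhere between its current position and the guide, mimicking the attraction terms \(c_{1}\) and \(c_{2}\).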

If the \(\mathbf {pbest}\) of all particles becomes the same over the generations, the evolutionary process becomes trapped in local optima, and therefore a diversity check mechanism was developed to keep the makespans of the \(\mathbf {pbest}\) solutions different (Sha & Hsu, 2006). Besides keeping the makespans of \(\mathbf {pbest}\) different, it is also important to keep the schedules different to reduce unnecessary movements. Thus, the \(\mathbf {pbest}\) and \(\mathbf {gbest}\) are updated in every generation considering both the makespan and the corresponding solution order, using Algorithm 5.

figure e

Local search

The local search capability of population-based algorithms such as PSO and DE is considerably weaker than their global search capability, and thus these algorithms suffer from premature convergence and are easily trapped in local optima (Gao et al., 2019; Lin et al., 2010). Therefore, a local search is implemented to further improve the makespan of a schedule obtained from the MODE-PSO combination. TS has been employed as it can generate solutions around the optima (Peng et al., 2015). This technique improves the solution quality iteratively and prevents the evolution from being trapped in local optima. In this study, the \(\mathbf {pbest}\) solutions found by the MODE and PSO are passed to the TS when the evolutionary process in MODE-PSO is incapable of updating the best solution obtained so far. It improves the \(\mathbf {pbest}\) and \(\mathbf {gbest}\), which directly enhances the searching capability of MODE-PSO. The main steps of the implemented local search are explained in Algorithm 6.

figure f

Neighbourhood structure A small perturbation of the \(\mathbf {pbest}\) can produce a set of solutions known as neighbours. In this study, these neighbour solutions are generated by implementing the N7 neighbourhood structure (Zhang et al., 2007), depicted in Fig. 3. In this approach, a randomly selected critical path (a solution may have multiple critical paths) is divided into several blocks, and moves within each block generate new neighbours.

Fig. 3
figure 3

The outline of the N7 neighbourhood

Tabu list The Tabu list prevents the algorithm from revisiting the same solution repeatedly, avoiding entrapment in local optima. Instead of storing solution attributes such as makespan, the sequence of operations and their corresponding positions on a machine are stored in the Tabu list for a certain number of iterations, called the Tabu tenure. The Tabu tenure is adjusted dynamically based on the problem size of the JSSP instances (see Table 3). An aspiration criterion accepts the best solution obtained so far even if the corresponding move is in the Tabu list.
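The Tabu bookkeeping described above can be sketched as follows; representing the stored move attributes as (operation, position) pairs and handling the tenure with a fixed-length queue are simplifying assumptions:

```python
from collections import deque

class TabuList:
    """Forbid recently applied moves for `tenure` iterations, with an
    aspiration criterion: a move beating the best makespan is always allowed."""
    def __init__(self, tenure):
        self.moves = deque(maxlen=tenure)   # oldest entries expire automatically

    def record(self, move):
        self.moves.append(move)

    def allowed(self, move, makespan, best_makespan):
        if makespan < best_makespan:        # aspiration criterion
            return True
        return move not in self.moves

tl = TabuList(tenure=2)
tl.record(("O11", 3))   # hypothetical move: operation O11 to position 3
```

A `deque` with `maxlen` drops its oldest entry on overflow, which mirrors a fixed Tabu tenure without explicit iteration counters.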

Performance-driven switching mechanism (PDSM)

As previously mentioned, three evolution strategies, namely MODE, PSO and local search (LS), are used in a single algorithmic framework. The proposed algorithms dynamically choose the most appropriate strategy at different stages of the evolution depending on the performance of the evolution strategies; this mechanism is named PDSM. It offers two benefits: (1) if any algorithm performs adequately over the generations, no other algorithm evolves to solve the problem, which increases the flexibility of the recently emerged multi-method approaches for complex problems where a single method is incapable; and (2) over the evolution process, an algorithm may perform poorly, making it inappropriate to continue the search with it; this approach overcomes that drawback. Equation 20 controls the switching decision. If any algorithm performs poorly, \(Count_{alg|\{alg=1,2,3\}}\) quickly reaches PreMax or the Tabu termination criterion, and the population switches to another search strategy.

$$\begin{aligned} Count_{alg|\{alg=1,2,3\}}= {\left\{ \begin{array}{ll} \ Count_{alg|\{alg=1,2,3\}}+1 &{} \quad \text {if } f(\mathbf {gbest^{G}})<f(\mathbf {x}_{l}^{G}), \forall {l=1,2,3,\ldots ,NP}\\ \ Count_{alg|\{alg=1,2,3\}} &{} \quad \text {otherwise} \end{array}\right. } \end{aligned}$$
(20)
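The stagnation counter of Eq. 20 can be sketched as follows; resetting the counter when \(\mathbf {gbest}\) improves is an assumption implied by the switching description rather than stated in the equation itself:

```python
def update_stagnation(count, gbest_fitness, population_fitness):
    """Increment the per-algorithm counter when no individual in the
    current population improves on gbest (minimization), per Eq. 20."""
    if all(gbest_fitness < f for f in population_fitness):
        return count + 1        # no improvement this generation
    return 0                    # gbest improved: restart the count

# hypothetical generation: gbest = 940, no candidate beats it
switch = update_stagnation(19, 940, [952, 948, 961]) >= 20   # PreMax = 20
```

Once the counter reaches PreMax (or the Tabu termination criterion for the local search), the population is handed to another search strategy.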

Experimental design and result analysis

This section describes the computing environment, the parameter analysis performed to achieve the best performance of the proposed algorithms, and a detailed result analysis.

Experimental setting

The computing environment used in this research is an Intel(R) Core(TM) i7-3770 CPU @3.40GHz with 32.0 GB RAM running Windows 10, and the proposed algorithms (i.e., sHEA, rHEA and bHEA) are coded in MATLAB R2020b. The feasibility and effectiveness of these algorithms are investigated through a set of comparative experiments with TS, DE and PSO, and against 26 popular algorithms, by solving 130 standard instances of the JSSP, including rectangular problems and the most difficult square problems (Cruz-Chávez et al., 2019). The selected instances include: 3 from Fisher and Thompson (FT) (Fisher, 1963); 40 from Lawrence (LA) (Lawrence, 1984); 3 from Applegate and Cook (ORB) (Applegate & Cook, 1991); 4 from Adams, Balas and Zawack (ABZ) (Adams et al., 1988); and 80 from Taillard (TA) (Taillard, 1993), covering problems of varied sizes. Each instance is executed 30 times independently, and the corresponding results are recorded.

Parameter setting

Parameter calibration is essential to achieve the best performance of an algorithm. This section discusses the parameter tuning of the developed algorithms: sHEA, rHEA and bHEA. The algorithms rHEA and sHEA differ only in the population switching decision, as illustrated in Sect. 5.2, which does not affect parameter calibration; therefore, parameter tuning is performed identically for these two algorithms. However, bHEA, unlike rHEA and sHEA, is designed with a bi-population structure in which the initial population is divided into two sub-populations executed independently in MODE and PSO, as described in Sect. 5.1. Thus, the parameters NP, MaxGen(=\(\frac{MAX_{FES}}{NP}\)) and PreMax may have a different impact on the bi-population structure of bHEA than on rHEA and sHEA, and calibration is therefore performed separately for bHEA.

The parameters of the MSS, PSO and MODE, which are designed in the same way for sHEA, rHEA and bHEA, are tuned first using a complex square-sized instance, LA40. The chosen parameters and their levels are depicted in Table 2. Although the parametric analysis could be performed by trial and error (Goli et al., 2021), this study adopted the orthogonal array \(L_{27}(3^{5})\) to design the experimental runs, as it reduces the computational complexity by lessening the number of experiments (Rahman et al., 2020). Each treatment is executed 30 times, keeping NP, MaxGen and PreMax constant. These parameters significantly impact the achieved makespan when the algorithm solves instances. The analysed results are depicted in Table 2 as the response for means. Figure 4a also shows that the makespan decreases with increasing \(C_r\); further experiments show average makespans of 1260.4 and 1260.8 for \(C_r\) values \(rand_{0.81-1.0}\) and \(rand_{0.85-1.0}\), respectively. Thus, the statistically inferred optimal setting of these parameters is W = 0.5, w = 1, \(c_{1}\) = 0.5, \(c_{2}\) = 0.5, \(C_r\) = \(rand_{0.81-1.00}\) and F = \(rand_{0.91-1.30}\). The values \(rand_{0.81-1.00}\) and \(rand_{0.91-1.30}\) indicate that the parameters take randomly generated values in the given range following a uniform distribution.

Table 2 The parameters tuning of the MSS-MODE-PSO

The parameters of the integrated frameworks (i.e., rHEA and sHEA) are also tuned. The five significant parameters of the rHEA and sHEA are set at three levels, as reported in Table 3, and the runs are designed using the orthogonal array \(L_{27}(3^{5})\). Each of the 27 treatments is run 30 times, and the average makespan of LA40 is recorded. The analysed results, including the delta value and rank of each parameter, are reported in Table 3. According to the table, MaxGen is ranked first, whereas Tabu termination (i.e., the local search termination criterion) is ranked the second most crucial factor. PreMax (i.e., terminating PSO and MODE at a defined level if the best value is not updated) is ranked third. Tabu tenure and NP are less significant parameters, ranked fourth and fifth, respectively. However, further analysis is needed, as the makespan improves with increasing NP and MaxGen and with decreasing PreMax, as depicted in Fig. 4b, and these three parameters significantly influence the computational expense. Tabu tenure is excluded to lessen the complexity, as it is a less significant parameter. In addition, this analysis informs the optimal parameter setting for bHEA, as illustrated previously.

Table 3 Parameters tuning of rHEA and sHEA
Fig. 4
figure 4

Main effects plot for means

The further impact of NP, MaxGen and PreMax on the performance of rHEA and sHEA is determined through a sensitivity analysis. This experiment uses both rectangular- and square-sized instances (e.g., LA17, LA30 and LA40). For each selected instance, the parameters are varied over NP = {50, 75, 100, 125}, MaxGen = {1000, 1500, 2000, 2500} and PreMax = {20, 16, 12, 10}. For each parameter combination, 10 independent runs are performed, and the relative percentage error (RPE) of each algorithm's best solution (EBS) and the RPE of the mean makespan \(f_{mean}\) are calculated with respect to the reference makespan, known as the best known solution (BKS), as depicted in Fig. 5a. The RPE and mean RPE are calculated by Eq. 21; RPE = 0 indicates that the instance is solved optimally. From Fig. 5a, sHEA and rHEA show almost consistent performance for parameter combinations 3 to 7 on each of the instances, and therefore NP, MaxGen and PreMax are set to 100, 1000 and 20, respectively. These settings minimize the number of fitness evaluations and thus the computational expense. For bHEA, a similar experiment is performed, as depicted in Fig. 5b; the performance for each experimental setup and each instance shows that parameter combinations 3 to 8 offer relatively consistent and better performance. Therefore, parameter combination 3, which matches that of sHEA and rHEA, is recommended for bHEA for a fair comparison. According to this parametric analysis, the optimal parameter settings for sHEA, rHEA and bHEA are: NP = 100; MaxGen = 1000; PreMax = 20; Tabu termination = 15; and Tabu tenure ranging from \(L_{min}= [10 + \frac{m}{n}]\) to \(L_{max}= 1.4L_{min}\).

Fig. 5
figure 5

Optimal parameter setting through sensitivity analysis

Comparison with constituent algorithms

This section aims to validate the hypothesis that integrating multiple algorithms' capabilities into a single algorithmic framework can find the best possible solutions. Thus, several experiments are conducted to verify the hypothesis, implementing the sHEA, rHEA and bHEA and their constituent algorithms (PSO, DE and TS) to solve the computationally intractable NP-hard JSSPs. The statistical convergence curves of two varied-sized instances (LA36 (\(15\times 15\)) and TA21 (\(20\times 20\))) are depicted in Fig. 6.

Fig. 6
figure 6

Convergence curve of the six implemented algorithms

The convergence of the original PSO and DE is faster at the very beginning for both the LA36 and TA21 instances, as illustrated in Fig. 6a and b, respectively. However, the solution does not improve later due to the greedy nature of DE and the poor population quality. The TS-inspired local search exhibits a fast convergence rate over the iterations and finds near-optimal solutions for both instances by avoiding entrapment in local optima; however, its single-point searching process is computationally expensive. Thus, TS is employed within the hybrid MODE-PSO to improve the local search capability. The integrated approaches (sHEA, rHEA and bHEA), in which PSO, MODE and TS are merged with the help of the switching strategy, show that the searching power of the evolutionary process improves substantially, in terms of global search capability at the early stage and local search capability at the later stage, leading to higher solution quality with mature convergence.

The performance of the proposed HEAs (i.e., sHEA, rHEA and bHEA) against the original TS, DE and PSO is further evaluated by solving 12 standard JSSP instances. The recorded results are reported in Table 4, where the best performance for each instance is marked in bold. The table includes the instance names, their sizes, BKS, EBS, RPE, the number of BKS each algorithm found (BKS found) and the ratio of the number of BKS found to the number of benchmark instances solved (NIS) (Ratio(%)). RPE(%) and Ratio(%) are calculated by Eqs. 21 and 22, respectively.

Table 4 Comparison among implemented algorithms (PSO, DE, TS, and HEAs) for JSSP
Table 5 Improvement comparison among implemented algorithms (PSO, DE, TS, and HEAs)

Table 4 shows that all the implemented algorithms solved 12 instances of different sizes, from \(6\times 6\) to \(20\times 20\) JSSP. Out of the 12 instances solved, TS, DE and PSO obtain 8, 7 and 8 BKS, respectively, whereas sHEA, rHEA and bHEA obtain 10, 10 and 8 BKS, respectively. Although the implemented algorithms show similar performance for small-sized instances, a significant improvement of the hybrid approaches is noticed for large-sized problems. Moreover, the improvement comparison between the HEAs and the other algorithms (i.e., the original TS, DE and PSO) is given in Table 5, which presents NIS, the average RPE (aRPE(%)) of both the other algorithms (OA(%)) and the proposed algorithms, calculated by Eq. 23, and the relative performance improvement (RPI(%)), calculated by Eq. 24. If RPI(%) is negative, the proposed algorithm cannot outperform the other algorithm. Following RPI, bHEA dominates PSO by \(87.97\%\), DE by \(88.66\%\) and TS by \(82.63\%\), whereas rHEA shows superior performance to PSO by \(96.55\%\), DE by \(96.75\%\) and TS by \(95.02\%\). The sHEA dominates PSO, DE and TS by \(96.47\%\), \(96.67\%\) and \(94.90\%\), respectively. Thus, the considered hypothesis holds, i.e., integrating multiple algorithms' capabilities into a single algorithmic framework can find the best possible solutions.

$$\begin{aligned} RPE(\%)= & {} \frac{EBS -BKS}{BKS}\times 100 \end{aligned}$$
(21)
$$\begin{aligned} Ratio(\%)= & {} \frac{\text {BKS found}}{NIS}\times 100 \end{aligned}$$
(22)
$$\begin{aligned} aRPE(\%)= & {} \frac{\sum RPE}{NIS} \end{aligned}$$
(23)
$$\begin{aligned} RPI(\%)= & {} \frac{OA-aRPE}{OA}\times 100 \end{aligned}$$
(24)
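The four performance measures of Eqs. 21–24 can be computed as follows (the numbers used in the usage note are illustrative, not results from the tables):

```python
def rpe(ebs, bks):
    """Eq. 21: relative percentage error; 0 means solved to the BKS."""
    return (ebs - bks) / bks * 100

def ratio(bks_found, nis):
    """Eq. 22: share of solved instances matching the BKS."""
    return bks_found / nis * 100

def arpe(rpes):
    """Eq. 23: average RPE over the NIS solved instances."""
    return sum(rpes) / len(rpes)

def rpi(oa, proposed_arpe):
    """Eq. 24: relative performance improvement; negative means the
    proposed algorithm does not outperform the other algorithm."""
    return (oa - proposed_arpe) / oa * 100
```

For example, a hypothetical best makespan of 1236 against a BKS of 1200 gives an RPE of 3%, and an aRPE of 0.5% against another algorithm's 2.0% gives an RPI of 75%.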
Table 6 Comparison of the 13 algorithms with HEAs for JSSP

Comparison with well-known algorithms

This section assesses the effectiveness of the sHEA, rHEA and bHEA against existing algorithms in the literature by solving small- and large-sized JSSP instances. First, 13 popular algorithms from the literature are considered. These include widely accepted variants of DE, such as DE with strategy switching (DESS) (Wisittipanich & Kachitvichyanukul, 2012), a hybrid DE and estimation of distribution algorithm with neighbourhood search (NS-HDE/EDA) (Zhao et al., 2016), and the whale optimization algorithm enhanced with Lévy flight and DE (WOA-LFDE) (Liu et al., 2020). In addition to the DE variants, the other considered algorithms are the filter-and-fan approach (F&F) (Rego & Duarte, 2009), TLBO (Baykasoğlu et al., 2014), the chemotaxis-enhanced bacterial foraging algorithm (CEBFO) (Zhao et al., 2015), the parallel artificial bee colony algorithm (pABC) (Asadzadeh, 2016), GWO (Jiang & Zhang, 2018) and the parallel bat algorithm (PBA) (Dao et al., 2018). The developed algorithms are further evaluated by comparison with variants of GA, such as the memetic algorithm (MA) (Gao et al., 2011), a local search GA (aLSGA) (Asadzadeh, 2015), a modified GA (MGA) (Thammano & Teekeng, 2015) and the new island model GA (NIMGA) (Kurdi, 2016).

Table 7 Improvement comparison of HEAs with 13 algorithms

The reported optimization results of these 13 algorithms and of the proposed algorithms for 50 benchmark instances of JSSP are presented in Table 6. This table reports the average makespan of 30 independent runs of the developed algorithms to show repeatability and marks the best performance for each instance. At the bottom of Table 6, NIS, BKS found and Ratio(%) are reported for each algorithm. The BKS-to-NIS ratios for sHEA, rHEA and bHEA are \(92.00\%\), \(90.00\%\) and \(86.00\%\), respectively. The highest BKS-to-NIS ratio is \(94.74\%\) for Asadzadeh (2016); however, those authors solved only a smaller set of 38 problems. The second best is \(92.50\%\) for Liu et al. (2020), who solved 44 of the 50 problems. In contrast, sHEA, rHEA and bHEA optimally solve 46, 45 and 43 out of 50 problems, respectively, and the instances that cannot be solved to their BKS values are widely considered hard, with only a few researchers having solved them.

Table 8 Comparison of HEAs with another 9 approaches based on aRPE
Table 9 Comparison of the 4 algorithms with the proposed algorithms for JSSP (Taillard’s instances)

To explicitly demonstrate the proposed algorithms' performance, the respective RPI(%) for the proposed algorithms is calculated by considering only the respective EBS of the instances reported for each algorithm. The RPI(%), as reported in Table 7, clearly shows that the proposed algorithms (i.e., sHEA, rHEA and bHEA) outperform the 13 published approaches, especially pABC (Asadzadeh, 2016) and WOA-LFDE (Liu et al., 2020), which are the best-performing of these algorithms in terms of the BKS-to-NIS ratio.

To reinforce the competitiveness of the HEAs (i.e., sHEA, rHEA and bHEA), an additional 9 published approaches are considered. These include heuristic approaches such as the shifting bottleneck heuristic (SB) (Adams et al., 1988) and the cultural algorithm (CULT) (Cortés Rivera et al., 2007); adaptive search approaches such as the greedy randomized adaptive search procedure (GRASP) (Binato et al., 2002) and a parallel version of GRASP (parallel GRASP) (Aiex et al., 2003); strong memetic algorithms (MAs) such as MA\(_1\) (Hasan et al., 2009), MA\(_2\) (Raeesi & Kobti, 2011) and MA\(_3\) (Raeesi N & Kobti, 2012); a hybrid approach, PSO with GA operators (GPSO) (Abdel-Kader, 2018); and a two-level meta-heuristic approach with an upper-level and a lower-level algorithm (UPLA) (Pongchairerks, 2019). The comparison of the HEAs with these algorithms is performed on 43 problems, and the aRPE for equal-sized problems is reported in Table 8, where boldface indicates the better-performing algorithm. The proposed approaches also show competitive results against the 9 reported algorithms.

Zhao et al. (2016) tested their proposed NS-HDE/EDA against standalone PSO and GA by solving the 80 Taillard's instances of JSSP. These results are adopted to test our proposed algorithms against these baseline algorithms, together with a multi-operator communication-based DE with sequential TS (MCDE/TS) algorithm (Mahmud et al., 2021). Table 9 shows that the proposed algorithms, especially sHEA and rHEA, outperform the listed algorithms.

Statistical analysis

This section first identifies the statistical significance among the varied population structure-based HEAs (i.e., sHEA, rHEA and bHEA) before comparing the best-performing one with existing algorithms. According to Table 6 and Equation 23, the aRPE of sHEA, rHEA and bHEA for solving the 50 problems are 0.044%, 0.040% and 0.063%, respectively; these values are very similar, so no firm evidence justifies the supremacy of one population structure over another. Thus, two non-parametric tests (the Friedman test and the Wilcoxon signed-rank test) are conducted to reinforce the decision, revealing any statistical distinction among them. The Friedman test results are reported in Table 10; the mean ranks of sHEA (1.84), rHEA (1.96) and bHEA (2.21) show that sHEA dominates rHEA and bHEA (i.e., it has the lowest mean rank of the three). The Wilcoxon signed-rank test is then applied to analyse the differences between the best and near-best results according to Table 10. The results at the \(5\%\) significance level are reported in Table 11. The obtained Z values are \(Z_{sHEA-rHEA}\) = \(-0.946\) and \(Z_{sHEA-bHEA}\) = \(-3.063\), with p-values of 0.344 and 0.002, respectively. Since the p-value of the pair (sHEA-bHEA) is lower than 0.05, sHEA is superior to bHEA. However, no statistically significant difference can be found between sHEA and rHEA (since the p-value is higher than 0.05).
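For readers wishing to reproduce this kind of ranking analysis, the sketch below computes the Friedman mean ranks and chi-square statistic from scratch on invented aRPE values (the data are hypothetical; in practice a statistics package such as SciPy would be used):

```python
# Hypothetical aRPE values for three algorithms on six instances:
# rows = instances, columns = (sHEA, rHEA, bHEA); the numbers are invented.
ARPE = [
    [0.00, 0.00, 0.10],
    [0.00, 0.05, 0.05],
    [0.02, 0.00, 0.08],
    [0.00, 0.03, 0.00],
    [0.01, 0.04, 0.09],
    [0.00, 0.00, 0.06],
]

def average_ranks(row):
    """Rank values ascending (lower aRPE is better); ties share their mean rank."""
    return [sum(1 for w in row if w < v) + (sum(1 for w in row if w == v) + 1) / 2
            for v in row]

n, k = len(ARPE), len(ARPE[0])
rank_rows = [average_ranks(row) for row in ARPE]
rank_sums = [sum(r[j] for r in rank_rows) for j in range(k)]
mean_ranks = [s / n for s in rank_sums]

# Friedman statistic: 12 / (n k (k+1)) * sum_j R_j^2  -  3 n (k+1)
chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)

print("mean ranks:", [round(m, 2) for m in mean_ranks])  # lower = better
print("Friedman chi-square:", round(chi2, 3))
```

A lower mean rank indicates a more consistently better algorithm, mirroring how Table 10 distinguishes sHEA, rHEA and bHEA.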

To further compare the robustness of these three algorithms in solving the varied-sized instances, a descriptive statistical analysis is performed through a boxplot showing the five-number summary (i.e., minimum, first quartile, second quartile, third quartile and maximum). The average makespan results of 30 replications reported in Table 6 are taken as samples for this test. The five-number summary of RPE (=\(\frac{Avg.-BKS}{BKS}\)) in Fig. 7 shows that the minimum and the first and second quartiles of sHEA coincide at an RPE of 0, i.e., the algorithm finds the BKS in every replication for \(50\%\) of the varied-sized instances considered (the standard deviations among the replications are zero). Although the interquartile ranges (IQRs) of sHEA and rHEA are almost equal, bHEA's results are highly dispersed. The maximum value of sHEA is lower than that of both rHEA and bHEA. Thus, sHEA is more robust in solving varied-sized instances than rHEA and bHEA.
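The five-number summary underlying such a boxplot can be computed directly from the RPE values. The sketch below does so for invented average-makespan/BKS pairs (the data are hypothetical, and the quartile rule is the linear-interpolation method used by common plotting tools):

```python
# Hypothetical samples: average makespan of 30 replications vs. BKS per instance.
avg_makespans = [666.0, 930.0, 1239.5, 945.2, 1046.0, 1178.3, 666.0, 936.1]
bks_values    = [666,   930,   1235,   935,   1046,   1165,   666,   926]

# RPE = (Avg. - BKS) / BKS, sorted for quantile computation
rpe = sorted((a - b) / b for a, b in zip(avg_makespans, bks_values))

def quartile(sorted_vals, q):
    """Quantile by linear interpolation between closest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo, frac = int(pos), pos - int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

summary = {
    "min":    rpe[0],
    "Q1":     quartile(rpe, 0.25),
    "median": quartile(rpe, 0.50),
    "Q3":     quartile(rpe, 0.75),
    "max":    rpe[-1],
}
print({key: round(v, 4) for key, v in summary.items()})
```

When the minimum, Q1 and median all equal 0, as for sHEA in Fig. 7, the algorithm hit the BKS in every replication for at least half of the instances.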

Table 10 Rank measurement of sHEA, rHEA and bHEA using the Friedman test
Table 11 Wilcoxon signed rank test between the best and near-best algorithms (sHEA, rHEA and bHEA)

Since sHEA performs better than rHEA and bHEA, the two non-parametric tests are conducted again between sHEA and 11 other algorithms, excluding the CEBFO and TLBO algorithms. These two are excluded because they solved fewer than 30 instances, which is insufficient for these tests to be statistically meaningful. The Friedman test results are reported in Table 12, reflecting that sHEA is highly competitive and consistent across all the statistically dispersed results. The Wilcoxon signed-rank test is then performed to determine whether pABC or WOA-LFDE substantially dominates sHEA. The p-values of sHEA-pABC and sHEA-WOA-LFDE are 0.157 and 0.891, respectively. Both values are well above 0.05; thus, neither pABC nor WOA-LFDE is better than sHEA. The Z values of these pairs are \(Z_{sHEA-pABC}\) = \(-1.414\) and \(Z_{sHEA-WOA-LFDE}\) = \(-0.137\). Therefore, sHEA is highly competitive and is not outperformed by any of the other algorithms.
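As an illustration of how such Z values arise, the sketch below computes the Wilcoxon signed-rank statistic with the normal approximation for two hypothetical paired makespan series (the data are invented, and the tie correction to the variance is omitted for brevity):

```python
import math

# Hypothetical paired makespans for two algorithms over eight instances.
a = [666, 930, 1240, 1046, 1178, 935, 926, 1165]
b = [666, 932, 1235, 1050, 1178, 940, 930, 1170]

diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences are dropped
abs_d = [abs(d) for d in diffs]

# Average ranks of |d| (tied magnitudes share their mean rank)
ranks = [sum(1 for w in abs_d if w < v) + (sum(1 for w in abs_d if w == v) + 1) / 2
         for v in abs_d]

w_plus  = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
n = len(diffs)
w = min(w_plus, w_minus)

# Normal approximation for the test statistic (no tie correction)
mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w - mu) / sigma
print("Z =", round(z, 3))
```

A |Z| below the critical value (about 1.96 at the 5% level) indicates no significant difference between the paired algorithms, which is how the sHEA-pABC and sHEA-WOA-LFDE comparisons are interpreted.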

Notably, the JSSP resembles many real-life problems, such as multi-product assembly in the apparel industry (Guo et al., 2006), vehicle production in the automobile industry (Zhang et al., 2013), assembly fixtures in the aeronautic industry (Da Silva et al., 2014) and customized products in the furniture industry (Vidal et al., 2011). Several schedulers exist and are used in industry that mainly sequence customer orders considering demand, material availability and due date. The proposed algorithm (i.e., sHEA), by contrast, can optimize the workflow of each job/customer order (comprising a set of operations) by correctly mapping interrelated resources, providing a holistic overview of the production system. A schedule includes the start and finish times of each job/customer order, their sequences on the machines, and the machine idle times. The predictive visibility and insight delivered by these JSSP solutions give decision-makers adequate time and manoeuvrability to improve operational decisions and leverage benefits by integrating the JSSP results with material acquisition (Sawik, 2016), due date assignment (Steiner & Zhang, 2011), order acceptance (Sarvestani et al., 2019) and delivery planning problems (Ullrich, 2013). Overall, it can create and foster a responsive relationship between supply chain stages.

Conclusion and recommendation

This paper developed HEAs in which different population structures are embedded in a multi-algorithmic framework with a performance-driven switching mechanism to solve the computationally intractable NP-hard JSSP. An algorithm may perform poorly over generations, and continuing to evolve with such an algorithm is wasteful. Thus, the capabilities of multiple algorithms were integrated into a single algorithmic scheme, and a performance-driven meta-heuristic switching approach was proposed to emphasize the best-performing algorithms. Because the initial population impacts the evolutionary process, this study employed the MSS, equally prioritizing the diversity and solution quality of the schedules, and population diversity was maintained over the evolution by a diversity check mechanism. EAs explore a search space using a population, and dividing the population into sub-populations may harm the search uniformity and exploration capability. Thus, this study developed three approaches (i.e., sHEA, rHEA and bHEA) to illustrate the impacts of different population structures. These algorithms' exploitation property was enhanced by employing TS-inspired local search techniques. This local search utilized the N7 neighbourhood structure and recorded moves for a specific tenure to avoid re-evaluating the same solutions and becoming trapped in local optima.
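The tenure-based move recording can be pictured as a small fixed-length memory. The sketch below is a minimal illustration only (not the paper's N7 implementation, and the encoding of moves as job pairs is hypothetical):

```python
from collections import deque

# Minimal tabu-list sketch: moves stay forbidden for a fixed tenure, after
# which the oldest recorded move is forgotten and may be tried again.
class TabuList:
    def __init__(self, tenure):
        self.tenure = tenure
        self.queue = deque()     # moves in insertion order
        self.members = set()     # fast membership test

    def add(self, move):
        self.queue.append(move)
        self.members.add(move)
        if len(self.queue) > self.tenure:          # tenure expired:
            self.members.discard(self.queue.popleft())  # forget oldest move

    def is_tabu(self, move):
        return move in self.members

tabu = TabuList(tenure=2)
tabu.add(("J1", "J3"))   # hypothetical swap of jobs J1 and J3 on a machine
tabu.add(("J2", "J4"))
tabu.add(("J1", "J5"))   # pushes ("J1", "J3") out once the tenure is exceeded
print(tabu.is_tabu(("J1", "J3")), tabu.is_tabu(("J2", "J4")))
```

Keeping both a queue (for tenure expiry) and a set (for O(1) lookup) is a common way to make the tabu check cheap inside a tight local-search loop.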

An extensive experiment was conducted to test the performance of the three developed hybrid approaches against their component algorithms (i.e., TS, DE and PSO), as well as 26 relevant popular algorithms. The bHEA could not produce promising results compared to rHEA and sHEA. However, the computational results and the statistical analysis showed that rHEA and sHEA are highly competitive with all the algorithms reviewed in this paper. Although there is no statistically significant difference between rHEA and sHEA, the computational results showed that sHEA is \(1.1\%\) better than rHEA based on Equation 24. Thus, sHEA is the most competitive approach proposed in this paper. The strategy implemented in sHEA exhibited better exploration at the start of the evolutionary process and better exploitation at the later stages, increasing the chance of finding the global optimum. Moreover, the statistical analysis showed that the proposed sHEA ranked first, with a mean rank of 5.09, against the near-best of the 26 existing algorithms.

Fig. 7 Robustness of the developed algorithms

Table 12 Rank measurement between proposed and state-of-the-art algorithms using Friedman test

Although the proposed sHEA produces promising results and is highly competitive with other algorithms, there is scope for improvement. Thus, the main limitations of the current work and suggestions for future work include the following:

  • The MODE variants were selected based on their characteristics, which may not form the best combination. Thus, the mutation operators could be investigated more systematically.

  • Switching away from an under-performing algorithm was controlled by solution quality alone, without considering population diversity (although diversity is ensured by the diversity check mechanism). Population diversity and solution quality could be considered together when making the switching decision.

  • Although the proposed algorithms' effectiveness was assessed against a list of existing algorithms, the switching strategy's contribution could be further assessed by comparing it with popular reinforcement learning approaches.

  • The tabu list and diversity check mechanism were introduced to prevent revisiting identical solutions. Although the tabu list overcame the problem to some extent, it cannot record all the moves. Thus, a memory-based approach, in which every move is recorded for a few generations, could be an exciting direction to explore; it would prevent revisiting the same solutions and becoming trapped in local optima, reducing computational expense.

  • The classical JSSP could be extended to supply chain scheduling, including vehicle routing problems and supply portfolio decisions, and the effectiveness of the proposed algorithm could then be assessed.

The authors are currently pursuing a number of these directions in other works.