Article

Dynamic Intelligent Scheduling in Low-Carbon Heterogeneous Distributed Flexible Job Shops with Job Insertions and Transfers

1 College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu 610059, China
2 Sichuan Engineering Technology Research Center for Industrial Internet Intelligent Monitoring and Application, Chengdu 610059, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(7), 2251; https://doi.org/10.3390/s24072251
Submission received: 15 March 2024 / Revised: 28 March 2024 / Accepted: 28 March 2024 / Published: 31 March 2024
(This article belongs to the Special Issue Artificial Intelligence and Sensing Technology in Smart Manufacturing)

Abstract:
With the rapid development of economic globalization and green manufacturing, traditional flexible job shop scheduling has evolved into the low-carbon heterogeneous distributed flexible job shop scheduling problem (LHDFJSP). Additionally, modern smart manufacturing processes encounter complex and diverse contingencies, necessitating the ability to address dynamic events in real-world production activities. To date, there are limited studies that comprehensively address the intricate factors associated with the LHDFJSP, including workshop heterogeneity, job insertions and transfers, and considerations of low-carbon objectives. This paper establishes a multi-objective mathematical model with the goal of minimizing the total weighted tardiness and total energy consumption. To effectively solve this problem, diverse composite scheduling rules are formulated, alongside the application of a deep reinforcement learning (DRL) framework, i.e., Rainbow deep-Q network (Rainbow DQN), to learn the optimal scheduling strategy at each decision point in a dynamic environment. To verify the effectiveness of the proposed method, this paper extends the standard dataset to adapt to the LHDFJSP. Evaluation results confirm the generalization and robustness of the presented Rainbow DQN-based method.

1. Introduction

In recent years, due to the progress of globalization and computer technology, numerous manufacturing enterprises have shifted from the traditional single job shop model to the distributed job shop model. This shift can reduce labor and raw material costs while improving production efficiency. In contrast to the classical flexible job shop scheduling problem (FJSP), the heterogeneous distributed flexible job shop scheduling problem (HDFJSP) removes the restriction that a job must be processed entirely within a single job shop. Each job can be dispatched to multiple job shops in various locations, and each operation can be allocated to more than one candidate machine. As a result, the distributed production mode is more flexible and better suited to the actual production environment.
Furthermore, industry is the world’s second-largest source of carbon dioxide emissions, with total emissions of approximately 870 million tons of carbon dioxide in 2020, and the energy consumption of the manufacturing industry is expected to rise to 16 percent in 2030 [1]. In China, the manufacturing sector accounted for roughly 84 percent of total industrial energy consumption in 2020, with electricity consumption in the sector increasing by 3 percent from 2019, as reported by the U.S. Energy Information Administration [2]. Meanwhile, in the United States, the industrial sector consumes approximately 33.3 percent of energy from various sources, including fossil fuels, renewable energies, and nuclear power, according to the U.S. Energy Information Administration [3]. In the Industry 4.0 era, due to increasing energy costs and environmental pollution, the low-carbon scheduling problem has garnered significant attention from academics and engineers as a new mode of dispatch.
Moreover, the practical production environment encounters more complex and diverse contingencies. In the event of an emergency, rescheduling from scratch is not only time-consuming, but also demands significant expert experience. As a result, it becomes challenging to meet the requirements of a real-time production environment while maintaining superior scheduling quality [4,5].
In summary, the dynamic multi-objective scheduling problem (DMoSP) for the low-carbon heterogeneous distributed flexible job shop (LHDFJS) is a novel and significant topic that is relevant to modern supply chain and manufacturing systems. The LHDFJS model represents a multi-factory low-carbon production environment in which each factory operates as a low-carbon flexible job shop. In addition, the LHDFJS is characterized by a large-scale, complex, and variable environment with numerous constraints and strong dynamics. These factors can lead to unexpected events such as job insertions and machine faults, which can impact the previously generated scheduling scheme or render it invalid [6]. Notably, job shop scheduling is one of the essential means of reducing carbon emissions in the manufacturing sector. Traditional job shop scheduling strategies have focused primarily on economic factors such as completion time and machine utilization, while neglecting energy and environmental considerations such as energy consumption during processing and transportation. Therefore, the study of the DMoSP for the LHDFJS is of considerable theoretical significance and practical value.
To solve the distributed flexible job shop scheduling problem, the majority of existing works require all the operations of a job to be processed in the same job shop [7,8,9,10]. A few works [11,12,13] allow workpieces to be transferred and processed in different job shops, but they assume homogeneous job shops with equal transportation times between job shops and machines to simplify the problem. However, the heterogeneity of job shops is an important characteristic of DFJS. In the scheduling of heterogeneous job shops, dynamically balancing economic and environmental objectives becomes a key factor for manufacturing enterprises to enhance competitiveness. With this in mind, this paper leverages a Rainbow deep-Q network (Rainbow DQN) to construct a deep reinforcement learning (DRL) framework to tackle the DMoSP for the LHDFJS.
The main contributions of this paper are as follows:
  • Aiming at the DMoSP for LHDFJS, a mathematical model is established with the objective of minimizing the total weighted tardiness and total energy consumption of the processing process. Heterogeneous job shops with different machine processing capabilities and energy consumption, different transportation times between job shops and machines, and job transfers between job shops are all considered;
  • Seven job selecting rules and six machine assignment rules are designed. By taking their Cartesian product, a total of 42 composite scheduling rules are constructed to optimize the total weighted tardiness and total energy consumption in an LHDFJSP;
  • A Rainbow DQN is proposed to address the DMoSP for LHDFJS. State space, action space and reward function are all redesigned. Specifically, 10 state features are extracted to summarize the status of production. Forty-two composite scheduling rules are obtained as the candidate actions. A novel reward function, consisting of immediate and episodic rewards, is designed to balance the economic and environmental indicators;
  • A new dataset is extended from the standard one to adapt to the DMoSP for LHDFJS with job transfers and insertions. This allows for a more realistic representation of the scheduling problem and better evaluation of the algorithms;
  • Based on the newly built dataset, this study compares the Rainbow DQN-based method and dueling double DQN with prioritized experience replay (D3QN with PER), as well as 8 classical heuristic scheduling rules and 42 candidate composite scheduling rules. The results indicate that the proposed composite scheduling rules outperform the classical heuristic scheduling rules, while Rainbow DQN excels over other algorithms in minimizing both total weighted tardiness and total energy consumption.
The remainder of this paper is organized as follows: Section 2 provides an overview of the research and practical applications of the DMoSP for LHDFJS. Section 3 primarily introduces the preliminary of the Rainbow DQN. In Section 4, the mathematical model of the DMoSP for LHDFJS is presented. Section 5 describes the application of the Rainbow DQN algorithm in the DMoSP for LHDFJS. In Section 6, experimental analysis of the proposed algorithm and comparison experiments are presented. Finally, Section 7 concludes this paper.

2. Literature Review

This section introduces an overview of the current research status in terms of distributed flexible job shop scheduling, low-carbon scheduling, dynamic scheduling, and DRL-based scheduling methods.

2.1. Distributed Flexible Job Shop Scheduling

By effectively coordinating multiple workshops and machines, distributed flexible job shop scheduling (DFJS) enables efficient resource utilization, minimizes idle time, reduces production delays, and enhances overall productivity. Therefore, distributed manufacturing is gradually emerging as the main production method [9]. De Giovanni and Pezzella [7] first defined the DFJS problem and proposed an improved genetic algorithm to address the problem for small and medium-sized distributed manufacturing units in a single factory. Zhao et al. [10] elaborated a mixed-integer linear programming (MILP) model of the distributed assembly no-idle flow-shop scheduling problem without job transfers and proposed a water wave optimization algorithm combined with a three-stage variable neighborhood search to minimize assembly completion time. Du et al. [8] and Zhang et al. [14] considered the constraints of crane transportation conditions in DFJS. The former used an optimization algorithm combining an estimation of distribution algorithm and variable neighborhood search, while the latter utilized a Q-learning-based hyper-heuristic evolutionary algorithm. Li et al. [15] proposed an improved gray wolf optimizer to solve the DFJS problem without job transfers.
In the field of distributed job shop scheduling, as indicated by Luo et al. [11], most of the existing research dedicated to the DFJS problem assumes that workpieces are only allowed to be processed within a certain job shop, i.e., all the operations of the same job must be processed in the same factory. However, in actual production activities, job transfer between distributed job shops is a key factor to take advantage of different workshops and improve production efficiency. The study of Meng et al. [12] made the first attempt to solve the DFJS problem with transfers using MILP and constraint programming models. Luo et al. [11] proposed a memetic algorithm combining evolutionary algorithms and local search to tackle the DFJS problem with transfers, assuming that the transfer times of all machines and factories are the same. Sang and Tan [13] combined the improved NSGA-III and local search method to solve the HDFJSP with transfers, taking into account the energy factor of the job shop.
The aforementioned works assume either homogeneous job shops or equal transport times between job shops to simplify the problem description and solution. However, job shops are heterogeneous, and the transport time may vary among different job shops. Moreover, dynamic scheduling is not supported in these works.

2.2. Low-Carbon Scheduling

Low-carbon scheduling plays a crucial role in the field of job shop scheduling by addressing environmental concerns and promoting sustainable practices. Dai et al. [16] established an energy-aware mathematical model integrating process planning and scheduling, proposed performance evaluation indexes including energy consumption (i.e., basic power consumption, unload power consumption, and cutting power consumption) and maximum time of scheduling, and developed an improved genetic algorithm to obtain the Pareto optimal solution. Zhang et al. [17] proposed a low-carbon scheduling flexible job shop model that takes into account production factors and environmental effects and designed indexes of processing carbon efficiency, part carbon efficiency, and machine tool carbon efficiency to estimate carbon emissions from parts and machine tools. Jiang and Deng [18] proposed a bi-population-based discrete cat swarm optimization algorithm to solve the low-carbon FJSP. The research mainly studies the energy consumption of processing and idle-load. Yin et al. [19] formulated a low-carbon scheduling model and introduced a multi-objective genetic algorithm to optimize productivity, energy efficiency, and noise reduction. Li et al. [20] proposed a multi-objective low-carbon job-shop scheduling problem with a variable processing speed constraint and developed an improved artificial swarm algorithm to minimize the makespan, machine loading, and total carbon emissions (i.e., processing energy consumption, idle energy consumption, and on/off switching energy consumption).
The main objective of low-carbon scheduling is to improve energy efficiency by strategically optimizing scheduling processes. Current research primarily addresses factors such as processing energy consumption, idle energy consumption, transportation energy consumption, and on/off switching energy consumption. Considering that frequent on/off switching may potentially cause damage to equipment, this paper places emphasis on key environmental factors including processing energy consumption, idle energy consumption, and transportation energy consumption.

2.3. Dynamic Scheduling and Deep Reinforcement Learning Methods

The majority of traditional DFJS methods mainly consider static scheduling issues, neglecting the importance of dynamic scheduling [21]. Since static schedules are fixed and assume that all data are known beforehand, they are relatively easy to plan and manage, especially in stable and predictable environments. By contrast, dynamic scheduling optimizes resource utilization based on real-time demand and availability. It minimizes idle time and maximizes productivity by dynamically allocating resources to the most critical tasks. Shahgholi et al. [22] proposed a heuristic model for a dynamic FJSP with variable processing times and rescheduling based on the idea of the artificial bee colony algorithm. Li et al. [23] designed a rescheduling method based on the Monte Carlo tree search algorithm for a dynamic FJSP considering four dynamic contingencies, which can shorten the response time to dynamic contingencies. Approaches to dynamic scheduling problems can be divided into completely reactive, robust, and predictive–reactive methods. In recent years, scholars have mainly studied the schemes and performance of robust and predictive–reactive methods in dynamic scheduling [24,25].
With the ability to learn from experience and make intelligent decisions, deep reinforcement learning (DRL) can optimize scheduling strategies and improve overall performance. It enables the system to adapt to dynamic environments, handle uncertainties, and optimize various objectives simultaneously. Yan et al. [26] achieved efficient dynamic scheduling by combining a double-layer Q-learning algorithm with a digital twin algorithm, which considers both machine and worker resources in FJSP and involves four cases of dynamic interference. On the other hand, Chang et al. [27] proposed a double deep Q-network (DDQN) algorithm framework for dynamic scheduling to solve an FJSP with random job arrival times, where the state space, action space, and reward function of the agent were designed. Yan et al. [28] designed a deep Q-network (DQN) framework-based greedy rate reduction to solve the distributed permutation flow shop scheduling problem with machine maintenance constraints. Zhang et al. [29] presented a DRL framework using the proximal policy optimization algorithm to tackle unforeseen machine failures. Li et al. [25] proposed a hybrid DQN for a dynamic FJSP with insufficient transportation resources. Based on the conjugate DQN of DRL algorithm, Lee and Lee [30] proposed a novel state, action, and reward optimization scheduling strategy to achieve self-learning and self-optimizing semiconductor manufacturing systems.
DRL is a promising approach to production scheduling, especially in the stochastic production environment [31]. However, the field of scheduling based on DRL is still in its infancy. Challenges such as lack of interpretability and difficulties in practical industrial applications make designing scheduling solutions based on DRL challenging. Developing a solution that is both competitive and reliable in production scheduling using existing methods is a challenging task.

2.4. Summary

Table 1 summarizes the research elements covered in the relevant literature. It can be observed that, regarding the DFJS problem, most studies have overlooked the transfer factors between job shops and have not considered the heterogeneity of job shops. In addition, existing research only addresses partial production requirements, such as the combination of low-carbon considerations and dynamic scheduling, or considers transfer factors combined with multi-objective problems. This is far from covering the complex factors that exist in actual production.
Furthermore, researchers have proposed diverse solutions based on different types of production environments and requirements. Metaheuristic algorithms remain a common solution in various scenarios. Meanwhile, DRL methods are gradually gaining attention and demonstrating potential for solving dynamic problems and achieving real-time scheduling.
In this paper, complicated factors such as low-carbon considerations, dynamics, transfer, and job shop heterogeneity are taken into account, leading to the design of a dynamic scheduling approach that integrates composite scheduling rules with Rainbow DQN. This method aims to provide a comprehensive and effective solution for addressing the complexities of the LHDFJSP.

3. Preliminary

Conventional scheduling methods confront formidable challenges, including intricate problem sets, dynamic and fast-changing environments, and the need for multi-objective optimization. The adoption of Rainbow DQN introduces a pioneering approach to address these challenges. This section delves deep into the theoretical underpinnings of Rainbow DQN. Rainbow DQN stands as a significant milestone in the realm of DRL, presenting a fresh theoretical and technical framework to tackle the intricate scheduling dilemmas found in job shop environments.

3.1. Q-Learning and DQN

The Q-learning algorithm was first proposed by Watkins [35] in his doctoral dissertation. Tesauro et al. [36] combined reinforcement learning (RL) with neural networks, which learn by playing against themselves and learning from the results. The Q-learning algorithm uses the Bellman equation to update the Q-value and stores it in a Q-table to estimate the Q-value of the corresponding state–action pair. The update formula of the Q-value is as follows:
$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$ (1)
Here, $\alpha$ is the learning rate and $\gamma$ is the discount rate; $s_{t+1}$ is the next state and $a_{t+1}$ represents the action selected in state $s_{t+1}$.
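To make the update in Equation (1) concrete, the following is a minimal sketch of a tabular Q-learning update step; the table sizes, state/action encoding, and sample transition are illustrative assumptions rather than the implementation used in this paper.

```python
import numpy as np

# Minimal tabular Q-learning update following Equation (1); sizes are illustrative.
n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.95            # learning rate and discount rate
Q = np.zeros((n_states, n_actions))

def q_learning_update(s_t, a_t, r_t, s_next):
    """Apply one Bellman update to the Q-table."""
    td_target = r_t + gamma * Q[s_next].max()
    Q[s_t, a_t] += alpha * (td_target - Q[s_t, a_t])

# Example transition: state 0, action 2, reward 1.0, next state 3
q_learning_update(0, 2, 1.0, 3)
```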
Mnih et al. [37] proposed a Q-learning approach to play an Atari game in conjunction with a deep learning network, which is called a deep Q-network (DQN). The DQN mainly adopts the idea of value function approximation, uses the neural network to approximate the value function, and utilizes the method of target Q network and experience replay to train the network, which improves the training speed and stability of the DQN. The method of calculating the target value in a DQN algorithm is shown in Equation (2):
$Y_t^{DQN} = r_{t+1} + \gamma Q_{\pi}\left( s_{t+1}, \pi(s_{t+1}; \theta); \theta^- \right)$ (2)
Here, $\pi$ refers to a certain strategy, and $\pi(s_{t+1}; \theta) = \arg\max_a Q(s_{t+1}, a; \theta)$ is a fixed strategy that leads to limited interaction with the environment; therefore, an epsilon-greedy policy is often used to add randomness to exploration. $Q_{\pi}(s_{t+1}, a_{t+1})$ represents the expected cumulative reward of the agent choosing action $a_{t+1}$ in state $s_{t+1}$. The error (loss function) between the estimated value and the target value in the current state is calculated as in Equation (3):
$L(\theta) = \mathbb{E}\left[ \left( r_t + \gamma \max_{a_{t+1} \in A} Q(s_{t+1}, a_{t+1}; \theta^-) - Q(s_t, a_t; \theta) \right)^2 \right]$ (3)
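As a hedged illustration of Equations (2) and (3), the sketch below computes the target with a frozen target network and the mean-squared TD loss on a batch of transitions; the network sizes, tensor shapes, and discount factor are assumptions for the example only, not the configuration used later in the paper.

```python
import torch
import torch.nn as nn

state_dim, n_actions = 10, 42       # e.g., 10 state features and 42 candidate composite rules
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())   # periodically synchronized copy (theta^-)

def dqn_loss(s, a, r, s_next, gamma=0.95):
    # Q(s_t, a_t; theta) for the actions actually taken
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # r_t + gamma * max_a' Q(s_{t+1}, a'; theta^-), Equation (2)
        target = r + gamma * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)   # squared TD error, Equation (3)

# Dummy batch of 8 transitions
s, s_next = torch.randn(8, state_dim), torch.randn(8, state_dim)
a, r = torch.randint(0, n_actions, (8,)), torch.randn(8)
loss = dqn_loss(s, a, r, s_next)
```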

3.2. Double DQN

Van Hasselt et al. [38] proposed a double DQN method to solve the problem of overestimation of Q-learning by decoupling the choice of action and the calculation of the target Q value. The method of calculating the target value in a double DQN algorithm is shown in Equation (4):
$Y_t^{DoubleDQN} = r_{t+1} + \gamma Q\left( s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta); \theta^- \right)$ (4)
The double DQN algorithm constructs two action value functions. The agent determines the action with the evaluation network and calculates the value of the action with the target network when estimating the reward.
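A minimal sketch of the double DQN target in Equation (4), reusing the hypothetical q_net/target_net pair from the previous sketch: the online (evaluation) network selects the action and the target network evaluates it.

```python
import torch

def double_dqn_target(r, s_next, q_net, target_net, gamma=0.95):
    """r_{t+1} + gamma * Q(s_{t+1}, argmax_a Q(s_{t+1}, a; theta); theta^-), Equation (4)."""
    with torch.no_grad():
        # Action selection with the online network
        best_actions = q_net(s_next).argmax(dim=1, keepdim=True)
        # Action evaluation with the target network
        q_eval = target_net(s_next).gather(1, best_actions).squeeze(1)
    return r + gamma * q_eval
```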

3.3. Dueling DQN

Wang et al. [39] improved the network structure of DQNs; the method mainly divides the Q-value function to form a dual network. The dueling DQN proposes two value computation branches, one for predicting state values and the other for predicting state-related action advantage values. The state function is only used to predict whether the state is good or bad, while the action advantage function is only used to predict the importance of each action in that state.
$Q(s_t, a_t; \theta, \alpha, \beta) = V(s_t; \theta, \beta) + \left[ A(s_t, a_t; \theta, \alpha) - \frac{1}{|A|} \sum_{a_{t+1} \in A} A(s_t, a_{t+1}; \theta, \alpha) \right]$ (5)
Equation (5) adds the state value function and the action advantage function; however, this decomposition is not unique. Therefore, a unique advantage value is obtained by subtracting the mean of the action advantage function. Here, $\theta$ denotes the shared neural network parameters, while $\beta$ and $\alpha$ represent the network parameters of the state value branch and the action advantage branch, respectively.
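The following is a sketch of a dueling head implementing Equation (5): a shared trunk feeds a scalar state-value branch and an action-advantage branch, and the advantage is mean-centered to keep the decomposition unique. The layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), as in Equation (5)."""
    def __init__(self, state_dim=10, n_actions=42, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state value branch V(s; theta, beta)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage branch A(s, a; theta, alpha)

    def forward(self, s):
        h = self.trunk(s)
        v = self.value(h)                               # shape (batch, 1)
        adv = self.advantage(h)                         # shape (batch, n_actions)
        return v + adv - adv.mean(dim=1, keepdim=True)

q_values = DuelingHead()(torch.randn(4, 10))            # (4, 42) Q-value estimates
```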

3.4. Noisy Networks

Fortunato et al. [40] replaced the general neural network with a noisy neural network whose weights and parameters were interfered with by noise functions. The noisy network is re-sampled before each action step and the neural network is updated with noise to improve the action exploration capability of the DRL models. The ordinary linear layer of the neural network is expressed as Equation (6):
$y = \omega x + b$ (6)
where $x$ is the input, $\omega$ is the weight matrix, and $b$ is the bias. The noisy linear layer is defined as in Equation (7):
$y \overset{\mathrm{def}}{=} \left( \mu^{\omega} + \sigma^{\omega} \odot \varepsilon^{\omega} \right) x + \mu^{b} + \sigma^{b} \odot \varepsilon^{b}$ (7)
Here, $\mu^{\omega}$ and $\mu^{b}$ are the learnable means of the parameters $\omega$ and $b$, $\sigma^{\omega}$ and $\sigma^{b}$ represent the noise scales, and $\varepsilon$ is random noise of the same dimension. The noisy weight and noisy bias can thus be expressed as $\omega = \mu^{\omega} + \sigma^{\omega} \odot \varepsilon^{\omega}$ and $b = \mu^{b} + \sigma^{b} \odot \varepsilon^{b}$, respectively.
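A simplified noisy linear layer following Equation (7): learnable means and noise scales for the weights and bias, with fresh Gaussian noise sampled at every forward pass. This sketch uses independent noise for readability; the factorized-noise variant of Fortunato et al. and the exact initialization are omitted.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b), Equation (7)."""
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.sigma_w = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.mu_b = nn.Parameter(torch.zeros(out_features))
        self.sigma_b = nn.Parameter(torch.full((out_features,), sigma_init))

    def forward(self, x):
        eps_w = torch.randn_like(self.sigma_w)   # noise re-sampled before each step
        eps_b = torch.randn_like(self.sigma_b)
        weight = self.mu_w + self.sigma_w * eps_w
        bias = self.mu_b + self.sigma_b * eps_b
        return nn.functional.linear(x, weight, bias)

out = NoisyLinear(10, 42)(torch.randn(4, 10))    # noisy Q-value logits
```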

3.5. Multi-Step Reinforcement Learning (RL)

DRL models typically use a single-step temporal difference (TD) algorithm to judge the value of the target. The TD algorithm inherits the advantages of the dynamic programming and Monte Carlo methods to predict state value and optimal policy [41]. However, a single-step TD algorithm will lead to a large bias in the estimation of the target value during the initial training period. Hence, De Asis et al. [42] demonstrated that immediate rewards can be accurately obtained through interaction with the environment. The idea of adopting multi-step learning is to replace a single-step return with an N-step return, so that the target value at the early stage of training can be estimated more accurately, thus speeding up the training. A multi-step RL concept is adopted in Rainbow DQN to construct the N-step return, and its loss function is as follows:
$L_{N\text{-}step} = \left( \sum_{k=0}^{N-1} \gamma^{k} r_{t+k} + \gamma^{N} \max_{a_{t+1}} Q(s_{t+N}, a_{t+1}; \theta^-) - Q(s_t, a_t; \theta) \right)^2$ (8)
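A small sketch of the N-step return used in Equation (8): it accumulates discounted immediate rewards over N steps and bootstraps with the target network at step t+N. The trajectory layout (a list of rewards plus a batched state tensor) and the reuse of target_net from the earlier sketches are assumptions.

```python
import torch

def n_step_target(rewards, s_t_plus_n, target_net, gamma=0.95, n=3):
    """sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n max_a Q(s_{t+n}, a; theta^-), per Equation (8)."""
    g = sum((gamma ** k) * r for k, r in enumerate(rewards[:n]))
    with torch.no_grad():
        # s_t_plus_n is a batched state tensor of shape (batch, state_dim)
        bootstrap = target_net(s_t_plus_n).max(dim=1).values
    return g + (gamma ** n) * bootstrap
```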

3.6. Prioritized Experience Replay (PER)

In a conventional DQN, experience transitions are sequentially stored in the experience buffer, periodically overwritten for updates, and uniformly sampled from the entire buffer. However, different samples are not equally valuable; thus, Schaul et al. [43] proposed assigning a priority to each experience transition and sampling according to these priorities. To rank different experience transitions by priority, Schaul et al. calculated the absolute value of the TD error using the Q-values output by the two networks, which measures how much a transition still needs to be learned. The larger the TD error, the more the sample needs to be learned, that is, the higher its priority. The sampling distribution is shown in Equation (9):
$p_t \propto \left| r_{t+1} + \gamma_{t+1} \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta^-) - Q(s_t, a_t; \theta) \right|^{\omega}$ (9)
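The proportional prioritization of Equation (9) can be sketched as follows: priorities are absolute TD errors raised to the power ω, and transitions are sampled with probability proportional to those priorities. A flat list is used here purely for readability; a practical implementation would use a sum tree and importance-sampling weights.

```python
import numpy as np

class SimplePER:
    """Flat proportional prioritized replay following Equation (9)."""
    def __init__(self, capacity=10000, omega=0.6, eps=1e-6):
        self.capacity, self.omega, self.eps = capacity, omega, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:          # overwrite the oldest transition
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.omega)

    def sample(self, batch_size):
        p = np.array(self.priorities)
        p = p / p.sum()                                # P(i) = p_i / sum_j p_j
        idx = np.random.choice(len(self.buffer), size=batch_size, p=p)
        return [self.buffer[i] for i in idx], idx

    def update(self, idx, td_errors):
        for i, d in zip(idx, td_errors):               # refresh priorities after learning
            self.priorities[i] = (abs(d) + self.eps) ** self.omega
```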

4. Problem Formulation

The LHDFJSP involves multiple smart factories located in different geographical locations, each of which may contain a varying number and types of machines. All operations of a job can be processed in the same job shop or transferred in different job shops according to their predetermined or intrinsic sequence of operations. This section provides a problem description and establishes the mathematical model of the LHDFJSP. Different from previous works that restrict all the operations of a job to the same job shop or assume homogenous job shops, this paper considers heterogeneous job shops with different machine processing capabilities and energy consumptions, different transportation times between job shops and machines, and more flexible job transfers between different job shops.

4.1. Problem Description

A low-carbon heterogeneous distributed flexible job shop (LHDFJ) involves multiple workshops, each of which has heterogeneous machines. Jobs arrive sequentially for processing, and each job has a set of operations that can be processed by more than one machine. The operations of each job are subject to precedence constraints.
As shown in Figure 1, the LHDFJSP fabricates jobs through collaborative production between different LHDFJs. All operations of a job can be completed in the same LHDFJ or be transferred to different LHDFJs for processing. Different LHDFJs exhibit variations in terms of the number of machines, machine technologies, processing energy consumption, and idle energy consumption, among other factors. Efficient scheduling and resource allocation may be required to manage and coordinate the production activities of these heterogeneous workshops to ensure optimal production efficiency and product quality.
To facilitate understanding, an example of processing two jobs in two workshops is exhibited. Table 2 illustrates the processing time and energy consumption of each operation on each machine. For example, the processing time of $O_{11}$ executed by machine $m_1$ is 3, and the corresponding processing energy consumption is 13. The character “-” indicates that the operation cannot be processed by the machine. Table 3 exhibits the transfer times between workshops and machines. For example, the transfer time between machine $m_1$ in workshop 1 and machine $m_1$ in workshop 2 is 150, and the transfer time between machine $m_1$ and machine $m_3$ in workshop 1 is 15. The transfer of a job is divided into two cases:
  • If the preceding and succeeding operations of a job are processed on different machines of the same job shop, the transport time between machines should be considered. For example, as shown in Table 3, the transport time between $m_1$ and $m_2$ in workshop 1 is 20 units of time;
  • If the preceding and succeeding operations of a job are processed in different job shops, only the transport time between the job shops is considered, while the transport time between machines is neglected. For example, as shown in Table 3, the transfer time between the two workshops is fixed at 150 units of time, regardless of the specific machines involved in the transition from workshop 1 to workshop 2.
The processing time, transport time, and energy consumption information are all known a priori. The objective of the LHDFJSP is to find the best processing job combination for each LHDFJ while considering the machine capacity constraints, that is, to select the optimal processing machine for each operation, and determine the optimal processing sequence of jobs on each machine in each LHDFJ, in order to minimize the total weighted tardiness and energy consumption generated during the processing. The problem is based on the following assumptions:
  • The available time of each machine is 0;
  • Loading and unloading time of jobs is not considered;
  • All operations cannot be interrupted/preempted in the processing process;
  • The machine can run continuously and there is sufficient buffer between machines;
  • There are sufficient transport devices to complete the transfer of jobs.

4.2. Mathematical Model

Table 4 presents the symbols used in the model (indexes all start from 1).
For the mathematical model of LHDFJS, an MILP is proposed to minimize the total weighted tardiness and total energy consumption. The MILP model consists of objective functions and constraints. LHDFJS can be formulated as follows:
1. Objective:
$TT = \sum_{i=1}^{N} W_i \cdot \max\left( CT_i - D_i, 0 \right)$ (10)
$TE = procE + idleE + transE$ (11)
Equation (10) computes the total weighted tardiness (TT) of the LHDFJSP model based on the job information; $CT_i$ is computed as $CT_i = \sum_{f=1}^{F} \sum_{k=1}^{M_f} C_{i, n_i} x_{i n_i f k}$. Equation (11) calculates the total energy consumption during processing, including processing energy consumption, machine idle energy consumption, and transportation energy consumption. The terms $procE$, $idleE$, and $transE$ are formulated as Equations (12)–(14); a small computational sketch of both objectives is given after the constraint list.
$procE = \sum_{f=1}^{F} \sum_{k=1}^{M_f} \sum_{i=1}^{N} \sum_{j=1}^{n_i} pt_{ijfk} \cdot pe_{fk} \cdot x_{ijfk}$ (12)
$idleE = \sum_{f=1}^{F} \sum_{k=1}^{M_f} \sum_{i=1}^{N} \sum_{j=1}^{n_i} \sum_{g=1}^{N} \sum_{h=1}^{n_g} ie_{fk} \cdot \left( S_{g,h} - C_{i,j} \right) y_{ij,gh} \, x_{ijfk} \, x_{ghfk}$ (13)
$transE = \sum_{f=1}^{F} \sum_{u=1}^{F} \sum_{i=1}^{N} te \cdot TransF_{fu} \cdot b_{ifu} + \sum_{f=1}^{F} \sum_{l=1}^{M_f} \sum_{k=1}^{M_f} \sum_{i=1}^{N} te \cdot TransM_{lk} \cdot a_{ilk}$ (14)
2. The assumptions and constraints are as follows:
$\sum_{f=1}^{F} \sum_{k=1}^{M_f} x_{ijfk} = 1, \quad \forall i, j$ (15)
$S_{i,1} - A_i \cdot x_{i1fk} \geq 0, \quad \forall i, f, k$ (16)
$C_{i,j} = S_{i,j} + pt_{ijfk} \cdot x_{ijfk}$ (17)
$S_{i,j} \cdot x_{ijfk} - TransF_{uf} \cdot b_{iuf} - TransM_{lk} \cdot a_{ilk} - C_{i,j-1} \cdot x_{i,j-1,ul} \geq 0, \quad \forall i, f, u, k, l, \; j \in [2, n_i]$ (18)
$\left( S_{g,h} - C_{i,j} \right) x_{ijfk} \, x_{ghfk} \, y_{ij,gh} + \left( S_{i,j} - C_{g,h} \right) x_{ijfk} \, x_{ghfk} \, y_{gh,ij} \geq 0, \quad \forall i, j, f, k, g, h$ (19)
$a_{ilk} + b_{ifu} \leq 1, \quad \forall i, l, k, f, u$ (20)
where Constraint (15) restricts each operation to be machined on exactly one machine of one job shop. Constraint (16) ensures that each job can only be processed after its arrival. Constraint (17) indicates that the completion time of an operation equals its start time plus its processing time. Constraint (18) states that the operations of each job must follow their precedence order. According to Constraint (19), if operations of different jobs are processed on the same machine, they must be processed in sequence. Constraint (20) ensures that, for a given job, transportation between job shops and transportation between machines within a job shop are not counted at the same time; that is, when a job is transferred between job shops, the transport time between machines is neglected.
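To make Equations (10) and (11) concrete, the following is a minimal sketch that evaluates both objectives from a finished schedule; the record layout, field names, and sample numbers are hypothetical and only mirror the symbols of Table 4, so this is not the evaluation code used in the experiments.

```python
# Hypothetical evaluation of the two objectives from a finished schedule.
def total_weighted_tardiness(jobs):
    # Equation (10): sum over jobs of W_i * max(CT_i - D_i, 0).
    return sum(j["W"] * max(j["CT"] - j["D"], 0.0) for j in jobs)

def total_energy(proc_records, idle_records, trans_records, te=1.0):
    # Equation (11): TE = procE + idleE + transE.
    procE = sum(r["pt"] * r["pe"] for r in proc_records)          # cf. Equation (12)
    idleE = sum(r["idle_time"] * r["ie"] for r in idle_records)   # cf. Equation (13)
    transE = sum(te * r["trans_time"] for r in trans_records)     # cf. Equation (14)
    return procE + idleE + transE

jobs = [{"W": 2, "CT": 120, "D": 100}, {"W": 1, "CT": 80, "D": 90}]
print(total_weighted_tardiness(jobs))  # 2 * 20 + 0 = 40
```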

5. Rainbow DQN in LHDFJSP

In this section, a tailored Rainbow DQN applied to the LHDFJSP is explained in detail regarding four main aspects, i.e., the designs of Rainbow DQN architecture, state space, action space, and reward function, which will be introduced in Section 5.1, Section 5.2, Section 5.3, and Section 5.4, respectively.

5.1. Rainbow DQN Architecture

Rainbow DQN, first proposed by Hessel et al. in 2018, incorporates various improved algorithms [44]. According to Table 1, value-based DRL methods for job shop scheduling are typically based on relatively simple DQNs or DDQNs, while Rainbow DQN represents a more powerful variant. To date, Rainbow DQN has not been applied in this field, and whether it can contribute to solving the job shop scheduling problem remains an open question. The authors are therefore curious about the application of Rainbow DQN to job shop scheduling and eager to explore its performance and potential in practice. Figure 2 depicts the architecture of Rainbow DQN in the LHDFJS.
Rainbow DQN takes the state of the job shop environment as input and maps it to the estimation of Q-values through a deep neural network, enabling the selection of appropriate scheduling strategies. Additionally, Rainbow DQN utilizes a prioritized experience replay mechanism to store the interaction experiences of the agent in the job shop environment. These experiences include state, chosen scheduling strategy, reward, and next state. The agent learns from past experiences to reduce data correlation and improve sample efficiency.
In this paper, Rainbow DQN is applied to a value-based LHDFJ scheduling environment, which integrates the algorithms or concepts of double DQN, dueling DQN, noisy networks, multi-step RL, and prioritized experience replay (PER). Note that in our framework (as displayed in Figure 3), the component of distributional RL is excluded from the Rainbow DQN to accelerate the training process, since it requires more training time according to Väth and Vu [45]. After training the modified Rainbow DQN, a smart agent can make sensible decisions at each time step based on its observations of the current production state, so as to achieve satisfactory scheduling results. The overall training process is exhibited in Algorithm 1.
Algorithm 1: Rainbow deep Q-network
1: Initialize the network $Q(s_t, a_t; \theta, \alpha, \beta)$ with random weights
2: Initialize the learning rate, discount factor, network parameters, and replay memory
3: For episode $n = 0$ to $N$ do
4:   Reset the state $s_t$
5:   For $t = 0$ to $T$ do
6:    Select action $a_t$ and execute $a_t$
7:    Observe the reward $r_t$ and next state $s_{t+1}$
8:    Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay memory and sample transitions with probability $i \sim P(i) = p_i / \sum_j p_j$
9:    Calculate the TD error $\delta = \sum_{k=0}^{N-1} \gamma^k r_{t+k} + \gamma^N \max_{a_{t+1}} Q(s_{t+N}, a_{t+1}; \theta^-) - Q(s_t, a_t; \theta)$
10:    Update the transition priority $p_i \propto |\delta|$
11:    Set $\theta^- \leftarrow \theta$ every $C$ steps
12:   End for
13: End for

5.2. State Space

The state space comprehensively reflects the production status of the rescheduled points and contains a total of 10 LHDFJ state information units. To facilitate the understanding of the 10 state information units, it is necessary to clarify the parameters and formulas involved in Table 5 (indexes all start at 1).
The state space of Rainbow DQN in the context of job shop scheduling is described as follows:
1. Estimated delay rate at rescheduling point $t$, $Tard_e(t)$:
$Tard_e(t) = \frac{\sum_{i \in TardJ_e(t)} \left( n_i - NPO_i(t) \right)}{\sum_{i \in UcompJ(t)} \left( n_i - NPO_i(t) \right)}$
Here, $TardJ_e(t)$ denotes the set of jobs that are expected to be delayed at rescheduling point $t$ and $UcompJ(t)$ denotes the set of jobs whose processing is not completed at rescheduling point $t$. In addition, job $J_i$ is judged to be an estimated delayed job at rescheduling point $t$ if $NPO_i(t) < n_i$ and $EDT_i(t) \geq 0$.
2. Actual delay rate at rescheduling point $t$, $Tard_a(t)$:
$Tard_a(t) = \frac{\sum_{i \in TardJ_a(t)} \left( n_i - NPO_i(t) \right)}{\sum_{i \in UcompJ(t)} \left( n_i - NPO_i(t) \right)}$
Here, $TardJ_a(t)$ represents the set of jobs that are actually postponed at rescheduling point $t$. Job $J_i$ is judged to be actually delayed at rescheduling point $t$ if $NPO_i(t) < n_i$ and $C_{i, NPO_i(t)} > D_i$.
3. Estimated weighted delay rate at rescheduling point $t$, $WTard_e(t)$:
$WTard_e(t) = \frac{\sum_{i \in TardJ_e(t)} \overline{T_i(t)} \, W_i}{\sum_{i \in UcompJ(t)} \overline{T_i(t)} \, W_i}$
4. Average utilization rate of all machines in all job shops at rescheduling point $t$, $UR_{ave}(t)$:
$UR_{ave}(t) = \frac{\sum_{f=1}^{F} \sum_{k=1}^{M_f} UR_k^f(t)}{\sum_{f=1}^{F} M_f}$
5. Standard deviation of machine utilization at rescheduling point $t$, $UR_{std}(t)$:
$UR_{std}(t) = \sqrt{\frac{\sum_{f=1}^{F} \sum_{k=1}^{M_f} \left( UR_k^f(t) - UR_{ave}(t) \right)^2}{\sum_{f=1}^{F} M_f}}$
6. Average completion rate of all operations at rescheduling point $t$, $CRO_{ave}(t)$:
$CRO_{ave}(t) = \frac{\sum_{i=1}^{N} NPO_i(t)}{\sum_{i=1}^{N} n_i}$
7. Average completion rate of all jobs at rescheduling point $t$, $CRJ_{ave}(t)$:
$CRJ_{ave}(t) = \frac{\sum_{i=1}^{N} CRJ_i(t)}{N}$
8. Standard deviation of the job completion rates at rescheduling point $t$, $CRJ_{std}(t)$:
$CRJ_{std}(t) = \sqrt{\frac{\sum_{i=1}^{N} \left( CRJ_i(t) - CRJ_{ave}(t) \right)^2}{N}}$
9. Energy consumption index of all completed operations at rescheduling point $t$, $ECI(t)$:
$ECI(t) = \frac{TE_{i,j}^{mid}(t) - TE_{i,j}(t)}{TE_{i,j}^{mid}(t) - TE_{i,j}^{min}(t)}$
Here, $TE_{i,j}^{mid}(t) = \frac{TE_{i,j}^{min}(t) + TE_{i,j}^{max}(t)}{2}$.
10. Reduced completion time of the last operation processed on machine $M_k^f$ at rescheduling point $t$, $RCTM_{fk}(t)$:
$RCTM_{fk}(t) = CTM_k^f(t) - T_{cur}$
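To make the state vector concrete, the following is a hedged sketch that assembles a few of the ten features from a hypothetical production snapshot; the snapshot fields mirror the symbols of Table 5 (n_i, NPO_i(t), machine utilizations), but the data layout and the chosen subset of features are assumptions.

```python
import numpy as np

def build_state(snapshot):
    """Assemble a subset of the ten state features from a production snapshot."""
    n = snapshot["n_ops"]                  # n_i: total operations per job
    npo = snapshot["done_ops"]             # NPO_i(t): completed operations per job
    tardy_est = snapshot["tardy_est"]      # boolean mask of jobs estimated to be delayed
    ur = snapshot["machine_utilization"]   # UR_k^f(t), flattened over all shops and machines

    uncompleted = npo < n                  # UcompJ(t)
    remaining = n - npo
    tard_e = remaining[tardy_est & uncompleted].sum() / max(remaining[uncompleted].sum(), 1)
    cro_ave = npo.sum() / n.sum()          # average operation completion rate
    crj = npo / n                          # per-job completion rates
    return np.array([tard_e, ur.mean(), ur.std(), cro_ave, crj.mean(), crj.std()])

snapshot = {"n_ops": np.array([5, 6, 4]), "done_ops": np.array([2, 6, 1]),
            "tardy_est": np.array([True, False, False]),
            "machine_utilization": np.array([0.8, 0.5, 0.65, 0.7])}
state = build_state(snapshot)
```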

5.3. Action Space

In this paper, classical composite scheduling rules are used as the action space. Each action consists of a job selecting rule and a machine assignment rule. Based on the state space, 7 job selecting rules and 6 machine assignment rules are defined, and a total of 42 composite scheduling rules are then obtained by taking their Cartesian product. Among the 42 candidate rules, the 10 rules with the best average results are chosen as the action space. Specifically, the action space of the Rainbow agent changes dynamically at each time step: according to the sizes in the dataset, the agent selects the 10 most effective rules from the 42 candidate composite scheduling rules as the action space.
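The construction of the composite-rule action space can be sketched as a Cartesian product of job selecting rules and machine assignment rules; the rule identifiers below are placeholders standing in for the concrete rules of Sections 5.3.1 and 5.3.2.

```python
from itertools import product

# Placeholder identifiers; the concrete rules are defined in Sections 5.3.1 and 5.3.2.
job_rules = [f"JR{i}" for i in range(1, 8)]        # 7 job selecting rules
machine_rules = [f"MR{i}" for i in range(1, 7)]    # 6 machine assignment rules

# Cartesian product: 7 x 6 = 42 composite scheduling rules (candidate actions).
composite_rules = list(product(job_rules, machine_rules))
assert len(composite_rules) == 42

def apply_action(action_index, state):
    """An action first selects a job with one rule, then assigns it to a machine with the other."""
    job_rule, machine_rule = composite_rules[action_index]
    # job = select_job(job_rule, state); machine = assign_machine(machine_rule, job, state)
    return job_rule, machine_rule
```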

5.3.1. Job Selecting Rule

This subsection presents seven job selecting rules, which are described as follows:
Job Selecting Rule 1: At rescheduling point $t$, if $TardJ_a(t)$ is not an empty set, the job with the largest $EDT_i(t) \cdot W_i$ in $TardJ_a(t)$ is chosen for the next scheduled operation.
If $TardJ_a(t)$ is empty, the job with the smallest average slack time $ST_i(t)$ in $UcompJ(t)$ is chosen for the next scheduled operation.
$ST_i(t) = \frac{D_i - \max\left( T_{cur}, C_{i, NPO_i(t)} \right)}{n_i - NPO_i(t)}$
Job Selecting Rule 2: At rescheduling point $t$, if $TardJ_a(t)$ is not an empty set, the job with the largest $EDT_i(t) \cdot W_i$ in $TardJ_a(t)$ is chosen for the next scheduled operation.
If $TardJ_a(t)$ is empty, the job with the smallest critical ratio $CR_i$ in $UcompJ(t)$ is chosen.
$CR_i = \frac{D_i - \max\left( T_{cur}, C_{i, NPO_i(t)} \right)}{\overline{T_i(t)}}$
Job Selecting Rule 3: Based on $T_{cur}$, the jobs are sorted according to the estimated weighted tardiness $EDT_i(t) \cdot W_i$, and the job with the largest $EDT_i(t) \cdot W_i$ is selected for the next scheduled operation. If multiple jobs share the same value, one of them is chosen randomly.
Job Selecting Rule 4: Select a random job from $UcompJ(t)$.
Job Selecting Rule 5: At rescheduling point $t$, if $TardJ_a(t)$ is not an empty set, the job with the largest critical ratio of weighted tardiness, $\frac{n_i - NPO_i(t)}{\sum_{i=1}^{N} \left( n_i - NPO_i(t) \right)} \cdot \overline{T_i(t)} \cdot W_i$, in $TardJ_a(t)$ is chosen for the next scheduled operation.
If $TardJ_a(t)$ is empty, the job with the smallest $\frac{NPO_i(t)}{n_i} \cdot \left( D_i - \max\left( T_{cur}, C_{i, NPO_i(t)} \right) \right)$ in $UcompJ(t)$ is chosen.
Job Selecting Rule 6: At rescheduling point $t$, select the job with the lowest completion rate in $UcompJ(t)$.
Job Selecting Rule 7: At rescheduling point t , assign priority based on the deadline of the jobs. The earlier the deadline, the higher the processing priority.

5.3.2. Machine Assignment Rule

This subsection presents six machine assignment rules, which are described as follows:
Machine Assignment Rule 1: Select the earliest available machine $m_k$.
$k = \underset{k \in M_{ij}^f, \, f \in F_{i,j}}{\arg\min} \left\{ \max\left( CT_k(t), C_{i,j-1} \right) + T_k^f \right\}$
Here, $T_k^f$ is the transportation time from the previous operation to machine $m_k$ in job shop $f$.
Machine Assignment Rule 2: Select the available machine $m_k^f$ with the lowest total energy consumption (transportation energy consumption plus processing energy consumption plus idle energy consumption), i.e., the machine attaining $TE_{i,j}^{min}(t)$ (see the state space parameters for the calculation method).
Machine Assignment Rule 3: Select the available machine $m_k^f$ with the lowest machine utilization rate $UR_k^f(t)$.
$UR_k^f(t) = \frac{\sum_{i=1}^{N} \sum_{j=1}^{NPO_i(t)} pt_{ijfk} \, x_{ijfk}}{CT_k^f(t)}$
Machine Assignment Rule 4: Select the available machine with the shortest processing time.
Machine Assignment Rule 5: Select the machine that is available and has the shortest processing time for the previous operation.
Machine Assignment Rule 6: Choose the available machine that is expected to be used the fewest times by the operations of the next round.
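As a small illustration, the sketch below implements Machine Assignment Rule 1: it picks the machine with the earliest possible start for the selected operation, following the expression under that rule. The candidate-machine records, field names, and numbers are hypothetical.

```python
def earliest_available_machine(candidates, prev_completion):
    """Machine Assignment Rule 1: argmin over machines of max(CT_k(t), C_{i,j-1}) + T_k^f.

    Each candidate dict holds the machine's current completion time 'CT' and the
    transport time 'T' from the previous operation to that machine.
    """
    return min(candidates, key=lambda m: max(m["CT"], prev_completion) + m["T"])

candidates = [{"id": "W1-M1", "CT": 40, "T": 5},
              {"id": "W1-M3", "CT": 30, "T": 15},
              {"id": "W2-M1", "CT": 10, "T": 150}]
best = earliest_available_machine(candidates, prev_completion=35)  # W1-M1: 45 vs 50 vs 185
```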

5.4. Reward Function

In the context of the LHDFJSP, conflicting optimization objectives arise. For instance, the scheduling scheme aims to minimize both the total weighted tardiness and the total energy consumption. Minimizing the total weighted tardiness favors shorter processing times on machines, which typically results in higher energy consumption, whereas minimizing the total energy consumption requires lower energy consumption during product processing. As a result, the agent’s policy cannot optimize all objectives simultaneously but instead learns a policy that achieves a good trade-off among the conflicting ones.
In this paper, the reward function includes an immediate reward function $R_t$ and an episodic reward function $ER$; the formulas are presented below.

5.4.1. Immediate Reward Function

The immediate reward function consists of three components: the economic index, the energy consumption index, and the machine index.
The calculation formula of the economic index reward $eco(t)$ is:
$eco(t) = eco_{tard_a} + eco_{wtard} + eco_{tard_e} + eco_{ur} + eco_{tard_c}$
Here, $eco_{tard_a}$ represents the reward given based on the actual tardiness rate $Tard_a(t)$ of the current state and the next state. $eco_{wtard}$ represents the reward given based on the estimated weighted tardiness rate $WTard_e(t)$ of the current state and the next state. $eco_{tard_e}$ represents the reward given based on the estimated tardiness rate $Tard_e(t)$ of the current state and the next state. $eco_{ur}$ represents the reward given based on the average machine utilization rate $UR_{ave}(t)$ of the current state and the next state. $eco_{tard_c}$ represents the reward calculated by comparing the minimum total weighted tardiness observed during training with the current total weighted tardiness. The calculation formulas are as follows:
$eco_{tard_a} = \begin{cases} 1, & Tard_a(t+1) < Tard_a(t) \\ -1, & Tard_a(t+1) > Tard_a(t) \\ 0, & Tard_a(t+1) = Tard_a(t) \end{cases}$
$eco_{wtard} = \begin{cases} 1, & WTard_e(t+1) < WTard_e(t) \\ -1, & WTard_e(t+1) > WTard_e(t) \\ 0, & WTard_e(t+1) = WTard_e(t) \end{cases}$
$eco_{tard_e} = \begin{cases} 1, & Tard_e(t+1) < Tard_e(t) \\ -1, & Tard_e(t+1) > Tard_e(t) \\ 0, & Tard_e(t+1) = Tard_e(t) \end{cases}$
$eco_{ur} = \begin{cases} -1, & UR_{ave}(t+1) < UR_{ave}(t) \\ 1, & UR_{ave}(t+1) > UR_{ave}(t) \\ 0, & UR_{ave}(t+1) = UR_{ave}(t) \end{cases}$
$eco_{tard_c} = \begin{cases} 0, & minTard = currentTard \\ -1, & minTard < currentTard \\ 50, & minTard \geq currentTard \end{cases}$
where t represents the current rescheduling point or decision point and t + 1 represents the next rescheduling point or decision point.
The calculation formula of the energy consumption index reward $ene(t)$ is:
$ene(t) = ene_{ECI} + ene_{CE}$
Here, $ene_{ECI}$ represents the reward given based on the energy consumption index $ECI(t)$ of the current state and the next state. $ene_{CE}$ represents the reward calculated by comparing the minimum total energy consumption observed during training with the current total energy consumption. The calculation formulas are as follows:
$ene_{ECI} = \begin{cases} 1, & ECI(t) > ECI(t+1) \\ -1, & ECI(t) < ECI(t+1) \\ 0, & ECI(t) = ECI(t+1) \end{cases}$
$ene_{CE} = \begin{cases} 0, & minEnergy = currentEnergy \\ -1, & minEnergy < currentEnergy \\ 50, & minEnergy \geq currentEnergy \end{cases}$
The weighted sum of the economic index and the energy consumption index is used as the immediate reward at the rescheduled point; the parameter $\beta \in (0, 1)$ is used to balance the economic index and the energy consumption index:
$R_t = \beta \cdot eco(t) + (1 - \beta) \cdot ene(t)$ (56)
The machine index reward assigns a negative reward based on the reduced completion time $RCTM_{fk}(t)$ of the last operation on each machine; feeding this strongly correlated negative value back to the agent helps it converge faster and achieve a better convergence effect.
$R_t = -\sum_{f=1}^{F} \sum_{k=1}^{M_f} RCTM_{fk}(t)$ (57)

5.4.2. Episodic Reward Function

The episodic reward is a negative value. The LHDFJ computes the total weighted tardiness $CT_{episode}$ and the total energy consumption $TE_{episode}$ after each episode. The larger these two values are, the greater the penalty the environment feeds back to the agent. The parameters $\rho$ and $\varphi$ are used to scale the overall value of the objectives so that the episodic reward matches the magnitude of the previous reward values.
$ER = -\left( CT_{episode} \cdot \rho + TE_{episode} \cdot \varphi \right)$ (58)
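The sketch below ties the reward design together: trend-based components, the β-weighted immediate reward of Equation (56), and the negative episodic penalty of Equation (58). Only three of the five economic components are shown, and the sign conventions and scaling simply follow the reconstructed formulas above; they are assumptions to be adapted to the actual indicator definitions.

```python
def trend(prev, curr):
    """+1 if the indicator decreased, -1 if it increased, 0 if unchanged."""
    return 1 if curr < prev else (-1 if curr > prev else 0)

def immediate_reward(prev_state, next_state, beta=0.7):
    # Economic index: trend rewards on the tardiness indicators (eco_ur and eco_tard_c omitted).
    eco = (trend(prev_state["tard_a"], next_state["tard_a"])
           + trend(prev_state["wtard_e"], next_state["wtard_e"])
           + trend(prev_state["tard_e"], next_state["tard_e"]))
    # Energy index: trend reward on the energy consumption index ECI(t).
    ene = trend(prev_state["eci"], next_state["eci"])
    return beta * eco + (1 - beta) * ene               # Equation (56)

def episodic_reward(ct_episode, te_episode, rho, phi):
    # Negative episodic penalty, Equation (58): larger tardiness/energy -> larger penalty.
    return -(ct_episode * rho + te_episode * phi)
```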

6. Experiments

In this section, comprehensive experiments are conducted to evaluate the Rainbow DQN framework with composite scheduling rules. Section 6.1 and Section 6.3 introduce the generation of problem instances and the algorithms for comparison. Section 6.2 primarily elucidates the details of algorithm training. Section 6.4 exhibits comparative results of various scheduling strategies, through which the effectiveness and performance advantages of the proposed Rainbow DQN-based method could be validated. All the experiments were carried out on an AMD Ryzen 5 5600X 6-Core Processor @ 3.70 GHz with 32 GB of RAM and an NVIDIA GeForce RTX 3060. Additionally, experiments were conducted using Python 3.8, with the main libraries including PyTorch 1.11.0 and NumPy 1.20.0.

6.1. Generation of Problem Instances

To date, there is no public dataset that takes into account complicated factors including job insertions, energy consumption, and transportation of heterogeneous distributed job shops. Therefore, this study extended the existing benchmarks and utilized these problem instances to validate the applicability of the proposed method and evaluate its performance under different scenarios.
The problem instances are extended based on the mk series datasets proposed by Brandimarte [46]. A total of 8 experimental scenarios were designed for evaluation. Detailed parameter settings are listed in Table 6, including the numbers of job shops, initial jobs, dynamically inserted jobs, and machines. Furthermore, Table 7 indicates the general parameters for different scenarios, i.e., the number of operations for each problem instance, transportation time between machines or job shops, unit processing energy consumption, unit idle energy consumption, unit transport energy consumption, DDT, and the weights of jobs.
The arrival interval $Y$ between two consecutive dynamically inserted jobs obeys the exponential distribution $Y \sim \exp(1/\lambda)$, with $\lambda$ set to 50. In addition, the due date of job $J_i$, denoted by $D_i$, is calculated according to DDT as follows:
$D_i = A_i + \sum_{j=1}^{n_i} \overline{pt_{ijfk}} \cdot DDT$
Here, $\overline{pt_{ijfk}}$ represents the average processing time of operation $O_{ij}$ on all available machines in all job shops.
$\overline{pt_{ijfk}} = \underset{f \in [1, F], \, k \in [1, M_f]}{\operatorname{mean}} \left( pt_{ijfk} \right)$
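The arrival-time and due-date generation described above can be sketched with NumPy as follows; the DDT value and the per-operation mean processing times in the example are placeholders, and only λ = 50 follows the stated parameter.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def arrival_times(n_inserted, lam=50.0):
    """Arrival times of dynamically inserted jobs: inter-arrival gaps Y ~ exp(1/lambda)."""
    intervals = rng.exponential(scale=lam, size=n_inserted)
    return np.cumsum(intervals)                 # A_i for each inserted job

def due_date(arrival, mean_proc_times, ddt):
    """D_i = A_i + sum_j mean_{f,k}(pt_ijfk) * DDT, per the formula above."""
    return arrival + np.sum(mean_proc_times) * ddt

arrivals = arrival_times(n_inserted=5)
# Hypothetical 3-operation job: average processing times over all candidate machines
d_1 = due_date(arrivals[0], mean_proc_times=np.array([4.2, 6.0, 3.5]), ddt=1.5)
```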

6.2. Training Details

During the training process, adjustments were made to several Rainbow DQN parameters to enhance convergence speed and effectiveness in job shop scheduling. The recommended hyperparameters for the Rainbow DQN are listed in Table 8. The selection of these hyperparameters has been carefully scrutinized to better adapt to the complexity of scheduling during the training process. The hyperparameters in this paper were manually adjusted based on experience and domain knowledge. Initially, the generic Rainbow DQN hyperparameters were used, and then manually fine-tuned according to modifications in the scheduling environment. This process involved iterative adjustments and testing of each hyperparameter to observe its impact on the algorithm’s performance, ultimately selecting the optimal parameter combination.
Figure 4 illustrates the convergence performance of the Rainbow DQN algorithm under different hyperparameter settings. Specifically, we focus on the impact of different learning rates, batch sizes, and buffer capacities on the algorithm across various metrics. This allows us to assess the algorithm’s convergence speed, efficiency, and final stability.
As shown in Figure 4a, under different learning rates, the trend of the average reward curves remains consistent, but the orange curve (with the learning rate equal to $1 \times 10^{-4}$) consistently achieves higher reward values throughout most episodes compared to the other curves. Thus, the learning rate is set to $1 \times 10^{-4}$.
Figure 4b illustrates the impact of buffer size on average rewards. When the buffer size is $1 \times 10^5$, the convergence speed is relatively slow in the first half, but the convergence efficiency is higher in the latter half, resulting in a more stable convergence outcome.
Figure 4c summarizes the convergence scenarios for different batch sizes, indicating that a batch size of 256 leads to superior convergence effectiveness and efficiency.
The experimentation and adjustment of hyperparameters contribute to a deeper understanding of the performance of Rainbow DQN in job shop scheduling. Furthermore, it provides guidance for further optimization and tuning. Through a systematic validation of different parameter combinations, a better understanding of the algorithm’s robustness can be gained, thereby better meeting the requirements of real-world production environments.

6.3. Algorithms for Comparison

According to Lei et al. [47], the top four job assignment scheduling rules and two machine assignment scheduling rules were selected, and eight heuristic scheduling rules were obtained by taking their Cartesian product (shown in Table 9). These rules are considered as the benchmarks for evaluation.
This paper introduces 42 candidate composite scheduling rules specifically designed for the LHDFJSP and employs the Rainbow DQN framework to select the optimal scheduling rule at each decision point. In order to evaluate the performance of the presented method, a comparison is conducted between Rainbow DQN and the 8 classical scheduling rules, as well as the 42 composite scheduling rules, using different problem instances.
Furthermore, to demonstrate the superior learning capability of Rainbow DQN, we also compare the Rainbow DQN-based method and dueling double DQN with prioritized experience replay (D3QN with PER), which combines the advantages of double DQN and dueling DQN with prioritized experience replay by improving the network architecture and experience sampling to reduce overestimation bias, enhance exploration capability, and improve learning efficiency. These experiments allow for a comparison of the proposed Rainbow DQN method with established scheduling rules in the LHDFJSP domain.

6.4. Experimental Results

This subsection evaluates the performance of the Rainbow DQN with composite scheduling rules. The weight $\beta$ in Formula (56) is set to 0.7, assigning a weight of 0.7 to the economic objective and 0.3 to the environmental objective. This choice places greater emphasis on time efficiency while still maintaining the importance of energy consumption. The parameters $\rho$ and $\varphi$ in the episodic reward function (58) are primarily used for value scaling and are set to 160 and 1600, respectively, to ensure that the magnitude of the episodic reward aligns with the values of the other reward functions.

6.4.1. Comparison of Different Algorithms

Table 10 exhibits performance comparison of classical scheduling rules and composite scheduling rules in various scenarios. Each row indicates a scenario, while each column stands for a scheduling algorithm. In the table, only four sets of classical scheduling rules are exhibited, as they achieved top performance among all the eight rules. The column of the composite scheduling rules shows the optimal value selected from a pool of 42 candidate composite scheduling rules. A bold value represents the best solution, and an underlined value indicates the second best solution. The last column, i.e., solution distance, represents the deviation between the best solution obtained by composite scheduling rules and the currently acquired best solution (bold value). From the solution distance, it can be observed that the composite scheduling rules consistently achieve the optimal solution for the total weighted delay metric in all the scenarios. However, in terms of the total energy consumption metric, there is still a gap between the composite scheduling rules and other classical heuristic scheduling rules, except for Scenario 1.
Table 11 exhibits the comparison results of classical scheduling rules, composite scheduling rules, D3QN with PER, and Rainbow DQN. In Scenarios 2, 3, 4, and 6, Rainbow DQN exhibits significantly lower total weighted tardiness and total energy consumption values compared to other algorithms. Even in more challenging scenarios such as Scenario 7 and Scenario 8, the Rainbow DQN algorithm maintains a competitive edge by achieving relatively lower total weighted tardiness values. Rainbow DQN achieves a solution distance of 0 in most scenarios, indicating that its solutions closely match the optimal solutions, highlighting its ability to approach optimality.
Although Rainbow DQN may exhibit slightly higher total energy consumption values in certain scenarios (such as Scenarios 5 and 8), it is important to consider the tradeoff between energy consumption and job tardiness. In some cases, Rainbow DQN might prioritize minimizing tardiness over energy consumption, which could lead to marginally higher energy usage. However, this tradeoff is reasonable as it ensures timely job completion and prevents potential penalties associated with job delays.
Overall, the Rainbow DQN algorithm demonstrates its superiority in terms of total weighted tardiness across all scenarios, indicating its ability to optimize job scheduling effectively in LHDFJSP. While total energy consumption may be slightly higher in some cases, Rainbow DQN still provides competitive results and offers a promising solution for addressing the challenges of the LHDFJSP in terms of both job tardiness and energy consumption. This demonstrates its robustness and adaptability in handling complex scheduling problems. Rainbow DQN proves to be a reliable solution for the LHDFJSP, capable of optimizing both production timelines and energy consumption.

6.4.2. Optimization Process of Rainbow DQN

To demonstrate the adaptability of Rainbow DQN to problem instances of different sizes, we depict the optimization process curves of Rainbow DQN when optimizing the total weighted tardiness and total energy consumption in Scenarios 4 and 6, respectively, with varying numbers of jobs. Figure 5 depicts the variations in total weighted tardiness over 2000 episodes of iterative training of Rainbow DQN. The red area represents the upper and lower bounds on the weighted tardiness of Rainbow DQN over multiple training sessions, while the indigo curve represents the average weighted tardiness per episode. From the changing curves, it can be observed that in the initial stages of training, there is a higher total weighted tardiness. This is because the agent randomly selects actions during the early exploration phase to collect transition experiences by interacting with the scheduling environment. As the prioritized experience replay buffer reaches its maximum capacity, the agent gradually learns to optimize its strategy, resulting in a gradual reduction and convergence of the total weighted tardiness.
Similarly, Figure 6 represents the variations in total energy consumption during the training process of the Rainbow DQN algorithm in Scenarios 4 and 6. From the change curves, it can be observed that in the initial stages of training, there is higher total energy consumption. Afterwards, as the agent gradually learns to optimize its strategy, the total energy consumption decreases and converges over time.

6.4.3. Optimization Process of Different DRL Methods

In this subsection, Scenario 4 is taken as an example to compare the training behavior of Rainbow DQN and D3QN with PER on the LHDFJSP. As shown in Figure 7, both algorithms exhibit a downward trend in their training curves, but with notable differences. The curve of D3QN with PER decreases gradually over time, indicating steady improvement, yet it also shows significant fluctuations, suggesting a degree of instability. The training curve of Rainbow DQN is comparatively smoother, converges faster, and decreases more consistently, indicating better performance over time. The contrast between the instability of D3QN with PER and the smoothness of Rainbow DQN suggests that Rainbow DQN learned more stable and reliable strategies during training. Overall, Rainbow DQN exhibits greater robustness and reliability than D3QN with PER, leading to superior performance; the differences in their training curves highlight its advantages in stability and consistency.
In summary, the comparison between the classical heuristic scheduling rules and Rainbow DQN shows that the latter consistently outperforms the former in both total weighted tardiness and total energy consumption across the different scenarios. Classical heuristic scheduling rules, although widely used, rely on predefined strategies that may not adapt well to dynamic and complex scheduling problems such as the LHDFJSP. In contrast, Rainbow DQN leverages DRL to learn scheduling policies from experience, allowing it to adapt and optimize performance for the specific problem at hand. The composite scheduling rules, which combine multiple classical rules, improve on the individual heuristics, but Rainbow DQN surpasses even the composite rules, achieving lower objective values. This demonstrates its ability to learn and optimize scheduling decisions beyond what manual combinations of classical rules can achieve. Furthermore, Rainbow DQN exhibits a more stable and reliable training curve with smoother convergence, whereas D3QN with PER shows higher volatility, indicating potential difficulty in achieving consistent and robust performance.
Therefore, Rainbow DQN is the superior algorithm in this study. Its adaptability, robustness, and reliable performance on the LHDFJSP make it an effective approach for solving complex scheduling problems in real-world applications.

7. Conclusions

Advanced technologies, green manufacturing concepts, and new business models have brought serious challenges to optimization and control in the field of job shop scheduling. In this paper, a mathematical model of the low-carbon heterogeneous distributed flexible job shop scheduling problem (LHDFJSP) and a DRL-based solution were proposed to address gaps in the scheduling literature.
To address the DMoSP for the LHDFJSP, a multi-objective mathematical model was established with the objective of minimizing the total weighted tardiness and total energy consumption of the production process. A set of composite scheduling rules was designed by combining job selection rules and machine assignment rules. To select the optimal scheduling rule at each decision point, a Rainbow DQN framework was employed, in which the state, action, and reward components were redesigned to capture the production status and balance economic and environmental considerations. Comparative experiments on a customized dataset demonstrated that Rainbow DQN outperforms the composite scheduling rules in minimizing both total weighted tardiness and total energy consumption.
From an academic perspective, this paper helps fill research gaps in the field and advances job shop scheduling theory. By introducing real production factors such as low-carbon objectives and workshop heterogeneity, the study enriches the research content of scheduling problems and enhances their practicality and applicability. From a managerial standpoint, addressing this problem can optimize production processes, improve efficiency, and reduce energy consumption, thereby lowering production costs and enhancing the competitiveness of enterprises. In addition, the composite scheduling rules and DRL algorithms proposed in this work provide enterprises with new management tools and methods.
Although this study has achieved promising results, limitations remain. For example, the manual design of state features, scheduling rules, and reward functions in reinforcement learning may become restrictive when dealing with massive quantities of job shop data, constraining the effectiveness of the solutions. Exploring self-learning methods based on big data to automatically generate scheduling rules therefore holds promise for further improving solution quality and efficiency.
Future research will focus on both the scheduling problem and the scheduling method. Regarding the problem, a gap remains between existing studies and practical application scenarios; factors such as assembly processes, labor resources, and material supply could be considered. Regarding the method, DRL for job shop scheduling is still at an early exploration stage. Multi-agent interactions with the environment could be investigated, and graph neural networks or knowledge graphs could be integrated to obtain higher-quality solutions.

Author Contributions

Conceptualization, Y.C. and X.L.; methodology, Y.C. and X.L.; software, Y.C. and Y.H.; formal analysis, X.L.; investigation, Y.C.; resources, Y.C.; writing—original draft preparation, Y.C.; writing—review and editing, Y.C., X.L., G.C. and Y.H.; supervision and funding acquisition, X.L. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Chengdu Science and Technology Project (No. 2022-YF05-01058-SN), Sichuan Province Foreign and Overseas High-end Talent Introduction Program (No. 2022JDGD0013).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Enquiries about data availability should be directed to the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. International Energy Agency. Energy Efficiency 2021. 2021. Available online: https://iea.blob.core.windows.net/assets/9c30109f-38a7-4a0b-b159-47f00d65e5be/EnergyEfficiency2021.pdf (accessed on 21 April 2023).
  2. U.S. Energy Information Administration. Country Analysis Executive Summary: China. 2022. Available online: https://www.eia.gov/international/content/analysis/countries_long/China/china.pdf (accessed on 21 April 2023).
  3. U.S. Energy Information Administration. Annual Energy Review 2020. 2020. Available online: https://www.eia.gov/totalenergy/data/annual/index.php (accessed on 21 April 2023).
  4. Adibi, M.A.; Shahrabi, J. A clustering-based modified variable neighborhood search algorithm for a dynamic job shop scheduling problem. Int. J. Adv. Manuf. Technol. 2014, 70, 1955–1961. [Google Scholar] [CrossRef]
  5. Zhang, H.; Zhang, Y.Q. A discrete job-shop scheduling algorithm based on improved genetic algorithm. Int. J. Simul. Model. 2020, 19, 517–528. [Google Scholar] [CrossRef]
  6. Fang, Y.; Peng, C.; Lou, P.; Zhou, Z.; Hu, J.; Yan, J. Digital-twin-based job shop scheduling toward smart manufacturing. IEEE Trans. Ind. Inform. 2019, 15, 6425–6435. [Google Scholar] [CrossRef]
  7. De Giovanni, L.; Pezzella, F. An improved genetic algorithm for the distributed and flexible job-shop scheduling problem. Eur. J. Oper. Res. 2010, 200, 395–408. [Google Scholar] [CrossRef]
  8. Du, Y.; Li, J.Q.; Luo, C.; Meng, L.L. A hybrid estimation of distribution algorithm for distributed flexible job shop scheduling with crane transportations. Swarm Evol. Comput. 2021, 62, 100861. [Google Scholar] [CrossRef]
  9. Li, R.; Gong, W.; Wang, L.; Lu, C.; Zhuang, X. Surprisingly Popular-Based Adaptive Memetic Algorithm for Energy-Efficient Distributed Flexible Job Shop Scheduling. IEEE Trans. Cybern. 2023, 53, 8013–8023. [Google Scholar] [CrossRef]
  10. Zhao, F.; Zhang, L.; Cao, J.; Tang, J. A cooperative water wave optimization algorithm with reinforcement learning for the distributed assembly no-idle flowshop scheduling problem. Comput. Ind. Eng. 2021, 153, 107082. [Google Scholar] [CrossRef]
  11. Luo, Q.; Deng, Q.; Gong, G.; Zhang, L.; Han, W.; Li, K. An efficient memetic algorithm for distributed flexible job shop scheduling problem with transfers. Expert Syst. Appl. 2020, 160, 113721. [Google Scholar] [CrossRef]
  12. Meng, L.; Zhang, C.; Ren, Y.; Zhang, B.; Lv, C. Mixed-integer linear programming and constraint programming formulations for solving distributed flexible job shop scheduling problem. Comput. Ind. Eng. 2020, 142, 106347. [Google Scholar] [CrossRef]
  13. Sang, Y.; Tan, J. Intelligent factory many-objective distributed flexible job shop collaborative scheduling method. Comput. Ind. Eng. 2022, 164, 107884. [Google Scholar] [CrossRef]
  14. Zhang, Z.Q.; Wu, F.C.; Qian, B.; Hu, R.; Wang, L.; Jin, H.P. A Q-learning-based hyper-heuristic evolutionary algorithm for the distributed flexible job-shop scheduling problem with crane transportation. Expert Syst. Appl. 2023, 234, 121050. [Google Scholar] [CrossRef]
  15. Li, X.; Xie, J.; Ma, Q.; Gao, L.; Li, P. Improved gray wolf optimizer for distributed flexible job shop scheduling problem. Sci. China Technol. Sci. 2022, 65, 2105–2115. [Google Scholar] [CrossRef]
  16. Dai, M.; Tang, D.; Xu, Y.; Li, W. Energy-aware integrated process planning and scheduling for job shops. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2015, 229, 13–26. [Google Scholar] [CrossRef]
  17. Zhang, C.; Gu, P.; Jiang, P. Low-carbon scheduling and estimating for a flexible job shop based on carbon footprint and carbon efficiency of multi-job processing. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2015, 229, 328–342. [Google Scholar] [CrossRef]
  18. Jiang, T.; Deng, G. Optimizing the low-carbon flexible job shop scheduling problem considering energy consumption. IEEE Access 2018, 6, 46346–46355. [Google Scholar] [CrossRef]
  19. Yin, L.; Li, X.; Gao, L.; Lu, C.; Zhang, Z. A novel mathematical model and multi-objective method for the low-carbon flexible job shop scheduling problem. Sustain. Comput. Inform. Syst. 2017, 13, 15–30. [Google Scholar] [CrossRef]
  20. Li, Y.; Huang, W.; Wu, R.; Guo, K. An improved artificial bee colony algorithm for solving multi-objective low-carbon flexible job shop scheduling problem. Appl. Soft Comput. 2020, 95, 106544. [Google Scholar] [CrossRef]
  21. Zhu, K.; Gong, G.; Peng, N.; Zhang, L.; Huang, D.; Luo, Q.; Li, X. Dynamic distributed flexible job-shop scheduling problem considering operation inspection. Expert Syst. Appl. 2023, 224, 119840. [Google Scholar] [CrossRef]
  22. Shahgholi Zadeh, M.; Katebi, Y.; Doniavi, A. A heuristic model for dynamic flexible job shop scheduling problem considering variable processing times. Int. J. Prod. Res. 2019, 57, 3020–3035. [Google Scholar] [CrossRef]
  23. Li, K.; Deng, Q.; Zhang, L.; Fan, Q.; Gong, G.; Ding, S. An effective MCTS-based algorithm for minimizing makespan in dynamic flexible job shop scheduling problem. Comput. Ind. Eng. 2021, 155, 107211. [Google Scholar] [CrossRef]
  24. Ouelhadj, D.; Petrovic, S. A survey of dynamic scheduling in manufacturing systems. J. Sched. 2009, 12, 417–431. [Google Scholar] [CrossRef]
  25. Li, Y.; Gu, W.; Yuan, M.; Tang, Y. Real-time data-driven dynamic scheduling for flexible job shop with insufficient transportation resources using hybrid deep Q network. Robot. Comput.-Integr. Manuf. 2022, 74, 102283. [Google Scholar] [CrossRef]
  26. Yan, Q.; Wang, H.; Wu, F. Digital twin-enabled dynamic scheduling with preventive maintenance using a double-layer Q-learning algorithm. Comput. Oper. Res. 2022, 144, 105823. [Google Scholar] [CrossRef]
  27. Chang, J.; Yu, D.; Hu, Y.; He, W.; Yu, H. Deep Reinforcement Learning for Dynamic Flexible Job Shop Scheduling with Random Job Arrival. Processes 2022, 10, 760. [Google Scholar] [CrossRef]
  28. Yan, Q.; Wu, W.; Wang, H. Deep Reinforcement Learning for Distributed Flow Shop Scheduling with Flexible Maintenance. Machines 2022, 10, 210. [Google Scholar] [CrossRef]
  29. Zhang, M.; Lu, Y.; Hu, Y.; Amaitik, N.; Xu, Y. Dynamic Scheduling Method for Job-Shop Manufacturing Systems by Deep Reinforcement Learning with Proximal Policy Optimization. Sustainability 2022, 14, 5177. [Google Scholar] [CrossRef]
  30. Lee, Y.H.; Lee, S. Deep reinforcement learning based scheduling within production plan in semiconductor fabrication. Expert Syst. Appl. 2022, 191, 116222. [Google Scholar] [CrossRef]
  31. Waubert de Puiseau, C.; Meyes, R.; Meisen, T. On reliability of reinforcement learning based production scheduling systems: A comparative survey. J. Intell. Manuf. 2022, 33, 911–927. [Google Scholar] [CrossRef]
  32. Luo, S.; Zhang, L.; Fan, Y. Real-time scheduling for dynamic partial-no-wait multi-objective flexible job shop by deep reinforcement learning. IEEE Trans. Autom. Sci. Eng. 2021, 19, 3020–3038. [Google Scholar] [CrossRef]
  33. Luo, S.; Zhang, L.; Fan, Y. Dynamic multi-objective scheduling for flexible job shop by deep reinforcement learning. Comput. Ind. Eng. 2021, 159, 107489. [Google Scholar] [CrossRef]
  34. Liu, R.; Piplani, R.; Toro, C. Deep reinforcement learning for dynamic scheduling of a flexible job shop. Int. J. Prod. Res. 2022, 60, 4049–4069. [Google Scholar] [CrossRef]
  35. Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D. Thesis, King’s College, London, UK, 1989. [Google Scholar]
  36. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 1995, 38, 58–68. [Google Scholar] [CrossRef]
  37. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  38. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. No. 1. [Google Scholar]
  39. Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1995–2003. [Google Scholar]
  40. Fortunato, M.; Azar, M.G.; Piot, B.; Menick, J.; Osband, I.; Graves, A.; Legg, S. Noisy networks for exploration. arXiv 2017, arXiv:1706.10295. [Google Scholar]
  41. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  42. De Asis, K.; Hernandez-Garcia, J.; Holland, G.; Sutton, R. Multi-step reinforcement learning: A unifying algorithm. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. No. 1. [Google Scholar]
  43. Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]
  44. Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Silver, D. Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. No. 1. [Google Scholar]
  45. Väth, D.; Vu, N.T. To combine or not to combine? A rainbow deep reinforcement learning agent for dialog policies. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, Stockholm, Sweden, 11–13 September 2019; pp. 62–67. [Google Scholar]
  46. Brandimarte, P. Routing and scheduling in a flexible job shop by tabu search. Ann. Oper. Res. 1993, 41, 157–183. [Google Scholar] [CrossRef]
  47. Lei, K.; Guo, P.; Zhao, W.; Wang, Y.; Qian, L.; Meng, X.; Tang, L. A multi-action deep reinforcement learning framework for flexible Job-shop scheduling problem. Expert Syst. Appl. 2022, 205, 117796. [Google Scholar] [CrossRef]
Figure 1. LHDFJS framework.
Figure 2. The architecture of Rainbow DQN in LDFJS.
Figure 3. The network structure of Rainbow DQN.
Figure 4. Verification results of hyperparameters: (a) learning rate; (b) buffer capacity; (c) batch size.
Figure 5. The total weighted tardiness curve of Rainbow DQN in two scenarios: (a) Scenario 4; (b) Scenario 6.
Figure 6. The energy consumption curve of Rainbow DQN in two scenarios: (a) Scenario 4; (b) Scenario 6.
Figure 7. Rainbow DQN versus D3QN with PER in Scenario 4: (a) total energy consumption curve; (b) total weighted tardiness curve.
Table 1. Summary of relevant studies.
| References | Type of Problem | Low-Carbon | Heterogeneity | Dynamics | Transfer | Objective | Method |
[11]Flexible job shop1S 2DQN
[32]Flexible job shopMImproved PPO
[33]Flexible job shopMHierarchical DQN
[10]Distributed assembly No-idle flow-shopSQ-learning and metaheuristic algorithm
[8]Distributed flexible job shopMMetaheuristic algorithm
[15]Distributed flexible job shopSMetaheuristic algorithm
[13]Distributed flexible job shopMMetaheuristic algorithm
[9]Flexible job shopSMonte Carlo tree search
[26]Flexible job shopSImproved Q-learning
[27]Flexible job shopSDouble DQN
[34]Flexible job shopSDouble DQN
[28]Distributed flexible permutation flow shopSImproved DQN
[29]Job shopSProximal policy optimization
[25]Flexible job shopMHybrid DQN
[30]Semiconductor fabricationMImproved DQN
[21]Distributed flexible job shopMMetaheuristic algorithm
[23]Distributed flexible job shopMMetaheuristic algorithm
[14]Distributed flexible Job shopMQ-learning and metaheuristic algorithm
OursDistributed heterogeneous flexible job ShopMRainbow DQN
1 / denotes whether the literature includes this research direction. 2 S/M indicates whether the problem discussed is single-objective (S) or multi-objective (M).
Table 2. Exemplified processing time and energy consumption.
Workshop 1Workshop 2
m 1 m 2 m 3 m 1 m 3 m 4
J 1 O 11 3.13-5.93.135.9-
O 12 -2.141.17-1.173.9
J 2 O 21 10.79.1012.910.712.9-
O 22 7.1411.8-7.14-6.10
O 23 6.11-4.126.114.127.9
Table 3. Exemplified transfer time between workshops and machines.

| From \ To | Workshop 1, m1 | Workshop 1, m2 | Workshop 1, m3 | Workshop 2, m1 | Workshop 2, m3 | Workshop 2, m4 |
| Workshop 1, m1 | 0 | 20 | 15 | 150 | 150 | 150 |
| Workshop 1, m2 | 20 | 0 | 25 | 150 | 150 | 150 |
| Workshop 1, m3 | 15 | 25 | 0 | 150 | 150 | 150 |
| Workshop 2, m1 | 150 | 150 | 150 | 0 | 33 | 27 |
| Workshop 2, m3 | 150 | 150 | 150 | 33 | 0 | 19 |
| Workshop 2, m4 | 150 | 150 | 150 | 27 | 19 | 0 |
Table 4. The notation of the mathematical model.

| Parameter | Description |
| $i, g$ | Index of jobs |
| $j, h$ | Index of operations |
| $f, u$ | Index of job shops |
| $k, l$ | Index of machines |
| $N$ | The total number of jobs |
| $n_i$ | The total number of operations for job $J_i$ |
| $O_{ij}$ | The $j$-th operation of job $J_i$ |
| $F$ | The total number of job shops |
| $M_f$ | Total number of machines in job shop $f$ |
| $F_{i,j}$ | The set of job shops that can process operation $O_{ij}$ |
| $M_{ij}^{f}$ | The set of machines in job shop $f$ that can process operation $O_{ij}$ |
| $W_i$ | The weight coefficient of job $J_i$ |
| $pt_{ijfk}$ | The processing time of operation $O_{ij}$ on machine $k$ in job shop $f$ |
| $A_i$ | The arrival time of job $J_i$ |
| $D_i$ | The delivery date of job $J_i$ |
| $S_{i,j}$ | The start time of operation $O_{ij}$ |
| $C_{i,j}$ | The completion time of operation $O_{ij}$ |
| $TransF_{fu}$ | The transport time of a job from job shop $f$ to job shop $u$ |
| $TransM_{lk}$ | The transport time of a job from machine $l$ to machine $k$ |
| $procE$ | The processing energy consumption of all machines |
| $idleE$ | The idle energy consumption of all machines |
| $transE$ | The transport energy consumption of all transportation missions |
| $TE$ | Total energy consumption |
| $pe_{fk}$ | Unit processing energy consumption of machine $k$ in job shop $f$ |
| $ie_{fk}$ | Unit idle energy consumption of machine $k$ in job shop $f$ |
| $te$ | Unit transport energy consumption between job shops/machines |
| $x_{ijfk}$ | 0–1 decision variable: 1 if $O_{ij}$ is processed on machine $k$ in job shop $f$; 0 otherwise |
| $y_{ij,gh}$ | 0–1 decision variable: 1 if the operation processed immediately after $O_{ij}$ is $O_{gh}$; 0 otherwise |
| $a_{ilk}$ | 0–1 decision variable: 1 if job $J_i$ is transported from machine $l$ to machine $k$ in the same job shop; 0 otherwise |
| $b_{ifu}$ | 0–1 decision variable: 1 if job $J_i$ is transported from job shop $f$ to job shop $u$; 0 otherwise |
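To make the energy-related notation concrete, the sketch below computes the energy of a single operation as the sum of its processing, idle, and transport components, each being a unit rate multiplied by a duration, following the definitions of procE, idleE, transE, pe, ie, and te above. The function and the numeric values are illustrative assumptions, not taken from the model instances.

```python
def operation_energy(proc_time: float, pe: float,
                     idle_time: float, ie: float,
                     trans_time: float, te: float) -> float:
    """Energy of one operation: processing + idle + transport contributions."""
    proc_e = proc_time * pe        # processing energy on the chosen machine
    idle_e = idle_time * ie        # idle energy while the machine waits
    trans_e = trans_time * te      # transport energy between shops/machines
    return proc_e + idle_e + trans_e

# Hypothetical values: 5 h of processing at 12 units/h, 1 h idle at 2 units/h,
# and 0.5 h of transport at 2 units/h.
print(operation_energy(5, 12, 1, 2, 0.5, 2))  # 63.0
```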
Table 5. The parameters and formulas of state space.

$t$: Rescheduling point (decision point); the scheduling environment changes to a new state after scheduling each operation, namely, rescheduling point $t$ in the DRL agent. Value range: $[0, \sum_{i=1}^{N} n_i]$.

$NPO_i(t)$: At rescheduling point $t$, the number of completed operations for job $J_i$. Value range: $[0, n_i]$.

$CTM_{kf}(t)$: At rescheduling point $t$, the completion time of the last operation processed on the machine $M_k$. Value range: $[0, CT_i]$.

$T_{cur}$: At rescheduling point $t$, the mean completion time of the last operation assigned to each machine in each job shop. Value range: $[0, CT_i]$. The formula is shown below:
$T_{cur} = \frac{\sum_{f=1}^{F} \sum_{k=1}^{M_f} CTM_{kf}(t)}{\sum_{f=1}^{F} M_f}$ (21)

$CT_{kf}(t)$: At rescheduling point $t$, the completion time of the last operation on the machine $M_k$ in job shop $f$. Value range: $[0, CT_i]$.

$\overline{pt_{ij}}$: The average processing time of operation $O_{ij}$ on all available machines in all job shops. Value range: $[0, \sum_{f=1}^{F} \sum_{k=1}^{M_f} pt_{ijfk}]$.

$\overline{tt_{i,j}}\ (j>1)$: The average transport time from operation $O_{i,j-1}$ to $O_{ij}$; $\max tt_{i,j}$ represents the maximum transit time of operation $O_{ij}$. Value range: $[0, \max tt_{i,j}]$. The formula is shown below:
$\overline{tt_{i,j}} = \frac{1}{|F_{i,j}|} \sum_{u \in F_{i,j}} TransF_{fu}$ (22)

$\overline{TT_i(t)}$: At rescheduling point $t$, the remaining estimated transport time of job $J_i$. Value range: $[0, (n_i-1) \cdot TransF_{fu} \cdot te]$. The formula is shown below:
$\overline{TT_i(t)} = \overline{tt_{i,NPO_i(t)+1}} + \sum_{j=NPO_i(t)+2}^{n_i} \frac{1}{|F_{i,j-1}| \cdot |F_{i,j}|} \sum_{f \in F_{i,j-1}} \sum_{u \in F_{i,j}} TransF_{fu}$ (23)

$\overline{T_i(t)}$: At rescheduling point $t$, the estimated time required to process the remaining part of job $J_i$ (remaining processing time of job $J_i$ plus remaining transport time of job $J_i$). Value range: - ¹. The formula is shown below:
$\overline{T_i(t)} = \sum_{j=NPO_i(t)+1}^{n_i} \overline{pt_{ij}} + \overline{TT_i(t)}$ (24)

$EDT_i(t)$: At rescheduling point $t$, the expected delayed time (EDT) of job $J_i$; the larger the EDT value, the more serious the delay. Value range: -. The formula is shown below:
$EDT_i(t) = \max\{T_{cur},\, C_{i,NPO_i(t)}\} + \overline{T_i(t)} - D_i$ (25)

$CRJ_i(t)$: At rescheduling point $t$, the completion rate of job $J_i$. Value range: $[0, 1]$. The formula is shown below:
$CRJ_i(t) = \frac{NPO_i(t)}{n_i}$ (26)

$UR_{kf}(t)$: At rescheduling point $t$, the utilization rate of machine $M_{kf}$ in job shop $f$. Value range: $[0, 1]$. The formula is shown below:
$UR_{kf}(t) = \frac{\sum_{i=1}^{N} \sum_{j=1}^{NPO_i(t)} pt_{ijfk}\, x_{ijfk}}{CT_{kf}(t)}$ (27)

$TE_{i,j}(t)$: At rescheduling point $t$, the actual energy consumption required to complete operation $O_{ij}$. Value range: $[TE_{i,j}^{min}(t),\, TE_{i,j}^{max}(t)]$.

$TE_{i,j}^{min}(t)$: At rescheduling point $t$, the minimum energy consumption required to complete operation $O_{ij}$. Value range: -. The formulas are shown below:
$TE_{i,j}^{min}(t) = \min_{u \in F_{i,j},\, l \in M_{i,j}^{u}} \left( transE_{i,j}(t) + procE_{i,j}(t) + idleE_{i,j}(t) \right)$ (28)
$transE_{i,j}(t) = \left( TransF_{fu}\, b_{ifu} + TransM_{kl}\, a_{ikl} \right) \cdot te$ (29)
$procE_{i,j}(t) = pt_{ijul} \cdot pe_{ul}$ (30)
$idleE_{i,j}(t) = \left( \max\{ C_{i,j-1} + TransF_{fu}\, b_{ifu} + TransM_{kl}\, a_{ikl},\ CT_{lu}(t) \} - CT_{lu}(t) \right) \cdot ie_{ul}$ (31)

$TE_{i,j}^{max}(t)$: At rescheduling point $t$, the maximum energy consumption required to complete operation $O_{ij}$. Value range: -. The formula is shown below:
$TE_{i,j}^{max}(t) = \max_{u \in F_{i,j},\, l \in M_{i,j}^{u}} \left( transE_{i,j}(t) + procE_{i,j}(t) + idleE_{i,j}(t) \right)$ (32)

¹ “-” indicates that this parameter needs to be calculated based on the overall scheduling situation.
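As an illustration of how some of these state features translate into code, the following minimal sketch computes the completion rate $CRJ_i(t)$, the machine utilization $UR_{kf}(t)$, and the mean completion time $T_{cur}$ from pre-aggregated quantities; the argument names are ours, and the bookkeeping of the full schedule is omitted.

```python
def completion_rate(completed_ops: int, total_ops: int) -> float:
    """CRJ_i(t): fraction of job i's operations already finished (Equation (26))."""
    return completed_ops / total_ops

def machine_utilization(busy_time: float, last_completion_time: float) -> float:
    """UR_kf(t): accumulated processing time on a machine divided by the
    completion time of its last scheduled operation (Equation (27))."""
    return busy_time / last_completion_time if last_completion_time > 0 else 0.0

def mean_completion_time(machine_completion_times: list) -> float:
    """T_cur: average completion time over all machines in all shops (Equation (21))."""
    return sum(machine_completion_times) / len(machine_completion_times)

# Hypothetical example: 2 of 5 operations finished, 6 h of work on a machine
# whose last operation ends at 8 h, and three machines ending at 8, 7, and 9 h.
print(completion_rate(2, 5),
      machine_utilization(6.0, 8.0),
      mean_completion_time([8.0, 7.0, 9.0]))
```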
Table 6. The size of problem instance.

| Scenario | Number of Job Shops | Number of Initial Jobs | Number of Dynamically Inserted Jobs | Number of Machines |
| 1 | 3 | 10 | 5 | 10 |
| 2 | 3 | 12 | 8 | 10 |
| 3 | 3 | 14 | 10 | 10 |
| 4 | 4 | 15 | 10 | 10 |
| 5 | 4 | 20 | 15 | 10 |
| 6 | 4 | 30 | 20 | 10 |
| 7 | 5 | 30 | 20 | 20 |
| 8 | 5 | 35 | 25 | 20 |
Table 7. The general parameters for different scenarios.

| Item | Value |
| The number of operations | [1, 5] |
| Transport time between job shops | [8, 11] |
| Transport time between machines | [1, 4] |
| Unit processing energy consumption (UPEX) | [10, 20] |
| Unit idle energy consumption | UPEX · [1/10, 1/6] |
| Unit transport energy consumption | 2 |
| Due date tightness (DDT) | [0.5, 1.5] |
| Weight of job | [1, 5] |
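A hypothetical generator that draws job-level parameters from the ranges in Table 7 is sketched below; the exact instance-generation procedure used in our experiments may differ, the dictionary keys are our own naming, and the two transport times are shown per draw only for illustration (in practice they apply to pairs of shops and machines).

```python
import random

def sample_job_parameters():
    """Draw one job's parameters from the ranges listed in Table 7."""
    upex = random.uniform(10, 20)                        # unit processing energy
    return {
        "n_operations": random.randint(1, 5),            # number of operations
        "weight": random.randint(1, 5),                  # job weight
        "due_date_tightness": random.uniform(0.5, 1.5),  # DDT
        "unit_processing_energy": upex,
        "unit_idle_energy": upex * random.uniform(1 / 10, 1 / 6),
        "unit_transport_energy": 2,
        "shop_transport_time": random.randint(8, 11),    # between job shops
        "machine_transport_time": random.randint(1, 4),  # between machines
    }

random.seed(0)
print(sample_job_parameters())
```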
Table 8. The hyperparameters.

| Hyperparameters | Values |
| Number of episodes | 2000 |
| Batch size | 256 |
| Learning rate | 1 × 10⁻⁴ |
| The update frequency of target Q-network | 200 |
| Buffer capacity | 1 × 10⁵ |
| Alpha in PER | 0.6 |
| Beta in PER | 0.4 |
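For reproducibility, these hyperparameters can be grouped in a single configuration object, for example as in the sketch below; the field names are ours, while the values follow Table 8.

```python
from dataclasses import dataclass

@dataclass
class RainbowConfig:
    """Training hyperparameters from Table 8."""
    episodes: int = 2000
    batch_size: int = 256
    learning_rate: float = 1e-4
    target_update_frequency: int = 200   # steps between target-network syncs
    buffer_capacity: int = 100_000
    per_alpha: float = 0.6               # prioritization exponent in PER
    per_beta: float = 0.4                # importance-sampling exponent in PER

print(RainbowConfig())
```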
Table 9. The classical scheduling rules.

Job assignment scheduling rules:
| Rule | Description |
| First in first out (FIFO) | The earlier the job arrives, the higher the processing priority. When the same operation is performed with the same machine, the first job to arrive is processed first. |
| Most operation number remaining (MOPNR) | The more operations remaining, the higher the processing priority. When processing the same operation with the same machine, the job with the most remaining operations is processed first. |
| Least work remaining (LWKR) | When processing different jobs, the job with the shortest remaining average processing time is preferred. |
| Most work remaining (MWKR) | When processing different jobs, the job with the longest remaining average processing time is preferred. |

Machine assignment scheduling rules:
| Rule | Description |
| Shortest processing time (SPT) | If more than one machine can process $O_{ij}$, the machine with the shortest processing time is selected. |
| Earliest end time (EET) | If more than one machine can process $O_{ij}$, the machine with the earliest end time of the previous process is selected. |
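The job and machine assignment rules in Table 9 can be paired to form composite rules, each pair acting as one discrete action for the scheduling agent. The sketch below illustrates this pairing for the rules listed here only (the full pool used in our experiments contains 42 composite rules built from a larger rule set); the job and machine dictionaries and their keys are illustrative assumptions, not our data model.

```python
def fifo(jobs):     # earliest-arriving job first
    return min(jobs, key=lambda j: j["arrival_time"])

def mopnr(jobs):    # job with the most remaining operations first
    return max(jobs, key=lambda j: j["remaining_ops"])

def lwkr(jobs):     # job with the least remaining average work first
    return min(jobs, key=lambda j: j["remaining_work"])

def mwkr(jobs):     # job with the most remaining average work first
    return max(jobs, key=lambda j: j["remaining_work"])

def spt(machines):  # machine with the shortest processing time for the operation
    return min(machines, key=lambda m: m["proc_time"])

def eet(machines):  # machine whose previous operation finishes earliest
    return min(machines, key=lambda m: m["release_time"])

JOB_RULES = [fifo, mopnr, lwkr, mwkr]
MACHINE_RULES = [spt, eet]
COMPOSITE_RULES = [(j, m) for j in JOB_RULES for m in MACHINE_RULES]

def apply_composite_rule(action_index, ready_jobs, candidate_machines):
    """Interpret one agent action as a (job rule, machine rule) pair."""
    job_rule, machine_rule = COMPOSITE_RULES[action_index]
    return job_rule(ready_jobs), machine_rule(candidate_machines)
```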
Table 10. Comparison results of classical scheduling rules and composite scheduling rules.

| FIFO + SPT | MOPNR + SPT | LWKR + SPT | MWKR + SPT | Composite Scheduling Rules | Solution Distance (%) |
| TT ¹  TE ² | TT  TE | TT  TE | TT  TE | TT  TE | TT  TE |
Scenario 147142261193548054941051277512719399300
Scenario 21141594640649309865622837409782164620304.32
Scenario 314637622162879221533752216008247186804506.95
Scenario 417928357394113,46842428001483611,829745805400.66
Scenario 533539921218610,169260810,279317011,08894210,80108.87
Scenario 6484013,391690116,760739112,554787616,930113613,56008.01
Scenario 7284812,368553917,424491513,606610317,89352114,149014.4
Scenario 8952218,303395319,596723420,961587220,228489621,85023.8519.37
1 TT represents the total weighted tardiness time of an algorithm in the table. 2 TE represents the total energy consumption of an algorithm in the table.
Table 11. Comparison results of classical scheduling rules, composite scheduling rules, D3QN with PER, and Rainbow DQN.

| Classical Scheduling Rule | Composite Scheduling Rule | D3QN with PER | Rainbow DQN | Solution Distance (%) |
| TT  TE | TT  TE | TT  TE | TT  TE | TT  TE |
Scenario 1471422619399358762416410602.82
Scenario 286562281646203179784345591700
Scenario 31463762218680456309521101762000
Scenario 417928357745805471210,895391796600
Scenario 5218610,16994210,80198618,08044410,96107.78
Scenario 6484013,391113613,560110413,96250311,57500
Scenario 7284812,36852114,149174815,83321212,42800.48
Scenario 8395319,596489621,850350825,204217922,191013.24
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
