Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions




Khadivi, Maziyar; Charter, Todd; Yaghoubi, Marjan; Jalayer, Masoud; Ahang, Maryam; Shojaeinasab, Ardeshir; Najjaran, Homayoun

Publisher: Elsevier Ltd

2025

Computers and Industrial Engineering


110856

200

0360-8352

1879-0550

DOI: https://doi.org/10.1016/j.cie.2025.110856


https://arxiv.org/abs/2310.03195



Machine scheduling aims to optimally assign jobs to a single machine or a group of machines while satisfying manufacturing rules and job specifications. Optimizing machine schedules significantly reduces operational costs, improves adherence to customer demand, and raises production efficiency. Despite these benefits for industry, machine scheduling remains a challenging combinatorial optimization problem because it is Non-deterministic Polynomial-time (NP) hard. Deep Reinforcement Learning (DRL) has been regarded as a foundation for “artificial general intelligence”, with promising results in tasks such as gaming and robotics. Since 1995, researchers have also sought to apply DRL, which extracts knowledge from data, across a variety of machine scheduling problems. This paper presents a comprehensive review and comparison of the methodology, applications, advantages, and limitations of different DRL-based approaches. Further, the study categorizes the DRL methods by their integrated computational components, including conventional neural networks, encoder–decoder architectures, graph neural networks, and metaheuristic algorithms. Our literature review concludes that DRL-based approaches surpass exact solvers, heuristics, and tabular reinforcement learning algorithms in computation speed, in generating near-globally-optimal solutions, or in both. They have been applied to static or dynamic scheduling in different machine environments, namely single machine, parallel machine, flow shop, job shop, and open shop, with different job characteristics. Nonetheless, existing DRL-based schedulers are limited not only in handling complex operational constraints and configurable multi-objective optimization but also in generalization, scalability, interpretability, and robustness. Addressing these challenges therefore shapes future work in this field. This paper helps researchers survey the state of the art and the research gaps in DRL-based machine scheduling, and can help experts and practitioners choose a suitable approach for implementing DRL in production scheduling.



Natural Sciences and Engineering Research Council (NSERC) Canada under the Alliance Grant ALLRP 555220 – 20


Last updated on 2025-26-06 at 13:45