A1 Refereed original research article in a scientific journal
G2P2C: A modular reinforcement learning algorithm for glucose control by glucose prediction and planning in Type 1 Diabetes
Authors: Hettiarachchi Chirath, Malagutti Nicolo, Nolan Christopher J., Suominen Hanna, Daskalaki Elena
Publisher: Elsevier Ltd
Publication year: 2024
Journal: Biomedical Signal Processing and Control
Journal name in source: Biomedical Signal Processing and Control
Article number: 105839
Volume: 90
eISSN: 1746-8108
DOI: https://doi.org/10.1016/j.bspc.2023.105839
Web address: https://doi.org/10.1016/j.bspc.2023.105839
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/387167812
Developing diagnostic and treatment solutions for medical applications is often challenging due to the complex dynamics, partial observability, high inter- and intra-population variability, and the presence of unknown delays and disturbances. A characteristic case is the control of glucose concentration in people with Type 1 Diabetes (T1D) through the administration of exogenous insulin. The above complexities, enhanced by the significant cognitive burden associated with the estimation of optimal insulin dosing related to daily activities such as food intake and exercise, call for advanced insulin administration solutions towards a fully automated Artificial Pancreas System (APS). Reinforcement Learning (RL) is currently being explored in the development of APSs thanks to its demonstrated potential in problems characterized by complex dynamics and uncertainties. Despite the progress, RL algorithms in T1D still require manual estimation and announcement of meal carbohydrate (CHO) content or rely on small meal scenarios. In this study, we proposed G2P2C, a modular deep RL algorithm, which aims to fully automate glucose control in T1D, eliminating the need for CHO estimation and announcement. G2P2C was designed based on the state-of-the-art Proximal Policy Optimization (PPO) algorithm, augmented by two novel optimization phases: (i) model learning and (ii) planning. The former integrated an auxiliary learning task to learn a glucose dynamics model. The latter fine-tuned the learned control strategy to a short-time horizon by simulating glucose trajectories into the future. We evaluated the performance of G2P2C in-silico on a challenging meal protocol (180 g of CHO per day) for 20 subjects (10 adults and 10 adolescents) using an open-source version of a T1D simulator approved by the United States Food and Drug Administration (FDA). 
G2P2C was compared with state-of-the-art RL algorithms and two basal-bolus (BB) clinical treatment strategies, which involve manual meal announcement and CHO estimation with automated correction insulin boli for elevated glucose. G2P2C obtained statistically significant (P<0.05) reward improvements compared to PPO in 18 out of 20 subjects, while maintaining a lower failure rate. In addition, G2P2C achieved a time in range of 73% and 64% for the adult and adolescent cohorts, respectively, outperforming BB strategies in the adult cohort although no meal announcement was performed. The control performance and algorithmic characteristics of G2P2C show promise as a candidate algorithm for glucose control in APSs. Under the MIT license, we released the codebase of G2P2C (https://github.com/chirathyh/G2P2C) and an online demonstration tool (https://capsml.com/), where users can perform custom simulations to compare G2P2C with BB strategies.
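The time-in-range figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not state the glucose bounds used, so the 70-180 mg/dL target range below is an assumption taken from common clinical convention, and the function name, parameter names, and sample readings are hypothetical and not drawn from the G2P2C codebase.

```python
from typing import Sequence


def time_in_range(
    glucose_mg_dl: Sequence[float],
    low: float = 70.0,    # assumed lower bound (mg/dL), not stated in the abstract
    high: float = 180.0,  # assumed upper bound (mg/dL), not stated in the abstract
) -> float:
    """Return the percentage of readings that fall within [low, high].

    Assumes the readings are sampled at a uniform interval, so the share
    of readings in range equals the share of time in range.
    """
    if not glucose_mg_dl:
        raise ValueError("need at least one glucose reading")
    in_range = sum(1 for g in glucose_mg_dl if low <= g <= high)
    return 100.0 * in_range / len(glucose_mg_dl)


# Hypothetical readings for illustration only; not data from the study.
readings = [65.0, 90.0, 120.0, 150.0, 185.0, 210.0, 140.0, 110.0]
tir = time_in_range(readings)
print(f"Time in range: {tir:.1f}%")
```

Five of the eight sample readings lie inside the assumed range, so the sketch reports 62.5%. A cohort-level figure such as the 73% reported for adults would be an aggregate of such values over subjects and simulation runs; how the authors aggregated them is not specified here.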
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
This research was funded by and has been delivered in partnership with The Australian National University (ANU), School of Computing and the Our Health in Our Hands (OHIOH) grand challenge, a strategic initiative of the ANU, which aims to transform healthcare by developing new personalized health technologies and solutions in collaboration with patients, clinicians, and healthcare providers. This work was also supported by computational resources provided by the Australian Government through the National Computational Infrastructure (NCI Australia) under the ANU Merit Allocation Scheme. The authors wish to thank Dr David O’Neal, Dr Barbora Paldus, and Dr Dale Morrison from the Diabetes Technology Research Group, St Vincent’s Hospital for their valuable insights towards this research.