Comparing Deterministic and Stochastic Reinforcement Learning for Glucose Regulation in Type 1 Diabetes - UTU Research Information System

A4 Peer-reviewed article in conference proceedings

Comparing Deterministic and Stochastic Reinforcement Learning for Glucose Regulation in Type 1 Diabetes

Authors: Timms, David; Hettiarachchi, Chirath; Suominen, Hanna

Editors: Househ, Mowafa S.; Tariq, Zain Ul Abideen; Al-Zubaidi, Mahmood; Shah, Uzair; Huesing, Elaine

Established conference name: World Congress on Medical and Health Informatics

Publisher: IOS Press

Publication year: 2025

Journal: Studies in Health Technology and Informatics

Title of the edited volume: MEDINFO 2025 — Healthcare Smart × Medicine Deep: Proceedings of the 20th World Congress on Medical and Health Informatics

Journal name in the database: Studies in health technology and informatics

Volume: 329

First page: 1039

Last page: 1043

eISBN: 978-1-64368-608-0

ISSN: 0926-9630

eISSN: 1879-8365

DOI: https://doi.org/10.3233/SHTI250997

Open access status of the publication at the time of recording: Openly available

Open access status of the publication channel: Fully open access publication channel

Web address: https://doi.org/10.3233/shti250997

Self-archived copy address: https://research.utu.fi/converis/portal/detail/499745855

Abstract

Type 1 Diabetes (T1D) is a chronic condition affecting millions worldwide, requiring external insulin administration to regulate blood glucose levels and prevent serious complications. Artificial Pancreas Systems (APS) for managing T1D currently rely on manual input, which adds a cognitive burden on people with T1D and their carers. Research into alleviating this burden through Reinforcement Learning (RL) explores enabling the APS to autonomously learn and adapt to the complex dynamics of blood glucose regulation, demonstrating improvements in in-silico evaluations compared to traditional clinical approaches. This evaluation study compared the two primary polarities of RL for glucose regulation, namely stochastic (e.g., Proximal Policy Optimization (PPO)) and deterministic (e.g., Twin Delayed Deep Deterministic Policy Gradient (TD3)) algorithms, in-silico using quantitative and qualitative methods, patient-specific clinical metrics, and the adult and adolescent cohorts of the U.S. Food and Drug Administration-approved UVA/PADOVA 2008 model. Although the behavior of TD3 was easier to interpret, it did not typically outperform PPO, which makes assessing their safety and suitability challenging. This conclusion highlights the importance of improving RL algorithms in APS applications for both interpretability and predictive performance in future research.
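
The distinction the abstract draws between stochastic and deterministic RL can be illustrated with a minimal sketch. It is not the authors' implementation: the linear "policy", the `gain`, `target`, and `std` parameters, and the insulin-rate interpretation of the output are all hypothetical stand-ins for the neural networks that PPO and TD3 would actually learn. The only point shown is that a deterministic policy maps a given state to one fixed action, while a stochastic policy samples its action from a distribution conditioned on that state.

```python
import random


def deterministic_policy(glucose_mg_dl, gain=0.01, target=110.0):
    """Hypothetical deterministic policy (TD3-style actor at evaluation time):
    the same glucose reading always yields the same insulin rate."""
    return max(0.0, gain * (glucose_mg_dl - target))


def stochastic_policy(glucose_mg_dl, rng, gain=0.01, target=110.0, std=0.1):
    """Hypothetical stochastic policy (PPO-style): the action is sampled from a
    Gaussian whose mean depends on the state, then clipped to be non-negative."""
    mean = gain * (glucose_mg_dl - target)
    return max(0.0, rng.gauss(mean, std))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
state = 180.0  # assumed glucose reading in mg/dL

# Query each policy five times with the same state.
det_actions = [deterministic_policy(state) for _ in range(5)]
sto_actions = [stochastic_policy(state, rng) for _ in range(5)]

print("deterministic:", det_actions)
print("stochastic:   ", sto_actions)
```

Repeated queries with the same state return one repeated value from the deterministic policy and varying values from the stochastic one, which is one plausible reason the former is easier to inspect.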

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

SHTI-329-SHTI250997.pdf

Funding information in the publication:
We gratefully acknowledge funding from the MRFF 2022 National Critical Research Infrastructure (MRFCRI000138, Developing a new digital therapeutic for depression: Closed loop non-invasive brain stimulation). This work was supported by computational resources provided by the Australian Government through the National Computational Infrastructure under the ANU Merit Allocation Scheme (ny83 and eu59) and ANU Startup Scheme (sj53).