Automated detection of algorithm debt in deep learning frameworks: an empirical study - UTU Research Information System

A1 Peer-reviewed original article in a scientific journal

Automated detection of algorithm debt in deep learning frameworks: an empirical study

Authors: Simon, Emmanuel Iko-Ojo; Hettiarachchi, Chirath; Potanin, Alex; Suominen, Hanna; Fard, Fatemeh

Publisher: Springer Nature

Publication year: 2026

Journal: Empirical Software Engineering

Article number: 66

Volume: 31

Issue: 3

ISSN: 1382-3256

eISSN: 1573-7616

DOI: https://doi.org/10.1007/s10664-026-10807-5

Open access status of the publication at the time of recording: Openly available

Open access status of the publication channel: Partially open publication channel

Web address: https://doi.org/10.1007/s10664-026-10807-5

Address of the self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/515814945

License of the self-archived copy: CC BY

Version of the self-archived publication: Publisher's version

Abstract

Expedient design choices in software development can lead to Technical Debt (TD), with development teams documenting such decisions as Self-Admitted TD (SATD). Algorithm Debt (AD) is a type of TD resulting from the suboptimal implementation of algorithms, which impacts system performance. Given the impact of AD, its automated detection is crucial in Deep Learning (DL) frameworks due to their complexity and evolution. Early detection of AD in DL frameworks can help mitigate model degradation and scalability issues. Despite previous studies on the automated detection of TD from SATD using Machine Learning (ML)/DL models, AD detection in DL frameworks remains underexplored. In this study, we empirically investigated the performance of ML/DL models for the automated detection of AD using a dataset of 38,881 SATD comments from seven DL frameworks. We trained, evaluated, and tested ML/DL models, used embeddings from both DL and large language models, and explored an approach to enrich the dataset with handcrafted features based on AD-related keywords. Our findings reveal that AD is frequently misclassified as Design or Implementation Debt. Logistic Regression (an ML model) with Custom AD Features achieved an F1-score of 54% for AD, outperforming other ML/DL models (42% to 52%), highlighting the importance of tailored feature engineering. Our research advances automated AD detection in DL frameworks by providing insights into the strengths and limitations of ML/DL models, serving as a first step to guide future tool development. This could help developers using DL frameworks to identify AD issues during development, thereby enhancing system reliability by mitigating model degradation and scalability challenges.
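
To make the two technical ideas in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of handcrafted keyword features for SATD comments and of the per-class F1-score used to report AD results. The keyword list `AD_KEYWORDS`, the function names, and the label strings are assumptions made for illustration; the abstract does not list the study's actual keywords or feature definitions.

```python
import re
from typing import List, Sequence

# Hypothetical keyword list, for illustration only. The study's actual
# AD-related keywords are not given in the abstract.
AD_KEYWORDS = ("algorithm", "slow", "inefficient", "optimize", "complexity")


def keyword_features(comment: str, keywords: Sequence[str] = AD_KEYWORDS) -> List[int]:
    """Return one count per keyword: the number of tokens starting with it."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [sum(tok.startswith(kw) for tok in tokens) for kw in keywords]


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1-score for a single class, treating every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One count per entry in AD_KEYWORDS, in order.
features = keyword_features("TODO: this algorithm is too slow, optimize later")

# Toy labels in which one AD comment is misclassified as Design Debt.
y_true = ["AD", "AD", "DESIGN", "AD"]
y_pred = ["AD", "DESIGN", "DESIGN", "AD"]
ad_f1 = f1_for_class(y_true, y_pred, positive="AD")
```

In a pipeline of the kind the abstract describes, a count vector like `features` would presumably be concatenated with a text representation of the comment before training a classifier such as Logistic Regression; how the study combines them is not stated here. The toy labels mirror the reported error pattern, where an AD comment predicted as Design Debt lowers recall for the AD class.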

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

s10664-026-10807-5.pdf

Funding information in the publication:
Open Access funding enabled and organized by CAUL and its Member Institutions. This work is supported by the Australian National University (ANU) through the ANU PhD scholarship within the ANU Research School of Computing.