A1 Peer-reviewed original article in a scientific journal
dMMR prediction from colorectal cancer histopathology: Leveraging non-tumor and low-magnification regions
Authors: Petäinen, Liisa; Väyrynen, Juha P.; Böhm, Jan; Ruusuvuori, Pekka; Ahtiainen, Maarit; Elomaa, Hanna; Karjalainen, Henna; Kastinen, Meeri; Tapiainen, Vilja V.; Äijälä, Ville K.; Sirniö, Päivi; Tuomisto, Anne; Mäkinen, Markus J.; Mecklin, Jukka-Pekka; Pölönen, Ilkka; Äyrämö, Sami
Publication year: 2026
Journal: Computer Methods and Programs in Biomedicine
Article number: 109317
Volume: 280
ISSN: 0169-2607
eISSN: 1872-7565
DOI: https://doi.org/10.1016/j.cmpb.2026.109317
Open access status at the time of recording: Openly available
Open access status of the publication channel: Partially open publication channel
Web address: https://doi.org/10.1016/j.cmpb.2026.109317
Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/522873195
Self-archived copy license: CC BY
Version of the self-archived publication: Publisher's version
Background and Objective
Colorectal cancer is the second leading cause of cancer-related mortality worldwide, posing a substantial burden on healthcare systems. Identifying DNA mismatch repair deficiency (dMMR) is critical for guiding treatment, yet conventional methods rely on labor-intensive DNA analysis. While deep-learning approaches have shown promise for predicting dMMR from histopathological images, most studies focus exclusively on tumor regions and single-scale representations. This study systematically evaluates the predictive value of tumor and non-tumor regions across multiple magnifications for dMMR prediction from whole-slide images (WSIs).
Methods
A total of 24 modeling approaches were evaluated, varying by tissue origin (tumor vs. non-tumor), magnification level (5x and 20x), and tile embedding strategy, including digital pathology foundation models. The tile embeddings were then used to train models on 1228 WSIs with a multiple-instance learning (MIL) based approach. The best-performing configurations were selected for external evaluation, which was carried out on two independent cohorts of 1010 and 457 WSIs, respectively.
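To illustrate how a MIL model can turn a bag of tile embeddings into a single slide-level dMMR prediction, the following minimal sketch uses attention-based pooling. This is an assumption for illustration only: the abstract does not state which MIL aggregation the study used, and the function names, dimensions, and random weights below are hypothetical stand-ins for trained parameters and real foundation-model embeddings.

```python
import math
import random


def attention_mil_pool(tiles, V, w):
    """Pool tile embeddings (n_tiles x d) into one slide embedding (length d).

    V (d x h) and w (length h) are the attention parameters; in a real model
    they would be learned, here they are random placeholders.
    """
    scores = []
    for x in tiles:
        # Hidden attention representation: tanh(x @ V)
        hidden = [
            math.tanh(sum(x[i] * V[i][j] for i in range(len(x))))
            for j in range(len(w))
        ]
        scores.append(sum(hj * wj for hj, wj in zip(hidden, w)))

    # Softmax over tiles (shifted by the max score for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide embedding is the attention-weighted sum of tile embeddings
    d = len(tiles[0])
    slide = [sum(a * x[i] for a, x in zip(weights, tiles)) for i in range(d)]
    return slide, weights


def dmmr_probability(slide, clf_w, clf_b):
    """Logistic classifier on the pooled slide embedding."""
    logit = sum(s * c for s, c in zip(slide, clf_w)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical bag: 50 tiles with 16-dimensional embeddings
random.seed(0)
n_tiles, d, h = 50, 16, 8
tiles = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tiles)]
V = [[random.gauss(0, 0.1) for _ in range(h)] for _ in range(d)]
w = [random.gauss(0, 1) for _ in range(h)]
clf_w = [random.gauss(0, 0.1) for _ in range(d)]

slide, weights = attention_mil_pool(tiles, V, w)
prob = dmmr_probability(slide, clf_w, 0.0)
print(f"slide-level dMMR probability: {prob:.3f}")
```

The attention weights sum to one across tiles, so they can also be read as a rough indication of which tiles contributed most to the slide-level prediction.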
Results
Non-tumorous regions demonstrated measurable predictive value, although performance remained lower than that obtained from tumor regions (F1 = 0.896, precision = 0.888, sensitivity = 0.594, specificity = 0.982). Among the nine models selected during internal validation, the top three models (one multi-scale approach and two models trained on 20x tumor regions) achieved F1 scores of 0.870–0.889 with precision of 0.885–0.920, sensitivity of 0.852, and specificity of 0.889–0.926. On external validation, the top three models, all based on foundation-model tile embeddings, achieved F1 scores of 0.916–0.919 on the first cohort and 0.928–0.934 on the second cohort. Across cohorts, specificity remained consistently high (0.964–0.992), while sensitivity ranged from 0.500 to 0.682.
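For reference, the four metrics reported here can be derived from a binary confusion matrix as in the sketch below. The counts are invented for illustration and are not taken from the study. The sketch computes F1 for the positive (dMMR) class only; the abstract does not state which averaging scheme was used for the reported F1 values, so they need not match this definition.

```python
def binary_metrics(tp, fp, tn, fn):
    """Precision, sensitivity, specificity and positive-class F1 from counts.

    Assumes every denominator is non-zero, which holds for the example below.
    """
    precision = tp / (tp + fp)      # share of predicted dMMR that is truly dMMR
    sensitivity = tp / (tp + fn)    # share of true dMMR cases that are detected
    specificity = tn / (tn + fp)    # share of true pMMR cases that are rejected
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 45 dMMR slides and 155 pMMR slides
m = binary_metrics(tp=30, fp=5, tn=150, fn=15)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With a minority positive class, as in this example, high specificity can coexist with moderate sensitivity, which is the pattern the external-validation figures describe.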
Conclusion
This study demonstrates that dMMR status in colorectal cancer can be effectively predicted from histopathological WSIs using MIL-based models, with moderate generalizability across independent cohorts. In addition to confirming the predictive value of tumor regions, the results reveal that non-tumorous tissue also contains detectable predictive signals, suggesting that microenvironmental features may contribute to dMMR-associated histological patterns. Furthermore, the use of foundation model–derived embeddings improved generalizability across datasets. Future work should explore integrating non-tumor tissue features and clinical data to further improve predictive performance.
Funding information in the publication:
This study is one part of the Central Finland AI hub II project that has received funding from the Regional Council of Central Finland (https://www.keskisuomi.fi/) and the European Regional Development Fund (ERDF) (https://ec.europa.eu/regional_policy/funding/erdf_en). The data (CRC samples, WSIs and MMR analysis) collected in this work was supported by Jane and Aatos Erkko Foundation (https://jaes.fi/en/frontpage/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.