dMMR prediction from colorectal cancer histopathology: Leveraging non-tumor and low-magnification regions - UTU Research Information System

A1 Peer-reviewed original article in a scientific journal

dMMR prediction from colorectal cancer histopathology: Leveraging non-tumor and low-magnification regions

Authors: Petäinen, Liisa; Väyrynen, Juha P.; Böhm, Jan; Ruusuvuori, Pekka; Ahtiainen, Maarit; Elomaa, Hanna; Karjalainen, Henna; Kastinen, Meeri; Tapiainen, Vilja V.; Äijälä, Ville K.; Sirniö, Päivi; Tuomisto, Anne; Mäkinen, Markus J.; Mecklin, Jukka-Pekka; Pölönen, Ilkka; Äyrämö, Sami

Publication year: 2026

Journal: Computer Methods and Programs in Biomedicine

Article number: 109317

Volume: 280

ISSN: 0169-2607

eISSN: 1872-7565

DOI: https://doi.org/10.1016/j.cmpb.2026.109317

Open access status of the publication at the time of recording: Openly available

Open access status of the publication channel: Partially open publication channel

Web address: https://doi.org/10.1016/j.cmpb.2026.109317

Self-archived copy URL: https://research.utu.fi/converis/portal/detail/Publication/522873195

Self-archived copy license: CC BY

Version of the self-archived publication: Publisher's version

Abstract

Background and Objective

Colorectal cancer is the second leading cause of cancer-related mortality worldwide, posing a substantial burden on healthcare systems. Identifying DNA mismatch repair deficiency (dMMR) is critical for guiding treatment, yet conventional methods rely on labor-intensive DNA analysis. While deep-learning approaches have shown promise for predicting dMMR from histopathological images, most studies focus exclusively on tumor regions and single-scale representations. This study systematically evaluates the predictive value of tumor and non-tumor regions across multiple magnifications for dMMR prediction from whole-slide images (WSIs).

Methods

A total of 24 modeling approaches were evaluated, varying by tissue origin (tumor vs. non-tumor), magnification level (5x and 20x), and tile embedding strategy, including digital pathology foundation models. The tile embeddings were then used to train models on 1228 WSIs with a multiple-instance learning (MIL)-based approach. The best-performing configurations were selected for external evaluation, which was carried out on two independent cohorts of 1010 and 457 WSIs, respectively.
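The abstract does not specify which MIL aggregator was used. Purely as an illustration, the sketch below assumes attention-based MIL pooling, a common choice for WSI classification: each slide is treated as a bag of tile embeddings, a small attention network scores every tile, and the softmax-weighted average of the embeddings is passed to a linear classifier to produce a slide-level dMMR probability. All names (`attention_mil_forward`, `V`, `w`, `c`, `b`) are hypothetical, the weights are untrained toy values, and pure Python is used only to keep the example dependency-free; a real pipeline would learn these parameters with a tensor library.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, v)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over per-tile attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_forward(
    tiles: Sequence[Sequence[float]],  # bag: one embedding per tile of a WSI
    V: Matrix,                         # hypothetical hidden_dim x embed_dim projection
    w: Sequence[float],                # hypothetical attention vector (hidden_dim)
    c: Sequence[float],                # hypothetical classifier weights (embed_dim)
    b: float,                          # hypothetical classifier bias
) -> Tuple[float, Vector]:
    """Return (P(dMMR), per-tile attention weights) for one slide."""
    # Attention score per tile: w^T tanh(V h_k)
    scores = [
        sum(wi * math.tanh(hi) for wi, hi in zip(w, matvec(V, h)))
        for h in tiles
    ]
    attn = softmax(scores)
    # Slide embedding: attention-weighted average of the tile embeddings
    dim = len(tiles[0])
    slide = [sum(a * h[j] for a, h in zip(attn, tiles)) for j in range(dim)]
    # Linear classifier followed by a sigmoid gives a slide-level probability
    logit = sum(ci * zi for ci, zi in zip(c, slide)) + b
    return 1.0 / (1.0 + math.exp(-logit)), attn


if __name__ == "__main__":
    # Toy bag of three 3-dimensional tile embeddings with untrained weights
    toy_tiles = [[0.1, 0.2, 0.3], [0.5, -0.1, 0.0], [-0.3, 0.4, 0.2]]
    prob, weights = attention_mil_forward(
        toy_tiles,
        V=[[0.2, -0.1, 0.4], [0.3, 0.1, -0.2]],
        w=[1.0, -0.5],
        c=[0.7, -0.3, 0.5],
        b=0.1,
    )
    print(f"P(dMMR) = {prob:.3f}; attention = {[round(a, 3) for a in weights]}")
```

Because the slide embedding is a weighted sum over tiles, the prediction does not depend on tile order, and the attention weights indicate which tiles drove the slide-level score. The same pooling could in principle be run separately on 5x and 20x bags and the two slide vectors concatenated for a multi-scale variant, but the abstract does not say how its multi-scale model combines magnifications.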

Results

Non-tumorous regions demonstrated measurable predictive value, although performance remained lower than that obtained from tumor regions (F1 = 0.896, precision = 0.888, sensitivity = 0.594, specificity = 0.982). Among the nine models selected during internal validation, the top three models—one multi-scale approach and two models trained on 20x tumor regions—achieved F1 scores of 0.870–0.889 with precision of 0.885–0.920, sensitivity of 0.852, and specificity of 0.889–0.926. On external validation, the top three models, all based on foundation-model tile embeddings, achieved F1 scores of 0.916–0.919 on the first cohort and 0.928–0.934 on the second cohort. Across cohorts, specificity remained consistently high (0.964–0.992), while sensitivity ranged from 0.500 to 0.682.
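The Results report precision, sensitivity, specificity, and F1. The sketch below shows how these are derived from a slide-level confusion matrix, treating dMMR as the positive class and MMR-proficient (pMMR) slides as negatives. The abstract does not state how its F1 is averaged. Under the binary definition, F1 is the harmonic mean of precision and sensitivity, so precision 0.888 and sensitivity 0.594 would give F1 ≈ 0.712; the reported values may therefore use another averaging, such as a support-weighted mean over both classes (what scikit-learn calls `average="weighted"`). The sketch returns both variants. The names `Confusion` and `slide_metrics` are hypothetical, and the example counts are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Slide-level confusion matrix with dMMR as the positive class."""
    tp: int  # dMMR slides predicted dMMR
    fp: int  # pMMR slides predicted dMMR
    tn: int  # pMMR slides predicted pMMR
    fn: int  # dMMR slides predicted pMMR


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def slide_metrics(cm: Confusion) -> Dict[str, float]:
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    sensitivity = safe_div(cm.tp, cm.tp + cm.fn)  # recall of the dMMR class
    specificity = safe_div(cm.tn, cm.tn + cm.fp)  # recall of the pMMR class
    npv = safe_div(cm.tn, cm.tn + cm.fn)          # precision of the pMMR class
    f1_dmmr = f1(precision, sensitivity)
    f1_pmmr = f1(npv, specificity)
    n_dmmr = cm.tp + cm.fn
    n_pmmr = cm.tn + cm.fp
    # Support-weighted F1: each class's F1 weighted by its number of slides
    f1_weighted = safe_div(n_dmmr * f1_dmmr + n_pmmr * f1_pmmr, n_dmmr + n_pmmr)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_dmmr": f1_dmmr,
        "f1_weighted": f1_weighted,
    }


if __name__ == "__main__":
    # Hypothetical cohort: 25 dMMR and 175 pMMR slides (not study data)
    example = Confusion(tp=15, fp=2, tn=173, fn=10)
    for name, value in slide_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, the dMMR-class F1 is 30/42 ≈ 0.714 while the support-weighted F1 is about 0.935, because the much larger pMMR class dominates the weighted mean. This illustrates how a high overall F1 can coexist with sensitivity near 0.6 when dMMR cases are a minority.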

Conclusion

This study demonstrates that dMMR status in colorectal cancer can be effectively predicted from histopathological WSIs using MIL-based models, with moderate generalizability across independent cohorts. In addition to confirming the predictive value of tumor regions, the results reveal that non-tumorous tissue also contains detectable predictive signals, suggesting that microenvironmental features may contribute to dMMR-associated histological patterns. Furthermore, the use of foundation model–derived embeddings improved generalizability across datasets. Future work should explore integrating non-tumor tissue features and clinical data to further improve predictive performance.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.


Funding information stated in the publication:
This study is one part of the Central Finland AI hub II project, which has received funding from the Regional Council of Central Finland (https://www.keskisuomi.fi/) and the European Regional Development Fund (ERDF) (https://ec.europa.eu/regional_policy/funding/erdf_en). The collection of data for this work (CRC samples, WSIs, and MMR analysis) was supported by the Jane and Aatos Erkko Foundation (https://jaes.fi/en/frontpage/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.