A1 Refereed original research article in a scientific journal

dMMR prediction from colorectal cancer histopathology: Leveraging non-tumor and low-magnification regions




Authors: Petäinen, Liisa; Väyrynen, Juha P.; Böhm, Jan; Ruusuvuori, Pekka; Ahtiainen, Maarit; Elomaa, Hanna; Karjalainen, Henna; Kastinen, Meeri; Tapiainen, Vilja V.; Äijälä, Ville K.; Sirniö, Päivi; Tuomisto, Anne; Mäkinen, Markus J.; Mecklin, Jukka-Pekka; Pölönen, Ilkka; Äyrämö, Sami

Publication year: 2026

Journal: Computer Methods and Programs in Biomedicine

Article number: 109317

Volume: 280

ISSN: 0169-2607

eISSN: 1872-7565

DOI: https://doi.org/10.1016/j.cmpb.2026.109317

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Partially Open Access publication channel

Web address: https://doi.org/10.1016/j.cmpb.2026.109317

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/522873195

Self-archived copy's licence: CC BY

Self-archived copy's version: Publisher's PDF


Abstract
Background and Objective

Colorectal cancer is the second leading cause of cancer-related mortality worldwide, posing a substantial burden on healthcare systems. Identifying DNA mismatch repair deficiency (dMMR) is critical for guiding treatment, yet conventional methods rely on labor-intensive DNA analysis. While deep-learning approaches have shown promise for predicting dMMR from histopathological images, most studies focus exclusively on tumor regions and single-scale representations. This study systematically evaluates the predictive value of tumor and non-tumor regions across multiple magnifications for dMMR prediction from whole-slide images (WSIs).

Methods

A total of 24 modeling approaches were evaluated, varying by tissue origin (tumor vs. non-tumor), magnification level (5x and 20x), and tile embedding strategy, including digital pathology foundation models. Tile embeddings were further trained with 1228 WSIs using a multiple-instance learning (MIL)-based approach. The best-performing configurations were selected for external evaluation. External testing was carried out on two independent cohorts consisting of 1010 and 457 WSIs, respectively.
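
For readers unfamiliar with MIL, the sketch below shows how a set of tile embeddings from one slide can be aggregated into a single slide-level prediction. The abstract only states that a MIL-based approach was used; attention-based pooling is swapped in here as one common variant, and the function name, weight vectors, embedding size, and random inputs are all hypothetical.

```python
import math
import random


def attention_mil_pool(tiles, w_att, w_cls, bias=0.0):
    """Pool tile embeddings from one slide into a slide-level probability.

    tiles: list of tile embeddings (each a list of floats).
    w_att: attention scoring vector (hypothetical; learned in practice).
    w_cls: slide-level classifier weights (hypothetical; learned in practice).
    """
    # Score every tile, then apply a softmax so the tile weights sum to 1.
    scores = [math.tanh(sum(a * x for a, x in zip(w_att, t))) for t in tiles]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide embedding: attention-weighted average of the tile embeddings.
    dim = len(tiles[0])
    slide = [sum(w * t[d] for w, t in zip(weights, tiles)) for d in range(dim)]

    # Logistic classifier on the pooled slide embedding.
    logit = sum(c * s for c, s in zip(w_cls, slide)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, weights


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_tiles = 8, 50  # illustrative sizes, not taken from the study
    tiles = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tiles)]
    w_att = [rng.gauss(0, 1) for _ in range(dim)]
    w_cls = [rng.gauss(0, 1) for _ in range(dim)]
    prob, weights = attention_mil_pool(tiles, w_att, w_cls)
    print(f"slide-level probability: {prob:.3f}")
```

In a trained model the two weight vectors would be fitted on slide-level labels, so only the slide, not each tile, needs a dMMR annotation.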

Results

Non-tumorous regions demonstrated measurable predictive value, although performance remained lower than that obtained from tumor regions (F1 = 0.896, precision = 0.888, sensitivity = 0.594, specificity = 0.982). Among the nine models selected during internal validation, the top three models—one multi-scale approach and two models trained on 20x tumor regions—achieved F1 scores of 0.870–0.889 with precision of 0.885–0.920, sensitivity of 0.852, and specificity of 0.889–0.926. On external validation, the top three models, all based on foundation-model tile embeddings, achieved F1 scores of 0.916–0.919 on the first cohort and 0.928–0.934 on the second cohort. Across cohorts, specificity remained consistently high (0.964–0.992), while sensitivity ranged from 0.500 to 0.682.
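
The reported metrics can all be derived from a confusion matrix, as the sketch below illustrates. The counts are hypothetical and are not taken from the study. The abstract does not state how F1 was averaged, so the sketch computes both the F1 of the dMMR class alone and a class-weighted F1 as one possible alternative reading; with a minority positive class and high specificity, the two can differ considerably.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts.

    The positive class is taken to be dMMR (an assumption for illustration).
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    f1_pos = 2 * precision * sensitivity / (precision + sensitivity)

    # F1 of the negative class, needed for class-weighted averaging.
    npv = tn / (tn + fn)
    f1_neg = 2 * npv * specificity / (npv + specificity)

    # Class-weighted F1: per-class F1 weighted by class support.
    n_pos, n_neg = tp + fn, tn + fp
    f1_weighted = (n_pos * f1_pos + n_neg * f1_neg) / (n_pos + n_neg)

    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_positive": f1_pos,
        "f1_weighted": f1_weighted,
    }


if __name__ == "__main__":
    # Hypothetical counts: 50 dMMR slides and 450 non-dMMR slides.
    for name, value in binary_metrics(tp=30, fp=5, tn=445, fn=20).items():
        print(f"{name}: {value:.3f}")
```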

Conclusion

This study demonstrates that dMMR status in colorectal cancer can be effectively predicted from histopathological WSIs using MIL-based models, with moderate generalizability across independent cohorts. In addition to confirming the predictive value of tumor regions, the results reveal that non-tumorous tissue also contains detectable predictive signals, suggesting that microenvironmental features may contribute to dMMR-associated histological patterns. Furthermore, the use of foundation model–derived embeddings improved generalizability across datasets. Future work should explore integrating non-tumor tissue features and clinical data to further improve predictive performance.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
This study is part of the Central Finland AI hub II project, which has received funding from the Regional Council of Central Finland (https://www.keskisuomi.fi/) and the European Regional Development Fund (ERDF) (https://ec.europa.eu/regional_policy/funding/erdf_en). The collection of the data used in this work (CRC samples, WSIs and MMR analysis) was supported by the Jane and Aatos Erkko Foundation (https://jaes.fi/en/frontpage/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Last updated on 16/04/2026 09:48:15 AM