A1 Refereed original research article in a scientific journal
dMMR prediction from colorectal cancer histopathology: Leveraging non-tumor and low-magnification regions
Authors: Petäinen, Liisa; Väyrynen, Juha P.; Böhm, Jan; Ruusuvuori, Pekka; Ahtiainen, Maarit; Elomaa, Hanna; Karjalainen, Henna; Kastinen, Meeri; Tapiainen, Vilja V.; Äijälä, Ville K.; Sirniö, Päivi; Tuomisto, Anne; Mäkinen, Markus J.; Mecklin, Jukka-Pekka; Pölönen, Ilkka; Äyrämö, Sami
Publication year: 2026
Journal: Computer Methods and Programs in Biomedicine
Article number: 109317
Volume: 280
ISSN: 0169-2607
eISSN: 1872-7565
DOI: https://doi.org/10.1016/j.cmpb.2026.109317
Publication's open availability at the time of reporting: Open Access
Publication channel's open availability: Partially Open Access publication channel
Web address: https://doi.org/10.1016/j.cmpb.2026.109317
Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/522873195
Self-archived copy's licence: CC BY
Self-archived copy's version: Publisher's PDF
Background and Objective
Colorectal cancer is the second leading cause of cancer-related mortality worldwide, posing a substantial burden on healthcare systems. Identifying DNA mismatch repair deficiency (dMMR) is critical for guiding treatment, yet conventional methods rely on labor-intensive DNA analysis. While deep-learning approaches have shown promise for predicting dMMR from histopathological images, most studies focus exclusively on tumor regions and single-scale representations. This study systematically evaluates the predictive value of tumor and non-tumor regions across multiple magnifications for dMMR prediction from whole-slide images (WSIs).
Methods
A total of 24 modeling approaches were evaluated. They varied by tissue origin (tumor vs. non-tumor), magnification level (5x and 20x), and tile embedding strategy, including embeddings from digital pathology foundation models. The tile embeddings were then used to train multiple-instance learning (MIL)-based models on 1228 WSIs. The best-performing configurations were selected for external evaluation. External testing was carried out on two independent cohorts of 1010 and 457 WSIs, respectively.
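The abstract does not say which MIL aggregator was trained on the tile embeddings. The sketch below is for illustration only and assumes attention-based MIL pooling, which is a widely used choice. Each tile embedding h_k of a WSI receives a score w^T tanh(V h_k). The scores are softmax-normalized into attention weights, and the attention-weighted sum of tile embeddings is passed to a logistic classifier that outputs a slide-level dMMR probability. The function name attention_mil_predict and the parameters V, w, classifier and bias are hypothetical. In a real model they would be learned from the training WSIs, and the toy values below are made up.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def attention_mil_predict(
    tiles: Sequence[Sequence[float]],  # one embedding per tile (the WSI "bag")
    V: Sequence[Sequence[float]],      # hypothetical attention projection, d_attn x d_emb
    w: Sequence[float],                # hypothetical attention vector, length d_attn
    classifier: Sequence[float],       # hypothetical slide classifier weights, length d_emb
    bias: float,
) -> Tuple[float, Vector]:
    """Return (slide-level dMMR probability, per-tile attention weights)."""
    # Score each tile: w^T tanh(V h_k)
    scores = [dot(w, [math.tanh(dot(row, h)) for row in V]) for h in tiles]
    attn = softmax(scores)
    # Attention-weighted average of tile embeddings gives the slide embedding.
    dim = len(tiles[0])
    slide_emb = [sum(a * h[j] for a, h in zip(attn, tiles)) for j in range(dim)]
    return sigmoid(dot(classifier, slide_emb) + bias), attn


# Toy usage with made-up 2-D embeddings and parameters.
tiles = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0]]
prob, attn = attention_mil_predict(tiles, V, w=[1.0, -1.0], classifier=[2.0, -1.0], bias=0.0)
print(f"P(dMMR) = {prob:.3f}, attention = {[round(a, 3) for a in attn]}")
```

The attention weights also give a per-tile relevance score. This offers one way to inspect whether tumor or non-tumor tiles drive a prediction. A multi-scale variant could, for example, pool the 5x and 20x bags separately and concatenate the two slide embeddings before the classifier. The abstract does not describe how its multi-scale model combines magnifications.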
Results
Non-tumorous regions demonstrated measurable predictive value, although performance remained lower than that obtained from tumor regions (F1 = 0.896, precision = 0.888, sensitivity = 0.594, specificity = 0.982). Among the nine models selected during internal validation, the top three models (one multi-scale approach and two models trained on 20x tumor regions) achieved F1 scores of 0.870–0.889, precision of 0.885–0.920, sensitivity of 0.852, and specificity of 0.889–0.926. On external validation, the top three models, all based on foundation-model tile embeddings, achieved F1 scores of 0.916–0.919 on the first cohort and 0.928–0.934 on the second cohort. Across cohorts, specificity remained consistently high (0.964–0.992), while sensitivity ranged from 0.500 to 0.682.
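The reported slide-level metrics can be derived from a confusion matrix. The sketch below assumes that dMMR is the positive class (label 1) and pMMR is the negative class (label 0). The function slide_metrics and the toy labels are hypothetical. The abstract does not state how F1 was averaged, so the sketch returns two versions: the positive-class (binary) F1 and a support-weighted F1 over both classes. On an imbalanced set, the two can differ markedly when sensitivity is modest.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SlideMetrics:
    precision: float     # positive predictive value for dMMR
    sensitivity: float   # recall for dMMR
    specificity: float   # recall for pMMR
    f1_binary: float     # F1 of the dMMR class only
    f1_weighted: float   # per-class F1 averaged with class-support weights


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def _f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return _safe_div(2 * p * r, p + r)


def slide_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> SlideMetrics:
    """Labels (hypothetical encoding): 1 = dMMR, 0 = pMMR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = _safe_div(tp, tp + fp)
    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    f1_pos = _f1(precision, sensitivity)

    # F1 with pMMR treated as positive: precision = NPV, recall = specificity.
    f1_neg = _f1(_safe_div(tn, tn + fn), specificity)
    n_pos, n_neg = tp + fn, tn + fp
    f1_weighted = _safe_div(n_pos * f1_pos + n_neg * f1_neg, n_pos + n_neg)

    return SlideMetrics(precision, sensitivity, specificity, f1_pos, f1_weighted)


# Toy example: 4 dMMR and 16 pMMR slides; 2 dMMR slides detected, no false positives.
y_true = [1] * 4 + [0] * 16
y_pred = [1, 1, 0, 0] + [0] * 16
m = slide_metrics(y_true, y_pred)
print(round(m.f1_binary, 3), round(m.f1_weighted, 3))  # 0.667 0.886
```

In this made-up example, sensitivity is 0.5 and the binary F1 is 0.667. The weighted F1 is still 0.886 because the majority pMMR class dominates the average. For this reason, it helps to state the averaging scheme whenever F1 is reported alongside sensitivity and specificity.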
Conclusion
This study demonstrates that dMMR status in colorectal cancer can be effectively predicted from histopathological WSIs using MIL-based models, with moderate generalizability across independent cohorts. In addition to confirming the predictive value of tumor regions, the results reveal that non-tumorous tissue also contains detectable predictive signals, suggesting that microenvironmental features may contribute to dMMR-associated histological patterns. Furthermore, the use of foundation model–derived embeddings improved generalizability across datasets. Future work should explore integrating non-tumor tissue features and clinical data to further improve predictive performance.
Funding information in the publication:
This study is part of the Central Finland AI hub II project, which has received funding from the Regional Council of Central Finland (https://www.keskisuomi.fi/) and the European Regional Development Fund (ERDF) (https://ec.europa.eu/regional_policy/funding/erdf_en). The collection of the data used in this work (CRC samples, WSIs and MMR analysis) was supported by the Jane and Aatos Erkko Foundation (https://jaes.fi/en/frontpage/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.