A1 Refereed original research article in a scientific journal

Deep learning‐based 3D classification of head and neck cancer PET/MRI: Radiologist comparison and Grad‐CAM interpretability




Authors: Liedes, Joonas; Hirvonen, Jussi; Rainio, Oona; Murtojärvi, Sarita; Malaspina, Simona; Klén, Riku; Kemppainen, Jukka

Publisher: Wiley

Publication year: 2025

Journal: Clinical Physiology and Functional Imaging

Journal name in source: CLINICAL PHYSIOLOGY AND FUNCTIONAL IMAGING

Article number: e70030

Volume: 45

Issue: 5

ISSN: 1475-0961

eISSN: 1475-097X

DOI: https://doi.org/10.1111/cpf.70030

Web address: https://doi.org/10.1111/cpf.70030

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/504538722


Abstract

Purpose:

To develop and evaluate a three-dimensional convolutional neural network for automated classification of PET/MRI images in head and neck cancer (HNC) patients, assessing its performance against radiologist interpretation and its potential as a diagnostic aid.

Methods:

Data from 202 patients with HNC who underwent 18F-FDG PET/MRI were used to train and validate PET-, MRI-, and PET/MRI-based models. Of these, 101 patients were labelled positive for HNC and 101 negative. An additional test set of 20 patients (10 positive, 10 negative) was also evaluated. Model performance was assessed using sensitivity, specificity, accuracy, and AUC. Grad-CAM was used to improve interpretability, and classification results on the test set were compared with those of a radiologist.
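As a rough illustration of the AUC metric named above, the sketch below assumes that AUC means the area under the ROC curve and computes it as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The function name `auc_from_scores` and all scores are hypothetical; they are not taken from the study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores, invented purely for illustration.
pos = [0.9, 0.8, 0.7, 0.4]  # scores for cases labelled positive
neg = [0.6, 0.3, 0.2, 0.1]  # scores for cases labelled negative

auc = auc_from_scores(pos, neg)
print(f"AUC = {auc:.4f}")
```

An AUC of 1.0 would mean every positive case outscores every negative case, while 0.5 corresponds to chance-level ranking.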

Results:

The PET-based model achieved an AUC of 0.92 on the test set, with an accuracy of 90%, a sensitivity of 100%, and a specificity of 80%. The PET/MRI- and MRI-based models underperformed relative to the PET-based model. The radiologist achieved perfect classification accuracy. Grad-CAM analysis showed that the model's classifications were based on real areas of interest. It also gave valuable insight into how similar systems could be used to identify false-positive findings.
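The reported PET-based percentages can be checked against the test set size with a short calculation. In the sketch below, the confusion-matrix counts are not stated in the abstract; they are inferred from the reported figures (10 positive and 10 negative test patients, 100% sensitivity, 80% specificity), and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # share of positives detected
    specificity = tn / (tn + fp)               # share of negatives correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts inferred from the abstract, not reported directly:
# all 10 positives detected, 8 of 10 negatives classified correctly.
sens, spec, acc = classification_metrics(tp=10, fn=0, tn=8, fp=2)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```

Under these inferred counts, the two false positives account for the whole gap between the model's 90% accuracy and the radiologist's perfect classification.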

Conclusion:

The PET-based model demonstrated high sensitivity, indicating its potential as a pre-screening tool for HNC. However, specificity requires improvement to reduce false-positive rates. Enhanced datasets and refinement of model architecture will be crucial before clinical adoption. Grad-CAM provides valuable insights into model decisions, aiding clinical integration.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
Dr. Rainio received funding from the Sakari Alhopuro Foundation. Dr. Liedes and Dr. Kemppainen received funding from Cancer Foundation Finland. Dr. Hirvonen received funding from the Sigrid Jusélius Foundation. Open access publishing facilitated by Turun yliopisto, as part of the Wiley - FinELib agreement.


Last updated on 2025-13-10 at 12:59