Deep learning‐based 3D classification of head and neck cancer PET/MRI: Radiologist comparison and Grad‐CAM interpretability - UTU Research Portal

A1 Refereed original research article in a scientific journal

Authors: Liedes, Joonas; Hirvonen, Jussi; Rainio, Oona; Murtojärvi, Sarita; Malaspina, Simona; Klén, Riku; Kemppainen, Jukka

Publisher: Wiley

Publication year: 2025

Journal: Clinical Physiology and Functional Imaging

Journal name in source: CLINICAL PHYSIOLOGY AND FUNCTIONAL IMAGING

Article number: e70030

Volume: 45

Issue: 5

ISSN: 1475-0961

eISSN: 1475-097X

DOI: https://doi.org/10.1111/cpf.70030

Web address: https://doi.org/10.1111/cpf.70030

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/504538722

Abstract

Purpose:

To develop and evaluate a three-dimensional convolutional neural network for automated classification of PET/MRI images in head and neck cancer (HNC) patients, assessing its performance against radiologist interpretation and its potential as a diagnostic aid.

Methods:

Data from 202 patients with HNC who underwent 18F-FDG PET/MRI were used to train and validate PET-, MRI-, and PET/MRI-based models. Of these patients, 101 were labelled positive for HNC and 101 negative. An additional test set of 20 patients, 10 labelled positive and 10 negative, was also evaluated. Model performance was assessed using sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC). Grad-CAM was used to improve interpretability, and the classification results on the test set were compared with those of a radiologist.

Results:

The PET-based model achieved an AUC of 0.92 on the test set, with an accuracy of 90%, a sensitivity of 100%, and a specificity of 80%. The PET/MRI- and MRI-based models underperformed relative to the PET-based model. The radiologist achieved perfect classification accuracy. Grad-CAM analysis showed that the model's classifications were based on real areas of interest. It also gave valuable insight into how similar systems could be used to identify false-positive findings.
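
The reported test-set percentages can be cross-checked with a short calculation. The sketch below is illustrative only: the abstract does not state confusion-matrix counts, so the values used here (10 true positives, 0 false negatives, 8 true negatives, 2 false positives) are an assumption inferred from the stated test set of 10 positive and 10 negative patients together with the reported sensitivity and specificity. The function name `classification_metrics` is hypothetical and is not taken from the publication.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of positive cases detected
    specificity = tn / (tn + fp)                 # share of negative cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Assumed counts, inferred from the abstract (not stated in it):
# 10 positive patients at 100% sensitivity -> 10 TP, 0 FN
# 10 negative patients at 80% specificity  -> 8 TN, 2 FP
TP, FN, TN, FP = 10, 0, 8, 2

sens, spec, acc = classification_metrics(TP, FN, TN, FP)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```

Under these assumed counts, the two false positives account for the whole gap between the PET-based model's 90% accuracy and the radiologist's perfect classification.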

Conclusion:

The PET-based model demonstrated high sensitivity, indicating its potential as a pre-screening tool for HNC. However, specificity requires improvement to reduce false-positive rates. Enhanced datasets and refinement of model architecture will be crucial before clinical adoption. Grad-CAM provides valuable insights into model decisions, aiding clinical integration.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Liedes_etal_deep_learning-based_2025.pdf

Funding information in the publication:
Dr. Rainio received funding from the Sakari Alhopuro Foundation. Dr. Liedes and Dr. Kemppainen received funding from Cancer Foundation Finland. Dr. Hirvonen received funding from the Sigrid Jusélius Foundation. Open access publishing facilitated by Turun yliopisto, as part of the Wiley - FinELib agreement.