Artificial intelligence unveils tumor diversity in brain cancer through Raman spectroscopy: Machine learning for glioma subtype classifications - UTU Research Portal

G5 Article dissertation

Artificial intelligence unveils tumor diversity in brain cancer through Raman spectroscopy: Machine learning for glioma subtype classifications

Authors: Sjöberg, Joel

Publisher: Turun yliopisto

Publishing place: Turku

Publication year: 2026

Series title: Turun yliopiston julkaisuja - Annales Universitatis Turkuensis AI

Number in series: 753

ISBN: 978-952-02-0549-2

eISBN: 978-952-02-0550-8

ISSN: 0082-7002

eISSN: 2343-3175

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://urn.fi/URN:ISBN:978-952-02-0550-8

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/508954816

Abstract

Applying machine learning (ML) methods as diagnostic classification models can accelerate the process of diagnosing cancers. For glioma, a brain cancer with a diverse genetic makeup, ML can capture the different genetic characteristics within tumor environments, providing a detailed and precise mapping of heterogeneous gliomas. Methods that can compute such predictions with high reliability are of special interest given how rapidly the health of glioma patients deteriorates. A promising avenue for developing these models is Raman spectroscopy, a vibrational spectroscopic technique capable of capturing the genetic traits of gliomas through tumor-wide scanning. ML can also be used to curate Raman spectra, a necessary step in the quality assurance of Raman spectroscopy datasets. In larger datasets formed by combining different cohorts, there is a considerable risk of a batch effect. The batch effect is the bias within a dataset that results from the assumptions and methodologies applied during its extraction. Removing the batch effect from Raman data is important to ensure that models rely on cancer-specific patterns rather than on acquisition-related effects.

In this thesis, we present mathematical models capable of forming predictions for tumor-wide classification of genetic characteristics. We develop methods for curating Raman spectra through ML and synthetic data generation and demonstrate their effectiveness on a dataset of glioma tumors. Furthermore, we develop an ML method that improves dataset quality by removing the batch effect, promoting model detection of cancer-specific patterns. Our contributions to the field of glioma classification come in the form of classifier models and the strategies that have enabled them. We present a deep learning (DL) architecture that uses a minimal number of parameters to provide consistent outputs for the correction of spectra. We also present a strategy to reduce the batch effect through adversarial learning while measuring the features relevant for genetic classification. Through this work, we show how applying our methods can improve the performance of classification models for gliomas.