A1 Refereed original article in a scientific journal

The sensitivity of patient-reported outcome measures in surgical and non-surgical care: a systematic review and meta-epidemiological evaluation of randomised controlled trials




Authors: Uimonen, Mikko; Vaajala, Matias; Saarinen, Antti; Liukkonen, Rasmus; Pakarinen, Oskari; Laaksonen, Juho; Ponkilainen, Ville; Kuitunen, Ilari; Panula, Valtteri

Publisher: Elsevier BV

Publication year: 2026

Journal: EClinicalMedicine

Article number: 103776

Volume: 92

eISSN: 2589-5370

DOI: https://doi.org/10.1016/j.eclinm.2026.103776

Open access status at the time of recording: Openly available

Openness of the publication channel: Fully open access publication channel

Web address: https://doi.org/10.1016/j.eclinm.2026.103776

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/508959943

Self-archived copy's license: CC BY

Self-archived copy's version: Publisher's version


Abstract

Background: Accumulation of scores towards the high end of the measurement scale is an important source of bias related to patient-reported outcome measures (PROMs). The aim was to evaluate how PROM score distributions, scale boundaries, and sampling variability influence the likelihood of detecting a minimal clinically important difference (MCID) of 10 points between surgical and non-surgical groups in randomised controlled trials (RCTs) of musculoskeletal disorders.

Methods: We did a systematic review and meta-epidemiological analysis of 129 RCTs, identified from PubMed and Scopus and published until February 26, 2025, that compared surgical and non-surgical interventions in patients with musculoskeletal complaints using a PROM as an outcome measure (1771 group-level PROM measurements). Simulations assessed each comparison's likelihood of detecting a difference of 10 points or more.

Findings: The mean difference between groups was 4.6 (SD 7.1) points favouring surgery, with surgical arms scoring higher in 72% of comparisons. The mean likelihood of detecting at least a 10-point difference was 19%, meaning fewer than one in five of such comparisons would detect a true difference. Detection likelihood peaked (35%) at a mean score of 70, declining toward scale extremes. Comparisons with significant observed differences (>10 points, p < 0.05) had a 54% likelihood versus 17% in non-significant comparisons, strongly linking detection likelihood to observed differences.

Interpretation: The majority of the PROM-based RCTs were unlikely to detect differences due to ceiling effects with a constant underestimation of surgical benefit. PROMs with adequate content coverage, better discrimination, and reduced ceiling susceptibility should be selected for clinical practice. Future research should align outcome selection and follow-up timing with expected treatment effects and ensure that measurement properties do not mask meaningful clinical differences.
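
The simulation idea described in the Methods can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify. It assumes normally distributed scores clipped to a 0-100 scale, equal group sizes, and a simple "observed mean difference of at least 10 points" criterion. The function names and all parameter values (`sd=20`, `n=50`, `reps=2000`) are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def simulate_group(mu, sd, n, rng, lo=0.0, hi=100.0):
    """Draw n scores from a normal distribution and clip them to the scale bounds.

    Assumption: clipping stands in for the floor and ceiling of a bounded PROM scale.
    """
    return [min(hi, max(lo, rng.gauss(mu, sd))) for _ in range(n)]


def detection_likelihood(mu_a, mu_b, sd, n, threshold=10.0, reps=2000, seed=0):
    """Share of simulated trials whose observed mean difference (A minus B)
    is at least `threshold` points."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    hits = 0
    for _ in range(reps):
        a = simulate_group(mu_a, sd, n, rng)
        b = simulate_group(mu_b, sd, n, rng)
        if mean(a) - mean(b) >= threshold:
            hits += 1
    return hits / reps


# Same underlying 10-point difference, placed mid-scale and near the ceiling.
mid_scale = detection_likelihood(mu_a=70, mu_b=60, sd=20, n=50)
near_ceiling = detection_likelihood(mu_a=95, mu_b=85, sd=20, n=50)
print(f"mid-scale: {mid_scale:.2f}, near ceiling: {near_ceiling:.2f}")
```

Under these assumptions, the same underlying 10-point difference is observed less often when both groups sit close to the upper scale boundary, because clipping compresses the higher-scoring group more than the lower-scoring one.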


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




