A1 Refereed original research article in a scientific journal

The sensitivity of patient-reported outcome measures in surgical and non-surgical care: a systematic review and meta-epidemiological evaluation of randomised controlled trials




Authors: Uimonen, Mikko; Vaajala, Matias; Saarinen, Antti; Liukkonen, Rasmus; Pakarinen, Oskari; Laaksonen, Juho; Ponkilainen, Ville; Kuitunen, Ilari; Panula, Valtteri

Publication year: 2026

Journal: EClinicalMedicine

Article number: 103776

Volume: 92

eISSN: 2589-5370

DOI: https://doi.org/10.1016/j.eclinm.2026.103776

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.1016/j.eclinm.2026.103776

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/508959943

Self-archived copy's licence: CC BY

Self-archived copy's version: Publisher's PDF


Abstract

Background: Accumulation of the score distribution towards the high end of the measurement scale is an important source of bias related to patient-reported outcome measures (PROMs). The aim was to evaluate how PROM score distributions, scale boundaries, and sampling variability influence the likelihood of detecting a minimal clinically important difference (MCID) of 10 points between surgical and non-surgical groups in randomised controlled trials (RCTs) of musculoskeletal disorders.

Methods: We did a systematic review and meta-epidemiological analysis of 129 RCTs that compared surgical and non-surgical interventions in patients with musculoskeletal complaints and used a PROM as an outcome measure (1771 group-level PROM measurements). Studies were identified from PubMed and Scopus and published until February 26, 2025. Simulations assessed each comparison's likelihood of detecting a difference of 10 points or more.
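The abstract does not describe how the simulations were run, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes a 0 to 100 scale, normally distributed latent scores that are clipped to the scale boundaries, and a hypothetical detection criterion (the observed mean difference between arms reaches 10 points). The function name `simulate_detection` and all parameter values are invented for the example.

```python
import random
import statistics


def simulate_detection(mean_a, mean_b, sd, n, lo=0.0, hi=100.0,
                       mcid=10.0, reps=2000, seed=0):
    """Share of simulated trials whose observed difference reaches `mcid`.

    Assumptions (not taken from the paper): latent scores are normal,
    observed scores are clipped to [lo, hi], and "detection" means the
    observed mean difference (arm A minus arm B) is at least `mcid`.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = 0
    for _ in range(reps):
        # Draw latent scores and clip them to the scale boundaries.
        arm_a = [min(hi, max(lo, rng.gauss(mean_a, sd))) for _ in range(n)]
        arm_b = [min(hi, max(lo, rng.gauss(mean_b, sd))) for _ in range(n)]
        if statistics.fmean(arm_a) - statistics.fmean(arm_b) >= mcid:
            hits += 1
    return hits / reps


# Same latent 10-point difference, placed mid-scale and near the ceiling.
mid_scale = simulate_detection(60, 50, sd=20, n=50)
near_ceiling = simulate_detection(95, 85, sd=20, n=50)
print(f"mid-scale: {mid_scale:.2f}, near ceiling: {near_ceiling:.2f}")
```

Under these assumptions, clipping at the upper boundary compresses the higher-scoring arm more than the lower-scoring one, so the same latent difference is observed less often when both arms sit close to the ceiling.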

Findings: The mean difference between groups was 4.6 (SD 7.1) points favouring surgery, with surgical arms scoring higher in 72% of comparisons. The mean likelihood of detecting at least a 10-point difference was 19%, meaning fewer than one in five of such comparisons would detect a true difference. Detection likelihood peaked (35%) at a mean score of 70, declining toward scale extremes. Comparisons with significant observed differences (>10 points, p < 0.05) had a 54% likelihood versus 17% in non-significant comparisons, strongly linking detection likelihood to observed differences.

Interpretation: Owing to ceiling effects, the majority of the PROM-based RCTs were unlikely to detect differences, with a constant underestimation of surgical benefit. PROMs with adequate content coverage, better discrimination, and reduced ceiling susceptibility should be selected for clinical practice. Future research should align outcome selection and follow-up timing with expected treatment effects and ensure that measurement properties do not mask meaningful clinical differences.







Last updated on 13/02/2026 01:08:34 PM