A1 Peer-reviewed original article in a scientific journal

EpiSmokEr2: a robust epigenetic classifier for smoking status inference using Illumina EPIC methylation data




Authors: Zhu, Tianyu; Faragó, Teodóra; Bollepalli, Sailalitha; Heikkinen, Aino; Hukkanen, Mikaela; Raitakari, Olli; Lehtimäki, Terho; Korhonen, Tellervo; Kaprio, Jaakko; Fang, Fang; Lawrence, Kaitlyn G.; Sandler, Dale P.; Roberts Spildrejorde, Mari; Gervin, Kristina; Pan, Yanyu; Costeira, Ricardo; Bell, Jordana T.; Ollikainen, Miina

Publisher: Future Medicine Ltd.

Publication year: 2026

Journal: Epigenomics

Volume: 18

Issue: 2

First page: 205

Last page: 215

ISSN: 1750-1911

eISSN: 1750-192X

DOI: https://doi.org/10.1080/17501911.2026.2630841

Open access status at the time of recording: Openly available

Open access status of the publication channel: Partially open publication channel

Web address: https://doi.org/10.1080/17501911.2026.2630841

Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/515686691

Self-archived copy license: CC BY

Version of the self-archived publication: Publisher's version


Abstract
Aim

Tobacco smoking induces persistent DNA methylation (DNAm) changes in blood that can serve as long-term biomarkers for smoking exposure. We aimed to develop and validate a DNAm classifier of smoking status using Illumina EPIC array data.

Methods

We built Epigenetic Smoking status Estimator2 (EpiSmokEr2), a Least Absolute Shrinkage and Selection Operator (LASSO) regression-based DNAm classifier using 511 CpGs from Illumina Infinium MethylationEPIC array (EPIC) data. The model was trained on 1343 samples from the Young Finns Study cohort and validated across six independent datasets from four cohorts and two array platforms (EPIC and EPICv2).

Results

EpiSmokEr2 achieved an average sensitivity of 0.87 and specificity of 0.86 in distinguishing current from never smokers. Predicted smoking status correlated strongly with established DNAm smoking scores and GrimAge, indicating its ability to capture biologically relevant smoking effects. Simulation analysis showed EpiSmokEr2 was robust for up to 10% missing CpGs.
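As an illustration of the two metrics reported above, the following minimal Python sketch computes sensitivity and specificity for a current-versus-never comparison. It is not taken from the EpiSmokEr2 package: the function name `sensitivity_specificity`, the label strings, and the toy data are all hypothetical, and the sketch assumes that any prediction other than the true class counts as an error.

```python
def sensitivity_specificity(y_true, y_pred, positive="current", negative="never"):
    """Return (sensitivity, specificity) for a two-class comparison.

    Hypothetical helper for illustration only. Samples whose true label is
    neither `positive` nor `negative` are ignored.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # current smoker correctly identified
            else:
                fn += 1  # current smoker missed
        elif truth == negative:
            if pred == negative:
                tn += 1  # never smoker correctly identified
            else:
                fp += 1  # never smoker misclassified
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one sample in each true class")
    sensitivity = tp / (tp + fn)  # share of true current smokers detected
    specificity = tn / (tn + fp)  # share of true never smokers detected
    return sensitivity, specificity


# Made-up toy labels, not data from the study.
y_true = ["current"] * 4 + ["never"] * 5
y_pred = ["current", "current", "current", "never",
          "never", "never", "never", "never", "current"]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these toy labels, three of four current smokers and four of five never smokers are classified correctly.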

Conclusion

EpiSmokEr2 provides a reliable DNAm-based estimator of smoking status. It is available as an open-source R package on GitHub, facilitating broad use in epidemiological and clinical research.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
This study is supported by the following funds: Academy of Finland [328685, 307339, 297908 and 251316], Sigrid Juselius Foundation, Liv o Hälsa rf, and Finnish Cultural Foundation to MO. Jaakko Kaprio acknowledges support by Academy of Finland Center of Excellence in Complex Disease Genetics [grants 336823 & 352792] and Sigrid Juselius Foundation. The Young Finns Study has been financially supported by the Academy of Finland: grants 356405, 322098, 286284, 134309 (Eye), 126925, 121584, 124282, 129378 (Salve), 117797 (Gendi), and 141071 (Skidi); the Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility area of Kuopio, Tampere and Turku University Hospitals [grant X51001]; Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research; Finnish Cultural Foundation; The Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association; EU Horizon 2020 [grant 755320 for TAXINOMISIS and grant 848146 for To Aition]; European Research Council [grant 742927 for MULTIEPIGEN project]; Tampere University Hospital Supporting Foundation; Finnish Society of Clinical Chemistry; the Cancer Foundation Finland; pBETTER4U_EU (Preventing obesity through Biologically and bEhaviorally Tailored inTERventions for you; project number: 101080117); CVDLink [EU grant no. 101137278] and the Jane and Aatos Erkko Foundation. TwinsUK is funded by the Wellcome Trust, Medical Research Council, Versus Arthritis, European Union Horizon 2020, Chronic Disease Research Foundation (CDRF), Zoe Ltd, the National Institute for Health and Care Research (NIHR) Clinical Research Network (CRN) and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London.
The TwinsUK data and analyses were further supported by the European HDHL Joint Programming Initiative funding scheme DIMENSION award [BBSRC BB/S020845/1 and BB/T019980/1 to J.T.B.], and BACMETH award, selected for funding by the ERC consolidator award and funded by the UK Engineering and Physical Sciences Research Council [EP/Y023765/1 to J.T.B.], and by the UK Economic and Social Research Council [ES/N000404/1 to J.T.B.]. The GuLF Long-Term Follow-up Study was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Institute of Environmental Health Sciences [ZO1 ES 102945]. The contributions of the NIH author(s) were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered Works of the United States Government. However, the findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services. The DNA methylation data analyses were further supported by National Institute on Drug Abuse grant number R01DA048824 (PI: Fang). The GeNeup study was funded by the Research Council of Norway, grant numbers [275476 and 328657]. None of the funders listed above had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

