A1 Peer-reviewed original article in a scientific journal
EpiSmokEr2: a robust epigenetic classifier for smoking status inference using Illumina EPIC methylation data
Authors: Zhu, Tianyu; Faragó, Teodóra; Bollepalli, Sailalitha; Heikkinen, Aino; Hukkanen, Mikaela; Raitakari, Olli; Lehtimäki, Terho; Korhonen, Tellervo; Kaprio, Jaakko; Fang, Fang; Lawrence, Kaitlyn G.; Sandler, Dale P.; Roberts Spildrejorde, Mari; Gervin, Kristina; Pan, Yanyu; Costeira, Ricardo; Bell, Jordana T.; Ollikainen, Miina
Publisher: Future Medicine Ltd.
Publication year: 2026
Journal: Epigenomics
Volume: 18
Issue: 2
First page: 205
Last page: 215
ISSN: 1750-1911
eISSN: 1750-192X
DOI: https://doi.org/10.1080/17501911.2026.2630841
Open access status of the publication at the time of recording: Openly available
Openness of the publication channel: Partially open publication channel
Web address: https://doi.org/10.1080/17501911.2026.2630841
Address of the self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/515686691
Licence of the self-archived copy: CC BY
Version of the self-archived publication: Publisher's version
Aim
Tobacco smoking induces persistent DNA methylation (DNAm) changes in blood that can serve as long-term biomarkers for smoking exposure. We aimed to develop and validate a DNAm classifier of smoking status using Illumina EPIC array data.
Methods
We built Epigenetic Smoking status Estimator2 (EpiSmokEr2), a Least Absolute Shrinkage and Selection Operator (LASSO) regression-based DNAm classifier using 511 CpGs from Illumina Infinium MethylationEPIC array (EPIC) data. The model was trained on 1343 samples from the Young Finns Study cohort and validated across six independent datasets from four cohorts and two array platforms (EPIC and EPICv2).
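Applying a fitted LASSO logistic model to a new sample comes down to a weighted sum of CpG beta values passed through a logistic link. The sketch below illustrates only that scoring step, assuming a binary current-versus-never model. The CpG names (`cg_A`, `cg_B`, `cg_C`), weights, intercept, reference means and 0.5 threshold are hypothetical placeholders, not the published EpiSmokEr2 coefficients (which cover 511 CpGs). Filling a missing CpG with a reference mean is likewise an assumed strategy, not documented package behaviour.

```python
import math
from typing import Dict, Optional

# Hypothetical coefficients for illustration only; the real model uses 511 CpGs.
INTERCEPT = -0.5
WEIGHTS: Dict[str, float] = {
    "cg_A": -4.0,  # hypothetical CpG, lower methylation in smokers
    "cg_B": 2.5,   # hypothetical CpG, higher methylation in smokers
    "cg_C": -1.5,  # hypothetical CpG
}
# Hypothetical reference means used to fill in missing CpGs (an assumption).
REFERENCE_MEANS: Dict[str, float] = {"cg_A": 0.7, "cg_B": 0.3, "cg_C": 0.5}


def smoking_probability(betas: Dict[str, Optional[float]]) -> float:
    """Return P(current smoker) from a linear LASSO score and a logistic link."""
    score = INTERCEPT
    for cpg, weight in WEIGHTS.items():
        value = betas.get(cpg)
        if value is None:  # CpG absent or missing: use the reference mean
            value = REFERENCE_MEANS[cpg]
        score += weight * value
    return 1.0 / (1.0 + math.exp(-score))


def classify(betas: Dict[str, Optional[float]], threshold: float = 0.5) -> str:
    """Map the probability to a label using a hypothetical cut-off."""
    return "current" if smoking_probability(betas) >= threshold else "never"
```

For example, `classify({"cg_A": 0.1, "cg_B": 0.7, "cg_C": 0.3})` gives a linear score of about 0.4 and returns `"current"`. Because LASSO shrinks most coefficients to exactly zero, only the CpGs with non-zero weights enter this sum. A multiclass variant would compute one such score per class and normalise the scores with a softmax.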
Results
EpiSmokEr2 achieved an average sensitivity of 0.87 and specificity of 0.86 in distinguishing current from never smokers. Predicted smoking status correlated strongly with established DNAm smoking scores and GrimAge, indicating its ability to capture biologically relevant smoking effects. Simulation analysis showed that EpiSmokEr2 was robust to up to 10% missing CpGs.
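Sensitivity and specificity are the standard confusion-matrix ratios: with current smokers as the positive class, sensitivity is the share of true current smokers predicted as current, and specificity is the share of true never smokers predicted as never. The sketch below assumes simple label lists and, as one plausible reading of "average", takes the mean of per-dataset values across validation datasets. Both the aggregation rule and the example labels are assumptions and do not reproduce the study's data.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[str],
    predicted: Sequence[str],
    positive: str = "current",
    negative: str = "never",
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); both classes must be present in truth."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif t == negative:
            if p == negative:
                tn += 1
            else:
                fp += 1
        # samples whose true label is neither class (e.g. former smokers) are skipped
    return tp / (tp + fn), tn / (tn + fp)


def mean_over_datasets(
    datasets: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float]:
    """Average per-dataset sensitivity and specificity (an assumed aggregation)."""
    pairs = [sensitivity_specificity(t, p) for t, p in datasets]
    n = len(pairs)
    return sum(s for s, _ in pairs) / n, sum(sp for _, sp in pairs) / n
```

With four true current smokers of whom three are predicted as current, and five true never smokers of whom four are predicted as never, `sensitivity_specificity` returns `(0.75, 0.8)`.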
Conclusion
EpiSmokEr2 provides a reliable DNAm-based estimator of smoking status. It is available as an open-source R package on GitHub, facilitating broad use in epidemiological and clinical research.
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
This study is supported by the following funds: Academy of Finland [328685, 307339, 297908 and 251316], Sigrid Juselius Foundation, Liv o Hälsa rf, and Finnish Cultural Foundation to MO. Jaakko Kaprio acknowledges support by Academy of Finland Center of Excellence in Complex Disease Genetics [grants 336823 & 352792] and Sigrid Juselius Foundation. The Young Finns Study has been financially supported by the Academy of Finland: grants 356405, 322098, 286284, 134309 (Eye), 126925, 121584, 124282, 129378 (Salve), 117797 (Gendi), and 141071 (Skidi); the Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility area of Kuopio, Tampere and Turku University Hospitals [grant X51001]; Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research; Finnish Cultural Foundation; The Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association; EU Horizon 2020 [grant 755320 for TAXINOMISIS and grant 848146 for To Aition]; European Research Council [grant 742927 for MULTIEPIGEN project]; Tampere University Hospital Supporting Foundation; Finnish Society of Clinical Chemistry; the Cancer Foundation Finland; pBETTER4U_EU (Preventing obesity through Biologically and bEhaviorally Tailored inTERventions for you; project number: 101080117); CVDLink [EU grant no. 101137278] and the Jane and Aatos Erkko Foundation. TwinsUK is funded by the Wellcome Trust, Medical Research Council, Versus Arthritis, European Union Horizon 2020, Chronic Disease Research Foundation (CDRF), Zoe Ltd, the National Institute for Health and Care Research (NIHR) Clinical Research Network (CRN) and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London.
The TwinsUK data and analyses were further supported by the European HDHL Joint Programming Initiative funding scheme DIMENSION award [BBSRC BB/S020845/1 and BB/T019980/1 to J.T.B.], and BACMETH award, selected for funding by the ERC consolidator award and funded by the UK Engineering and Physical Sciences Research Council [EP/Y023765/1 to J.T.B.], and by the UK Economic and Social Research Council [ES/N000404/1 to J.T.B.]. The GuLF Long-Term Follow-up Study was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Institute of Environmental Health Sciences [ZO1 ES 102945]. The contributions of the NIH author(s) were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered Works of the United States Government. However, the findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services. The DNA methylation data analyses were further supported by National Institute on Drug Abuse grant number R01DA048824 (PI: Fang). The GeNeup study was funded by the Research Council of Norway, grant numbers [275476 and 328657]. None of the funders listed above had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.