A1 Refereed original research article in a scientific journal

Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks

Authors: Saldivar-Espinoza Bryan, Macip Guillem, Garcia-Segura Pol, Mestres-Truyol Júlia, Puigbò Pere, Cereto-Massague Adrià, Pujadas Gerard, Garcia-Vallve Santiago

Publisher: MDPI

Publication year: 2022

Journal: International Journal of Molecular Sciences

Journal name in source: INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES

Journal acronym: INT J MOL SCI

Article number: 14683

Volume: 23

Issue: 23

Number of pages: 17

eISSN: 1422-0067

DOI: https://doi.org/10.3390/ijms232314683

Web address: https://www.mdpi.com/1422-0067/23/23/14683

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/178071596


Abstract
Predicting SARS-CoV-2 mutations is difficult, but predicting recurrent mutations driven by the host, such as those caused by host deaminases, is feasible. We used machine learning to predict which positions from the SARS-CoV-2 genome will hold a recurrent mutation and which mutations will be the most recurrent. We used data from April 2021 that we separated into three sets: a training set, a validation set, and an independent test set. For the test set, we obtained a specificity value of 0.69, a sensitivity value of 0.79, and an Area Under the Curve (AUC) of 0.8, showing that the prediction of recurrent SARS-CoV-2 mutations is feasible. Subsequently, we compared our predictions with updated data from January 2022, showing that some of the false positives in our prediction model become true positives later on. The most important variables detected by the model's Shapley Additive exPlanation (SHAP) are the nucleotide that mutates and RNA reactivity. This is consistent with the SARS-CoV-2 mutational bias pattern and the preference of some host deaminases for specific sequences and RNA secondary structures. We extend our investigation by analyzing the mutations from the variants of concern Alpha, Beta, Delta, Gamma, and Omicron. Finally, we analyzed amino acid changes by looking at the predicted recurrent mutations in the M-pro and spike proteins.
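
For readers unfamiliar with the evaluation metrics quoted in the abstract, the following minimal Python sketch shows how sensitivity, specificity, and AUC are conventionally computed for a binary classifier. It is an illustration only, not the authors' code: the labels, scores, the 0.5 decision threshold, and all function names are hypothetical and are not taken from the article or its data.

```python
# Illustrative only: toy labels and scores, NOT data from the article.
# 1 = position carries a recurrent mutation, 0 = it does not (assumed encoding).

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Fraction of true positives that the model recovers."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives that the model recovers."""
    return tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: seven genome positions with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off, chosen only for this sketch
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
area = auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.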

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Last updated on 2024-11-26 at 23:30