Peer-reviewed original article or data article in a scientific journal (A1)

Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data




Authors: Wang Ning, Lysenkov Vladislav, Orte Katri, Kairisto Veli, Aakko Juhani, Khan Sofia, Elo Laura L

Publisher: PUBLIC LIBRARY SCIENCE

Publication year: 2022

Journal: PLoS Computational Biology

Journal name in database: PLOS COMPUTATIONAL BIOLOGY

Journal acronym: PLOS COMPUT BIOL

Article number: e1009269

Volume: 18

Number of pages: 27

ISSN: 1553-734X

eISSN: 1553-734X

DOI: http://dx.doi.org/10.1371/journal.pcbi.1009269

Web address: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009269

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/175018398


Abstract

Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.

Author summary

The development of next generation sequencing (NGS) technologies and computational algorithms has enabled the large-scale, simultaneous detection of a wide range of genetic variants, such as single nucleotide variants and insertions and deletions (indels), which may be clinically significant. Recently, many studies have evaluated variant calling tools for indel calling. However, the optimal indel size range for different variant calling tools remains unclear. A good benchmarking dataset for indel calling evaluation should contain biologically representative, high-confidence indels spanning a wide size range and should preferably come from various sequencing settings. In this article, we created a semi-simulated whole genome sequencing dataset in which the sequencing data were computationally generated. The indels in the semi-simulated genome were incorporated from a real human sample to represent biologically realistic indels and to avoid including variants that arise from technical sequencing errors. Furthermore, we used three real-world NGS datasets generated by whole genome or targeted sequencing to evaluate our candidate tools. Our results demonstrated that variant calling tools vary greatly in their ability to call indels of different sizes. Deletion calling and insertion calling performance also differed among the tools. Sequencing settings, specifically coverage and read length, also had a great impact on indel calling. Our results suggest that the accuracy of indel calling depends on the combination of variant calling tool, indel size range, and sequencing settings.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Last updated on 2022-05-23 at 15:29