
A2 Refereed review article in a scientific journal

Deep learning tools are top performers in long non-coding RNA prediction

Authors: Ammunét Tea, Wang Ning, Khan Sofia, Elo Laura L

Publisher: OXFORD UNIV PRESS

Publication year: 2022

Journal: Briefings in Functional Genomics

Journal name in source: BRIEFINGS IN FUNCTIONAL GENOMICS

Journal acronym: BRIEF FUNCT GENOMICS

Volume: 21

Issue: 3

First page: 230

Last page: 241

Number of pages: 12

ISSN: 2041-2649

eISSN: 2041-2657

DOI: https://doi.org/10.1093/bfgp/elab045

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Partially Open Access publication channel

Web address: https://academic.oup.com/bfg/article/21/3/230/6523275

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/175411816

Abstract

The increasing amount of transcriptomic data has brought to light vast numbers of potential novel RNA transcripts. Accurately distinguishing novel long non-coding RNAs (lncRNAs) from protein-coding messenger RNAs (mRNAs) has challenged bioinformatic tool developers. Most recently, tools implementing deep learning architectures have been developed for this task, with the potential to discover sequence features, and interactions among them, that are not yet part of current knowledge. We compared the performance of deep learning tools with that of other predictive tools currently used in lncRNA coding potential prediction. A total of 15 tools representing the variety of available methods were investigated. In addition to known annotated transcripts, we also evaluated the use of the tools in actual studies with real-life data. The robustness and scalability of the tools' performance were tested with test sets of varying sizes and with test sets containing different proportions of lncRNAs and mRNAs. In addition, the ease of use of each tested tool was scored. Deep learning tools were top performers in most metrics and labelled transcripts similarly to one another in the real-life dataset. However, the proportion of lncRNAs and mRNAs in the test sets affected the performance of all tools. Computational resources were utilized differently between the top-ranking tools; thus, the nature of the study may affect the choice of one well-performing tool over another. Nonetheless, the results suggest favouring the novel deep learning tools over other tools currently in broad use.
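The effect of class proportions on measured performance, mentioned in the abstract, can be illustrated with a minimal sketch. It is not taken from the article: the function name `expected_metrics`, the test-set size, and the sensitivity and specificity values are all hypothetical, and lncRNA is arbitrarily treated as the positive class. The sketch holds a classifier's per-class behaviour fixed and shows how accuracy, precision, F1 and the Matthews correlation coefficient (MCC) nevertheless shift as the lncRNA fraction of the test set changes.

```python
import math


def expected_metrics(n_total, lnc_fraction, sensitivity, specificity):
    """Expected metrics for a hypothetical classifier (lncRNA = positive class).

    Assumes 0 < lnc_fraction < 1 and non-degenerate sensitivity/specificity,
    so that no denominator below is zero.
    """
    n_pos = n_total * lnc_fraction      # lncRNA transcripts in the test set
    n_neg = n_total - n_pos             # mRNA transcripts in the test set
    tp = sensitivity * n_pos            # lncRNAs correctly labelled
    fn = n_pos - tp                     # lncRNAs labelled as mRNA
    tn = specificity * n_neg            # mRNAs correctly labelled
    fp = n_neg - tn                     # mRNAs labelled as lncRNA
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / n_total,
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


# Hypothetical values chosen only for illustration, not results from the study.
for frac in (0.1, 0.5, 0.9):
    m = expected_metrics(10_000, frac, sensitivity=0.90, specificity=0.95)
    print(f"lncRNA fraction {frac:.1f}: "
          + ", ".join(f"{name}={value:.3f}" for name, value in m.items()))
```

Under these assumed values, precision for the lncRNA class falls as lncRNAs become rarer in the test set, even though the classifier itself is unchanged. This is one reason why results obtained on test sets with different class proportions are not directly comparable.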

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

elab045.pdf