A1 Refereed original research article in a scientific journal
MicFunPred: A conserved approach to predict functional profiles from 16S rRNA gene sequence data
Authors: Mongad DS, Chavan NS, Narwade NP, Dixit K, Shouche YS, Dhotre DP
Publisher: ACADEMIC PRESS INC ELSEVIER SCIENCE
Publication year: 2021
Journal: Genomics
Journal name in source: GENOMICS
Journal acronym: GENOMICS
Volume: 113
Issue: 6
First page: 3635
Last page: 3643
Number of pages: 9
ISSN: 0888-7543
DOI: https://doi.org/10.1016/j.ygeno.2021.08.016
Abstract
16S rRNA gene amplicon sequencing is a popular technique that accurately characterizes microbial taxonomic abundances but provides no functional information. Several tools are available to predict functional profiles from 16S rRNA gene sequence data, and they differ in the genome databases and approaches they use. Because the variable regions of a partially sequenced 16S rRNA gene cannot resolve taxonomy accurately beyond the genus level, these tools may give inflated results. Here, we developed 'MicFunPred', which uses a novel approach to derive imputed metagenomes based on a set of core genes only, thereby minimizing false-positive predictions. On simulated datasets, MicFunPred showed the lowest False Positive Rate (FPR), with a mean Spearman's correlation of 0.89 (SD = 0.03), while on seven real datasets the mean correlation was 0.75 (SD = 0.08). MicFunPred was also found to be faster, with low computational requirements, and performed better than or comparably to other tools.
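The core-gene idea described in the abstract can be illustrated with a minimal sketch. This is not the MicFunPred implementation: the function names, the 0.5 presence threshold, the use of the mean copy number, and the toy data are all assumptions made for illustration. The sketch treats a gene as "core" for a genus when it occurs in at least a chosen fraction of that genus's reference genomes, then weights the core-gene copy numbers by genus abundance to obtain an imputed profile.

```python
from collections import defaultdict


def core_genes(genomes_by_genus, min_fraction=0.5):
    """Return {genus: {gene: mean copy number}} for genes found in at
    least `min_fraction` of the genus's genomes (hypothetical threshold)."""
    core = {}
    for genus, genomes in genomes_by_genus.items():
        presence = defaultdict(int)   # number of genomes carrying the gene
        totals = defaultdict(float)   # summed copy number over those genomes
        for genome in genomes:
            for gene, copies in genome.items():
                if copies > 0:
                    presence[gene] += 1
                    totals[gene] += copies
        core[genus] = {
            gene: totals[gene] / presence[gene]
            for gene in presence
            if presence[gene] / len(genomes) >= min_fraction
        }
    return core


def predict_profile(genus_abundance, core):
    """Sum abundance-weighted core-gene copy numbers across genera."""
    profile = defaultdict(float)
    for genus, abundance in genus_abundance.items():
        for gene, copies in core.get(genus, {}).items():
            profile[gene] += abundance * copies
    return dict(profile)


# Toy reference genomes (gene -> copy number); entirely made up.
genomes = {
    "Bacillus": [{"K1": 1, "K2": 2}, {"K1": 1}, {"K1": 1, "K3": 1}],
    "Vibrio": [{"K1": 2, "K4": 1}, {"K1": 2, "K4": 1}],
}
core = core_genes(genomes, min_fraction=0.5)
profile = predict_profile({"Bacillus": 10, "Vibrio": 5}, core)
print(profile)
```

In this toy example K2 and K3 each occur in only one of three Bacillus genomes, so they are excluded from the prediction; dropping such non-core genes is the kind of filtering the abstract credits with reducing false positives.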