A1 Refereed original research article in a scientific journal

Pool-seq driven proteogenomic database for Group G Streptococcus




Authors: Weldatsadik RG, Datta N, Kolmeder C, Vuopio J, Kere J, Wilkman SV, Flatt JW, Vuento R, Haapasalo KJ, Keskitalo S, Varjosalo M, Jokiranta TS

Publication year: 2019

Journal: Journal of Proteomics

Journal name in source: Journal of proteomics

Journal acronym: J Proteomics

Volume: 201

First page: 84

Last page: 92

Number of pages: 9

ISSN: 1874-3919

eISSN: 1876-7737

DOI: https://doi.org/10.1016/j.jprot.2019.04.015


Abstract
Proteogenomic databases use genomic and transcriptomic information to improve the identification of peptides and proteins from mass spectrometry analyses. One application of such databases is the discovery of variants/mutations. In this study, we created a proteogenomic database containing sequences with variants derived from pooled sequencing experiments (137 Group G Streptococcus strains sequenced in 3 pools) and used tandem mass spectrometry (MS/MS) to analyse eight protein samples from randomly selected strains sequenced in the pools. Using the proteogenomic variant database, we identified 385 variant peptides from the eight samples. None of these could be identified from the single-genome conventional database, while 71.2% and 93.5% of them were identified from the databases that contained 4 complete genomes and 26 assemblies, respectively. The proteogenomic variant databases exhibited the same properties as the conventional databases in terms of the Andromeda score distributions and the posterior error probability (PEP) values of the identified peptides.

SIGNIFICANCE: For bacterial populations with substantial intra-species diversity, such as Group G Streptococcus (GGS), sequencing large numbers of strains simultaneously and generating proteogenomic databases from them helps improve the discovery of peptides in mass spectrometric analyses. Generating proteogenomic variant protein databases from pooled sequencing experiments can therefore be a cost-effective way to complement conventional databases and discover subtle strain-wise differences.



Last updated on 2024-11-26 at 21:36