A guide to reverse metabolomics—a framework for big data discovery strategy




Charron-Lamoureux, Vincent; Mannochio-Russo, Helena; Lamichhane, Santosh; Xing, Shipei; Patan, Abubaker; Gomes, Paulo Wender Portal; Rajkumar, Prajit; Deleray, Victoria; Caraballo-Rodriguez, Andres Mauricio; Chua, Kee Voon; Lee, Lye Siang; Liu, Zhao; Ching, Jianhong; Wang, Mingxun; Dorrestein, Pieter C.

Publisher: Springer Science and Business Media LLC

BERLIN

2025

Nature Protocols


NAT PROTOC

20

10

2960

2993

34

1754-2189

1750-2799

DOI: https://doi.org/10.1038/s41596-024-01136-2



Untargeted metabolomics is evolving into a field of big data science. There is growing interest within the metabolomics community in mining tandem mass spectrometry (MS/MS)-based data from public repositories. In traditional untargeted metabolomics, samples are collected to address a predefined question, and liquid chromatography with MS/MS data are generated. We then identify metabolites associated with a phenotype (for example, disease versus healthy) and elucidate or validate their structural details (for example, molecular formula, structural classification, substructure or complete structural annotation or identification). In reverse metabolomics, we start with MS/MS spectra for known or unknown molecules. These spectra are used as search terms to query public data repositories and discover phenotype-relevant information such as organ/biofluid distribution, disease condition, intervention status (for example, pre- and postintervention), organisms (for example, mammals versus others), geography and any other biologically relevant associations. Here we guide the reader through a four-part process: (1) obtaining the MS/MS spectra of interest (Universal Spectrum Identifier), (2) running Mass Spectrometry Search Tool searches to find the files in available databases that contain those MS/MS spectra, (3) using the Reanalysis Data User Interface framework to link the files with their metadata and (4) validating the observations. Parts 1-3 could take from hours to days depending on the method used for collecting MS/MS spectra. As an example, we use MS/MS spectra from three small molecules: phenylalanine-cholic acid (a microbially conjugated bile acid), phenylalanine-C4:0 and histidine-C4:0 (two N-acyl amides). We leverage the Global Natural Products Social Molecular Networking-based framework to explore the microbial producers of these molecules and their associations with health conditions and organ distributions in humans and rodents.
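Part 3 of the process described above, linking matched files to their metadata, can be illustrated with a minimal sketch. It is not the protocol's implementation and does not call any Mass Spectrometry Search Tool or Reanalysis Data User Interface API. The file names, the column names (`filename`, `body_part`, `health_status`) and the helper `link_matches_to_metadata` are all hypothetical, chosen only to show the join-and-tally idea.

```python
from collections import Counter


def link_matches_to_metadata(matched_files, metadata_rows, field):
    """Count matched files per value of one metadata field.

    matched_files: file names returned by a spectrum search (hypothetical).
    metadata_rows: one dict per file; column names here are assumptions.
    field: the metadata column to tally, e.g. "body_part".
    """
    # Index the metadata by file name so each lookup is constant time.
    by_file = {row["filename"]: row for row in metadata_rows}
    counts = Counter()
    for name in matched_files:
        row = by_file.get(name)
        if row is None:
            # A match with no metadata record is kept visible, not dropped.
            counts["(no metadata)"] += 1
        else:
            counts[row.get(field, "(missing)")] += 1
    return counts


# Hypothetical search hits for one MS/MS spectrum.
matched_files = [
    "study_A/sample1.mzML",
    "study_A/sample2.mzML",
    "study_B/sample9.mzML",
    "study_C/sample3.mzML",
]

# Hypothetical metadata table; real column names may differ.
metadata_rows = [
    {"filename": "study_A/sample1.mzML", "body_part": "feces", "health_status": "disease"},
    {"filename": "study_A/sample2.mzML", "body_part": "feces", "health_status": "healthy"},
    {"filename": "study_B/sample9.mzML", "body_part": "blood plasma", "health_status": "disease"},
]

body_part_counts = link_matches_to_metadata(matched_files, metadata_rows, "body_part")
health_counts = link_matches_to_metadata(matched_files, metadata_rows, "health_status")
print(dict(body_part_counts))
print(dict(health_counts))
```

Counting unmatched files separately keeps the gap between search hits and available metadata visible, which matters when judging how far an association can be trusted before validation (part 4).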



V.C.L. is supported by a Fonds de recherche du Québec - Santé (FRQS) postdoctoral fellowship (335368). This work is supported, in part, by the National Institutes of Health (NIH) through the NIH collaborative microbial metabolite center (U24DK133658) and the harmonization of metabolomics metadata across repositories (R03OD034493). This project was enabled in part by the Alzheimer's Gut Microbiome Project (AGMP), supported by the National Institute on Aging grants 1U19AG063744 and 3U19AG063744-04S1, awarded to Kaddurah-Daouk at Duke University in partnership with multiple academic institutions. As such, the investigators within the AGMP who are not listed among this publication's authors provided analysis-ready data but did not participate in designing the study, conducting the analyses or writing this manuscript. A listing of AGMP Investigators can be found at https://alzheimergut.org/meet-the-team/. A complete listing of the AD Metabolomics Consortium investigators can be found at https://sites.duke.edu/adnimetab/team/. Support was also provided by BBSRC-NSF award 2152526. A.M.C.-R. and P.C.D. were supported by the Gordon and Betty Moore Foundation, GBMF12120. S.L. was supported by the Research Council of Finland (decision number 363417) and the InFLAMES Flagship Programme of the Research Council of Finland (decision number 337530). M.W. is supported by NIH 5U24DK133658-02 and was partially supported by the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility supported by the Office of Science of the US Department of Energy and operated under contract number DE-AC02-05CH11231.


Last updated on 2025-20-10 at 16:06