Overview of DrugProt BioCreative VII track: quality evaluation and large scale text mining of drug-gene/protein relations

B3 Non-refereed article in a conference publication

Authors: Miranda Antonio, Mehryary Farrokh, Luoma Jouni, Pyysalo Sampo, Valencia Alfonso, Krallinger Martin

Editors: N/A

Conference name: BioCreative

Publication year: 2021

Book title: Proceedings of the BioCreative VII Challenge Evaluation Workshop

eISBN: 978-0-578-32368-8

Web address: https://biocreative.bioinformatics.udel.edu/resources/publications/bc-vii-workshop-proceedings/

Abstract

Considering recent progress in NLP, deep learning techniques and biomedical language models, there is a pressing need to generate annotated resources and comparable evaluation scenarios that enable the development of advanced biomedical relation extraction systems able to extract interactions between drugs/chemical entities and genes, proteins or miRNAs. Building on the results and experience of the CHEMDNER, CHEMDNER patents and ChemProt tracks, we posed the DrugProt track at BioCreative VII. The DrugProt track focused on the evaluation of automatic systems able to extract 13 different types of drug-gene/protein relations that are important for understanding gene regulatory and pharmacological mechanisms. It addressed regulatory associations (direct/indirect, activator/inhibitor relations), certain types of binding associations (antagonist and agonist relations) as well as metabolic associations (substrate or product relations). To promote the development of novel tools and offer a comparative evaluation scenario, we released 61,775 manually annotated gene mentions, 65,561 chemical and drug mentions and a total of 24,526 relationships manually labeled by domain experts. A total of 30 teams submitted results for the DrugProt main track, while 9 teams submitted results for the large-scale text mining subtrack, which required processing over 2.3 million records. Teams obtained very competitive results, with predictions reaching F-measures of over 0.92 for some relation types (antagonist) and F-measures across all relation types close to 0.8.
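
As an illustration of how per-type and overall F-measures like those quoted in the abstract can be computed, here is a minimal Python sketch. It is not the official DrugProt evaluation script. The abstract does not say how scores were aggregated across relation types, so the micro-averaging used here is an assumption, as are the tuple layout `(doc_id, relation_type, chemical_id, gene_id)`, the function names, and the toy labels and data.

```python
from collections import Counter


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(gold, predicted):
    """Score predicted relations against gold relations.

    Both arguments are sets of hypothetical tuples
    (doc_id, relation_type, chemical_id, gene_id).
    Returns (per-type F-measures, micro-averaged F-measure).
    """
    # Count matches and errors per relation type (element 1 of each tuple).
    tp = Counter(rel[1] for rel in gold & predicted)
    fp = Counter(rel[1] for rel in predicted - gold)
    fn = Counter(rel[1] for rel in gold - predicted)

    types = set(tp) | set(fp) | set(fn)
    per_type = {t: f_measure(tp[t], fp[t], fn[t]) for t in types}

    # Micro-average (an assumption): pool the counts over all types first.
    micro = f_measure(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_type, micro


# Toy data for illustration only; not taken from the DrugProt corpus.
gold = {
    ("d1", "ANTAGONIST", "T1", "T2"),
    ("d1", "INHIBITOR", "T3", "T4"),
    ("d2", "INHIBITOR", "T1", "T5"),
}
predicted = {
    ("d1", "ANTAGONIST", "T1", "T2"),
    ("d1", "INHIBITOR", "T3", "T4"),
    ("d2", "ACTIVATOR", "T1", "T5"),  # wrong type: one false positive, one false negative
}

per_type, micro = score(gold, predicted)
```

In this toy case a single mislabeled relation counts twice, once as a false positive for the predicted type and once as a false negative for the gold type, which is why per-type scores can differ sharply from the pooled score.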