G5 Doctoral dissertation (article)
Biomedical Event Extraction with Machine Learning




List of Authors: Jari Björne
Publisher: TUCS Dissertations
Publication year: 2014
ISBN: 978-952-12-3078-3

Abstract


Biomedical natural language processing (BioNLP) is a subfield of natural
language processing, an area of computational linguistics concerned with
developing programs that work with natural language: written texts and speech.
Biomedical relation extraction concerns the detection of semantic relations,
such as protein-protein interactions (PPI), from scientific texts. The aim is
to enhance information retrieval by detecting relations between concepts, not
just individual concepts as with a keyword search.



In recent years, events have been proposed as a more detailed alternative to
simple pairwise PPI relations. Events provide a systematic, structural
representation for annotating the content of natural language texts. They are
characterized by annotated trigger words, directed and typed arguments, and the
ability to nest other events. For example, the sentence "Protein A causes
protein B to bind protein C" can be annotated with the nested event structure
CAUSE(A, BIND(B, C)). Converted to such formal representations, the information
in natural language texts can be used by computational applications. Biomedical
event annotations were introduced by the BioInfer and GENIA corpora, and event
extraction was popularized by the BioNLP'09 Shared Task on Event Extraction.
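
The nested structure of the example sentence can be sketched as a small recursive data type. This is only an illustration under stated assumptions, not the annotation format of BioInfer, GENIA, or TEES: the class names `Entity` and `Event` and the helper `render` are hypothetical, and argument types (roles) are left out for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity mentioned in the text, e.g. a protein."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a type, a trigger word, and ordered arguments.

    Arguments may be entities or other events, which is what allows nesting.
    """
    type: str
    trigger: str
    args: Tuple[Union[Entity, "Event"], ...]


def render(node: Union[Entity, Event]) -> str:
    """Render an entity or event in the TYPE(arg, ...) notation."""
    if isinstance(node, Entity):
        return node.name
    return f"{node.type}({', '.join(render(a) for a in node.args)})"


# "Protein A causes protein B to bind protein C"
bind = Event("BIND", "bind", (Entity("B"), Entity("C")))
cause = Event("CAUSE", "causes", (Entity("A"), bind))
print(render(cause))  # prints CAUSE(A, BIND(B, C))
```

Because `bind` appears as an argument of `cause`, the second relation is expressed inside the first instead of as a separate pairwise relation.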



In this thesis we present a method for automated event extraction, implemented
as the Turku Event Extraction System (TEES). A unified graph format is defined
for representing event annotations, and the problem of extracting complex event
structures is decomposed into a number of independent classification tasks.
These tasks are solved using support vector machine (SVM) and regularized
least-squares (RLS) classifiers, utilizing rich feature representations built
from full dependency parsing. Building on earlier work on pairwise relation
extraction and using a generalized graph representation, the resulting system
is capable of detecting binary relations as well as complex event structures.
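
To make the idea of a graph representation and independent classification tasks more concrete, the sketch below encodes the event structure CAUSE(A, BIND(B, C)) as a directed graph and lists every (trigger, node) pair as a separate labeled example. The abstract does not specify the graph format or the individual tasks, so everything here is assumed for illustration: the node identifiers, the `nodes` and `edges` layout, and the `edge_examples` helper are hypothetical and are not the TEES interchange format or API. No classifier is trained.

```python
from itertools import permutations

# Nodes: entities and event trigger words, keyed by a hypothetical id
# and mapped to their type.
nodes = {
    "A": "Protein",
    "B": "Protein",
    "C": "Protein",
    "causes": "CAUSE",  # trigger word typed with its event type
    "bind": "BIND",
}

# Directed edges from an event trigger to each of its arguments.
# An edge whose target is another trigger expresses nesting.
edges = {
    ("causes", "A"),
    ("causes", "bind"),
    ("bind", "B"),
    ("bind", "C"),
}

triggers = {node for node, kind in nodes.items() if kind != "Protein"}


def edge_examples(nodes, edges, triggers):
    """Yield one labeled example per ordered (trigger, other node) pair.

    Each pair can be labeled without looking at any other pair, which is
    the sense in which the examples are independent here.
    """
    for source, target in permutations(sorted(nodes), 2):
        if source in triggers:
            yield (source, target), (source, target) in edges


examples = list(edge_examples(nodes, edges, triggers))
positives = [pair for pair, label in examples if label]
print(len(examples), len(positives))  # prints 8 4
```

In a full system each pair would be turned into a feature vector, for instance from the dependency parse, and passed to a classifier; the sketch stops at enumerating and labeling the pairs.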



We show that this event extraction system performs well, taking first place
in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has
achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, and
has shown competitive performance in the binary relation Drug-Drug Interaction
Extraction (DDIExtraction) 2011 and 2013 shared tasks.



The Turku Event Extraction System is published as a freely available
open-source project, documenting the research in detail as well as making the
method available for practical applications. In particular, in this thesis we
describe the application of the event extraction method to PubMed-scale text
mining, demonstrating that the developed approach not only performs well but
is also generalizable and applicable to large-scale real-world text mining
projects.



Finally, we discuss related literature, summarize the contributions of the
work, and present some thoughts on future directions for biomedical event
extraction. This thesis includes and builds on six original research
publications. The first of these introduces the analysis of dependency parses
that led to the development of TEES. The entries in the three BioNLP Shared
Tasks, as well as in the DDIExtraction 2011 task, are covered in four
publications, and the sixth one demonstrates the application of the system to
PubMed-scale text mining.




This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Last updated on 2019-01-29 at 22:35