A4 Peer-reviewed article in conference proceedings
Regularized Least-Squares for parse ranking
Authors: Tsivtsivadze E, Pahikkala T, Pyysalo S, Boberg J, Myllari A, Salakoski T
Editors: Famili A Fazel, Kok Joost N, Peña José Manuel, Siebes Arno, Feelders A J
Established conference name: 6th International Symposium on Intelligent Data Analysis
Publication year: 2005
Journal: Lecture Notes in Computer Science
Book title: Proceedings of the 6th International Symposium on Intelligent Data Analysis
Journal name in database: ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS
Journal acronym: LECT NOTES COMPUT SC
Volume: 3646
First page: 464
Last page: 474
Number of pages: 11
ISBN: 3-540-28795-7
ISSN: 0302-9743
Abstract
We present an adaptation of the Regularized Least-Squares algorithm for the rank learning problem and an application of the method to reranking of the parses produced by the Link Grammar (LG) dependency parser. We study the use of several grammatically motivated features extracted from parses and evaluate the ranker with individual features and the combination of all features on a set of biomedical sentences annotated for syntactic dependencies. Using a parse goodness function based on the F-score, we demonstrate that our method produces a statistically significant increase in rank correlation from 0.18 to 0.42 compared to the built-in ranking heuristics of the LG parser. Further, we analyze the performance of our ranker with respect to the number of sentences and parses per sentence used for training and illustrate that the method is applicable to sparse datasets, showing improved performance with as few as 100 training sentences.