Abstract
Application of the Question Answering method to extract information from materials science literature
Authors: Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica
Conference name: ML4MS
Publication year: 2024
Scientific text is a promising source of data in materials science, and there is ongoing research on how to utilise textual data in materials discovery. In addition to more established approaches such as named entity recognition or dictionary-based methods, new machine learning tools such as question answering (QA) are becoming available. The advantages of the QA method are that it is easy to scale and that it does not require manual labeling or annotation of text, although there may be some loss in precision compared to other methods.
We tested the performance of the QA method on a well-known information extraction task: extracting bandgap values of halide perovskite materials from scientific literature. Large language models (BERT models) were fine-tuned for a specific QA task and then used to select the correct answer to a question about materials properties. Compared with more established methods, the QA method performed well, and we were able to extract correct information from text. This information can be used to map the space of materials properties and to find promising new materials solutions. The potential of the QA method lies in its versatility, accessibility and scalability: it is easy to use even for researchers with no previous knowledge of language technology, and it can easily be extended to other materials and properties.
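To make the answer-selection step concrete, the following is a minimal sketch of how an extractive QA model typically picks an answer, assuming the common formulation in which the model assigns a start score and an end score to each token of the context and the answer is the span with the highest combined score. The abstract does not describe the selection procedure in detail, so this is an illustration and not the authors' implementation. The function name `best_answer_span`, the example sentence, and all scores are hypothetical and hand-made; they are not model output or data from the study.

```python
from typing import List, Tuple


def best_answer_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 8,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end.

    Hypothetical helper: the span score is start_scores[start] + end_scores[end],
    and spans longer than max_answer_len tokens are not considered.
    """
    if len(start_scores) != len(end_scores):
        raise ValueError("start_scores and end_scores must have equal length")
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        # Only consider end positions within max_answer_len tokens of the start.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Illustrative context sentence (not taken from the study's corpus).
tokens = "The bandgap of MAPbI3 is 1.55 eV at room temperature".split()

# Hand-made per-token scores standing in for the output of a tuned QA model
# asked a question such as "What is the bandgap of MAPbI3?".
start_scores = [0.1, 0.2, 0.0, 0.3, 0.1, 4.0, 0.5, 0.0, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.2, 0.0, 1.0, 3.5, 0.1, 0.0, 0.2]

start, end, score = best_answer_span(start_scores, end_scores)
answer = " ".join(tokens[start : end + 1])
print(answer)
```

In practice the scores would come from a tuned model instead of being written by hand, and the selected span would then be parsed into a numeric value and unit before being added to a dataset.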