Abstract

Question Answering models for information extraction from perovskite materials science literature




Authors: Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica

Established conference name: MRS Fall Meeting and Exhibit

Publication year: 2024

Compilation title: 2024 MRS Fall Meeting and Exhibit


Summary

Scientific text is a promising source of data in materials science, and there is ongoing research on how to use textual data in materials discovery. The recent success of transformer-based language models has led to new machine learning tools, such as question answering (QA), that can be applied to information extraction (IE) from scientific literature. QA models are BERT-based language models fine-tuned for an IE task, which the user poses as a natural-language question. The potential of the QA method lies in its versatility, accessibility and scalability. Because queries are written in human language, the method is easy to use even for researchers with no previous knowledge of language technology. In addition, the QA model does not need to be retrained to extract information about different materials and properties.
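
The question-based extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the question wording, the `extract_bandgap` helper, the score threshold and the `stub_qa` stand-in are all hypothetical. The only assumption about the model is that it is a callable taking `question` and `context` and returning a dictionary with `answer` and `score`, which is the shape a Hugging Face `transformers` question-answering pipeline returns.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical question template; the wording used in the study is not
# stated in the abstract.
QUESTION_TEMPLATE = "What is the bandgap of {material}?"

# Matches a number followed by the unit eV, e.g. "1.55 eV" or "2.3eV".
BANDGAP_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*eV\b")


def extract_bandgap(
    qa: Callable[..., Dict],
    material: str,
    context: str,
    min_score: float = 0.5,  # illustrative threshold, not from the source
) -> Optional[float]:
    """Ask the QA model for the bandgap and parse the answer as a float in eV."""
    question = QUESTION_TEMPLATE.format(material=material)
    result = qa(question=question, context=context)
    # Discard low-confidence answers.
    if result["score"] < min_score:
        return None
    match = BANDGAP_PATTERN.search(result["answer"])
    return float(match.group(1)) if match else None


def stub_qa(question: str, context: str) -> Dict:
    """Stand-in for a real QA model: returns the first 'number eV' span."""
    match = BANDGAP_PATTERN.search(context)
    if match is None:
        return {"answer": "", "score": 0.0}
    return {"answer": match.group(0), "score": 0.9}


if __name__ == "__main__":
    text = "MAPbI3 has a bandgap of 1.55 eV."
    print(extract_bandgap(stub_qa, "MAPbI3", text))
```

Changing the material or the property only requires changing the question string, which reflects the point that no retraining is needed; a real QA model would replace `stub_qa` without changes to `extract_bandgap`.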

We explored the IE performance of the QA method on the task of extracting bandgap values of halide perovskite materials from scientific literature. We tested five different BERT models and found that the MatBERT model produced the best results. Compared with the more established IE tool ChemDataExtractor2, the QA method performed well, and we were able to collect correct bandgap values from text. The extracted information will next be used to map the space of materials properties and to find promising new materials solutions. We implemented the method in a web application to make the QA tool more widely available. Through this work, we seek to lower the barriers for non-experts to use large language models for IE and to help democratize the use of language technology in materials research.



Last updated on 2025-06-02 at 10:58