A1 Refereed original research article in a scientific journal
Question Answering models for information extraction from perovskite materials science literature
Authors: Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica
Publisher: Springer Science and Business Media LLC
Publication year: 2025
Journal: Communications Materials
Article number: 260
Volume: 6
eISSN: 2662-4443
DOI: https://doi.org/10.1038/s43246-025-00979-w
Publication's open availability at the time of reporting: Open Access
Publication channel's open availability: Open Access publication channel
Web address: https://doi.org/10.1038/s43246-025-00979-w
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/505920997
Scientific text is a promising source of data in materials science, and research into utilising textual data for materials discovery is ongoing. In this study, we developed and tested a Question Answering (QA) approach to extract material-property relationships from scientific publications. QA performance was evaluated on the extraction of perovskite bandgaps based on a human query. We observed considerable variation in results across five different large language models fine-tuned for the QA task. The best extraction accuracy was achieved with QA MatSciBERT, and F1-scores improved on the current state of the art. QA also outperformed three of the latest generative large language models on the information extraction task, with the exception of the GPT-4 model. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity and versatility of the QA approach point to its considerable potential for text-driven discoveries in materials research.
Downloadable publication:
This is an electronic reprint of the original article.
Funding information in the publication:
The research was funded by the Research Council of Finland through grant number 345698. M.S. thanks the University of Turku Graduate School (UTUGS) and the Finnish Cultural Foundation (grant number 241085) for doctoral research grants.