A1 Refereed original research article in a scientific journal

Question Answering models for information extraction from perovskite materials science literature




Authors: Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica

Publisher: Springer Science and Business Media LLC

Publication year: 2025

Journal: Communications Materials

Article number: 260

Volume: 6

eISSN: 2662-4443

DOI: https://doi.org/10.1038/s43246-025-00979-w

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.1038/s43246-025-00979-w

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/505920997


Abstract

Scientific text is a promising source of data in materials science, with ongoing research into utilising textual data for materials discovery. In this study, we developed and tested a Question Answering (QA) approach to extract material-property relationships from scientific publications. QA performance was evaluated for information extraction of perovskite bandgaps based on a human query. We observed considerable variation in results across five different large language models fine-tuned for the QA task. The best extraction accuracy was achieved with QA MatSciBERT, and F1-scores improved on the current state of the art. QA also outperformed three of the latest generative large language models on the information extraction task, though not the GPT-4 model. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity and versatility of the QA approach point to its considerable potential for text-driven discoveries in materials research.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
The research was funded by the Research Council of Finland through grant number 345698. M.S. thanks the University of Turku Graduate School (UTUGS) and the Finnish Cultural Foundation (grant number 241085) for doctoral research grants.


Last updated on 2025-12-12 at 13:36