A4 Peer-reviewed article in conference proceedings

TULUN: Transparent and Adaptable Low-resource Machine Translation




Authors: Merx, Raphael; Suominen, Hanna; Hong, Lois Yinghui; Thieberger, Nick; Cohn, Trevor; Vylomova, Ekaterina

Editors: Mishra, Pushkar; Muresan, Smaranda; Yu, Tao

Established name of the conference: Annual Meeting of the Association for Computational Linguistics

Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL

Publication year: 2025

Journal: Annual Meeting of the Association for Computational Linguistics

Title of the collection: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Volume: 63

First page: 129

Last page: 139

ISBN: 979-8-89176-253-4

ISSN: 0736-587X

Open access status at the time of registration: Openly available

Open access of the publication channel: Fully open publication channel

Web address: https://aclanthology.org/2025.acl-demo.13/

Address of the self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/506057677


Abstract
Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose TULUN, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90-22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, TULUN outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. TULUN is publicly accessible at bislama-trans.rapha.dev.
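
Illustration: the abstract describes a two-step pipeline in which a neural MT system produces a draft translation and an LLM then post-edits it, guided by glossary entries matched against the source sentence. Below is a minimal Python sketch of that idea under stated assumptions; the function names (llm_complete, match_glossary, post_edit), the prompt wording, and the matching strategy are illustrative inventions, not TULUN's actual implementation.

def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real client or SDK here."""
    raise NotImplementedError("connect an LLM backend")

def match_glossary(source: str, glossary: dict[str, str]) -> dict[str, str]:
    """Keep only glossary entries whose source-language term occurs in the input."""
    lowered = source.lower()
    return {src: tgt for src, tgt in glossary.items() if src.lower() in lowered}

def post_edit(source: str, draft: str, glossary: dict[str, str]) -> str:
    """Ask the LLM to revise the draft MT output while enforcing matched terms."""
    terms = match_glossary(source, glossary)
    term_lines = "\n".join(f"- translate '{s}' as '{t}'" for s, t in terms.items())
    prompt = (
        "Post-edit the draft translation of the source sentence.\n"
        f"Source: {source}\n"
        f"Draft: {draft}\n"
        "Glossary constraints:\n"
        f"{term_lines or '- (no glossary matches)'}\n"
        "Return only the corrected translation."
    )
    return llm_complete(prompt)

With a real backend plugged into llm_complete, a call such as post_edit(source_sentence, nmt_draft, glossary), where glossary maps domain terms to their approved target-language equivalents, would return a revised translation constrained to that terminology.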

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
This research was supported by The University of Melbourne’s Research Computing Services and the Petascale Campus Initiative.

