A4 Refereed article in a conference publication

TULUN: Transparent and Adaptable Low-resource Machine Translation




Authors Merx, Raphael; Suominen, Hanna; Hong, Lois Yinghui; Thieberger, Nick; Cohn, Trevor; Vylomova, Ekaterina

Editors Mishra, Pushkar; Muresan, Smaranda; Yu, Tao

Conference name Annual Meeting of the Association for Computational Linguistics

Publisher Association for Computational Linguistics (ACL)

Publication year 2025

Journal Annual Meeting of the Association for Computational Linguistics

Book title Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Volume 63

First page 129

Last page 139

ISBN 979-8-89176-253-4

ISSN 0736-587X

Publication's open availability at the time of reporting Open Access

Publication channel's open availability Open Access publication channel

Web address https://aclanthology.org/2025.acl-demo.13/

Self-archived copy's web address https://research.utu.fi/converis/portal/detail/Publication/506057677


Abstract
Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for non-technical users and small organizations. To address this gap, we propose TULUN, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show its effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90-22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, TULUN outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. TULUN is publicly accessible at bislama-trans.rapha.dev.
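The abstract describes a two-step workflow: a neural MT system produces a draft translation, which an LLM then post-edits under constraints drawn from a glossary and translation memory. The Python sketch below illustrates that flow under stated assumptions; the function names (`nmt_translate`, `llm_complete`), the prompt wording, and the data structures are hypothetical stand-ins, not the TULUN codebase.

```python
# Minimal sketch of terminology-aware MT with LLM post-editing, assuming
# caller-supplied wrappers for the NMT model and the LLM. Everything here
# is illustrative, not the authors' implementation.

from dataclasses import dataclass


@dataclass
class GlossaryEntry:
    source_term: str   # term in the source language (e.g., English)
    target_term: str   # preferred rendering in the target language (e.g., Tetun)


def match_glossary(source: str, glossary: list[GlossaryEntry]) -> list[GlossaryEntry]:
    """Return glossary entries whose source term occurs in the input sentence."""
    lowered = source.lower()
    return [e for e in glossary if e.source_term.lower() in lowered]


def build_post_edit_prompt(source: str, draft: str,
                           matches: list[GlossaryEntry],
                           memory_examples: list[tuple[str, str]]) -> str:
    """Assemble an LLM prompt asking for a post-edited translation that
    follows the matched terminology and similar past translations."""
    term_lines = "\n".join(f"- '{e.source_term}' -> '{e.target_term}'" for e in matches)
    example_lines = "\n".join(f"- {s} => {t}" for s, t in memory_examples)
    return (
        "Post-edit the draft translation so it uses the required terminology.\n"
        f"Source sentence: {source}\n"
        f"Draft translation: {draft}\n"
        f"Required terminology:\n{term_lines or '- (none)'}\n"
        f"Similar past translations:\n{example_lines or '- (none)'}\n"
        "Return only the corrected translation."
    )


def translate(source: str, glossary: list[GlossaryEntry],
              memory_examples: list[tuple[str, str]],
              nmt_translate, llm_complete) -> str:
    """Two-step pipeline: NMT draft, then terminology-guided LLM post-edit.
    `nmt_translate` and `llm_complete` are caller-supplied callables,
    e.g., wrappers around an NLLB model and an LLM API."""
    draft = nmt_translate(source)
    matches = match_glossary(source, glossary)
    prompt = build_post_edit_prompt(source, draft, matches, memory_examples)
    return llm_complete(prompt)
```

Keeping the NMT system and the LLM behind plain callables reflects the design point made in the abstract: terminology resources are applied at post-editing time, so no fine-tuning of either model is required.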

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
This research was supported by The University of Melbourne’s Research Computing Services and the Petascale Campus Initiative.

