A4 Refereed article in a conference publication
Explaining Classes through Stable Word Attributions
Authors: Rönnqvist Samuel, Myntti Amanda, Kyröläinen Aki-Juhani, Ginter Filip, Laippala Veronika
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Conference name: Annual Meeting of the Association for Computational Linguistics
Publication year: 2022
Book title: The 60th Annual Meeting of the Association for Computational Linguistics: Findings of ACL 2022
Journal name in source: Findings of the Association for Computational Linguistics (ACL 2022)
Series title: Annual Meeting of the Association for Computational Linguistics
Volume: 60
First page: 1063
Last page: 1074
Number of pages: 12
ISBN: 978-1-955917-25-4
DOI: https://doi.org/10.18653/v1/2022.findings-acl.85
Web address: https://aclanthology.org/2022.findings-acl.85
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/176874206
Abstract: Input saliency methods have recently become a popular tool for explaining the predictions of deep learning models in NLP. Nevertheless, there has been little work investigating methods for aggregating prediction-level explanations to the class level, nor has a framework for evaluating such class explanations been established. We explore explanations based on XLM-R and the Integrated Gradients input attribution method, and propose 1) the Stable Attribution Class Explanation method (SACX) to extract class-level keyword lists for text classification tasks, and 2) a framework for the systematic evaluation of such keyword lists. We find that explanations of individual predictions are prone to noise, but that stable explanations can be effectively identified through repeated training and explanation. We evaluate on web register data and show that the class explanations are linguistically meaningful and distinguish between the classes.
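As a rough illustration of the pipeline the abstract outlines, the sketch below computes Integrated Gradients attributions for an XLM-R classifier using Hugging Face transformers and Captum, then filters keywords that recur across independently re-trained runs. The model name "xlm-roberta-base", the label count, the pad-token baseline, the n_steps value, and the majority-vote stability criterion are all illustrative assumptions on my part; the paper's exact SACX procedure is defined in the article itself, not reproduced here.

    # Hedged sketch of the approach described in the abstract; all
    # configuration choices below are assumptions, not the authors' code.
    import torch
    from collections import Counter
    from captum.attr import LayerIntegratedGradients
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_NAME = "xlm-roberta-base"  # assumption: base-size XLM-R
    NUM_CLASSES = 8                  # assumption: number of register classes

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_CLASSES
    )
    model.eval()

    def forward_fn(input_ids, attention_mask):
        # Return class logits; Captum selects the target column itself.
        return model(input_ids=input_ids, attention_mask=attention_mask).logits

    def token_attributions(text, target_class):
        """Per-token Integrated Gradients scores w.r.t. the embedding layer."""
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        # Baseline: the same sequence with every token replaced by <pad>.
        baseline = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)
        lig = LayerIntegratedGradients(forward_fn, model.roberta.embeddings)
        attrs = lig.attribute(
            enc["input_ids"],
            baselines=baseline,
            additional_forward_args=(enc["attention_mask"],),
            target=target_class,
            n_steps=25,
        )
        scores = attrs.sum(dim=-1).squeeze(0)  # one scalar per subword token
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0).tolist())
        return list(zip(tokens, scores.tolist()))

    def stable_keywords(run_keyword_lists, min_runs=None):
        """Keep words that recur across independently re-trained runs.

        run_keyword_lists holds one ranked keyword list per training run;
        this majority-vote filter is one plausible reading of "stable
        explanations ... identified through repeated training and explanation".
        """
        min_runs = min_runs or len(run_keyword_lists) // 2 + 1
        counts = Counter(w for kws in run_keyword_lists for w in set(kws))
        return [w for w, c in counts.most_common() if c >= min_runs]

In this reading, each re-training run produces its own ranked keyword list per class from the per-document attributions, and only words selected in a majority of runs survive into the final class explanation, which is how noise in individual prediction-level explanations would be filtered out.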
Downloadable publication: This is an electronic reprint of the original article.