A Deep Dive into Multi-Head Attention and Multi-Aspect Embedding - UTU Research Portal

D3 Article in a professional conference publication

A Deep Dive into Multi-Head Attention and Multi-Aspect Embedding

Authors: Teimouri, Maryam; Kanerva, Jenna; Ginter, Filip

Editors: Angelova, Galia; Kunilovskaya, Maria; Escribe, Marie; Mitkov, Ruslan

Conference name: International Conference on Recent Advances in Natural Language Processing

Publication year: 2025

Book title: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI era

First page: 1263

Last page: 1270

eISBN: 978-954-452-098-4

DOI: https://doi.org/10.26615/978-954-452-098-4-146

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.26615/978-954-452-098-4-146

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/500281244

Abstract

Multi-vector embedding models play an increasingly important role in retrieval-augmented generation, yet their internal behaviour lacks comprehensive analysis. We conduct a systematic, head-level study of the 32-head Semantic Feature Representation (SFR) encoder with the FineWeb corpus containing 10 billion tokens. For a set of 4,000 web documents, we pair head-specific embeddings with GPT-4o topic annotations and analyse the results using t-SNE visualisations, heat maps, and a 32-way logistic probe. The analysis shows that (i) clear semantic separation between heads emerges only at an intermediate layer, (ii) some heads align with specific topics while others capture broader corpus features, and (iii) naive pooling of head outputs can blur these distinctions, leading to frequent topic mismatches. The study offers practical guidance on where to extract embeddings, which heads may be pruned, and how to aggregate them to support more transparent and controllable retrieval pipelines.
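
For readers unfamiliar with the probing setup mentioned in the abstract, the following is a minimal sketch of an n-way logistic (softmax) probe. It is not the authors' implementation. It assumes the probe is trained to predict which head produced a given embedding, which the abstract does not spell out. The data are synthetic stand-ins, the toy uses 4 heads rather than 32 to keep it fast, and all function names and hyperparameters are hypothetical.

```python
import math
import random


def make_toy_embeddings(n_heads=4, per_head=40, dim=8, seed=0):
    """Synthetic stand-in for head-specific embeddings (assumption, not real data).

    Each hypothetical head gets its own random centre; samples are noisy
    copies of that centre, labelled with the head index.
    """
    rng = random.Random(seed)
    data = []
    for head in range(n_heads):
        center = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(per_head):
            data.append(([c + rng.gauss(0.0, 0.3) for c in center], head))
    rng.shuffle(data)
    return data


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    """Linear scores, one per class: W[k] . x + b[k]."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_probe(data, n_heads, dim, epochs=50, lr=0.1):
    """Fit a multinomial logistic probe with plain SGD on cross-entropy loss."""
    W = [[0.0] * dim for _ in range(n_heads)]
    b = [0.0] * n_heads
    for _ in range(epochs):
        for x, label in data:
            probs = softmax(logits(W, b, x))
            for k in range(n_heads):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == label]
                grad = probs[k] - (1.0 if k == label else 0.0)
                b[k] -= lr * grad
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
    return W, b


def accuracy(W, b, data):
    """Fraction of samples whose highest-scoring class matches the label."""
    hits = 0
    for x, label in data:
        scores = logits(W, b, x)
        hits += scores.index(max(scores)) == label
    return hits / len(data)


if __name__ == "__main__":
    data = make_toy_embeddings()
    split = int(0.75 * len(data))  # simple train / held-out split
    W, b = train_probe(data[:split], n_heads=4, dim=8)
    print(f"held-out probe accuracy: {accuracy(W, b, data[split:]):.2f}")
```

Under this reading, held-out accuracy well above chance would suggest that the heads occupy linearly distinguishable regions of the embedding space, while accuracy near chance would suggest they do not. In practice a library implementation of multinomial logistic regression would likely replace the hand-written training loop.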

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

teimouri_kanerva_ginter_2025.pdf

Funding information in the publication:
This research was conducted as part of the EU Horizon project SEUS – Smart European Shipbuilding (Grant Agreement No. 101096224), funded by the European Union. Additional support was provided by the Human Diversity Consortium under the Profi7 program of the Research Council of Finland. Computational resources were provided by CSC – IT Center for Science.