A1 Refereed original research article in a scientific journal

KANN: estimation of genetic ancestry profiles by nearest neighbor regression




Authors: Riikonen, Juha; Kerminen, Sini; Havulinna, Aki; Pirinen, Matti

Publisher: Oxford University Press (OUP)

Publication year: 2026

Journal: Nucleic Acids Research

Article number: gkag209

Volume: 54

Issue: 5

ISSN: 0305-1048

eISSN: 1362-4962

DOI: https://doi.org/10.1093/nar/gkag209

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.1093/nar/gkag209

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/516225679

Self-archived copy's licence: CC BY

Self-archived copy's version: Publisher's PDF


Abstract

State-of-the-art methods for inferring individual-level genetic ancestry are based on statistical models for haplotype data. Unfortunately, these methods are computationally demanding, making them impractical for biobank-scale analyses. In this paper, we describe KANN, an efficient k-nearest neighbor regression method for individual-level ancestry estimation with respect to predefined source populations using only principal components of genetic structure. Unlike existing tools, which can only use reference samples with a discrete source population assignment, KANN enables the use of reference samples with continuous ancestry profiles across multiple source populations. We observe that KANN’s ancestry estimates agree well with those of the haplotype-based method SOURCEFIND when estimating ancestry profiles across up to 10 Finnish source populations on a dataset of 18 125 Finnish samples from THL Biobank. In the 1000 Genomes Project data, which contain globally diverse genetic backgrounds, KANN produces results highly similar to those of the ADMIXTURE software. Based on our results, KANN is a promising tool for ancestry estimation in large-scale genomic studies.
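
The general idea named in the abstract, k-nearest neighbor regression on principal components with continuous reference ancestry profiles, can be illustrated with a minimal Python sketch. This is not the KANN implementation: the function name `knn_ancestry`, the unweighted averaging, the Euclidean distance, the choice of `k`, and the toy data are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def knn_ancestry(ref_pcs, ref_profiles, query_pcs, k=3):
    """Generic k-NN regression sketch (hypothetical helper, not the KANN code).

    ref_pcs:      (n_ref, n_pcs) principal components of reference samples
    ref_profiles: (n_ref, n_sources) ancestry proportions, each row summing to 1
    query_pcs:    (n_query, n_pcs) principal components of target samples
    """
    ref_pcs = np.asarray(ref_pcs, dtype=float)
    ref_profiles = np.asarray(ref_profiles, dtype=float)
    query_pcs = np.asarray(query_pcs, dtype=float)

    # Squared Euclidean distances in PC space, shape (n_query, n_ref).
    diffs = query_pcs[:, None, :] - ref_pcs[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Indices of the k closest reference samples for each query sample.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Assumption: plain unweighted mean of the neighbors' profiles.
    # A mean of rows that sum to 1 also sums to 1.
    return ref_profiles[nearest].mean(axis=1)

# Toy data (made up): two clusters in a 2-PC space, two source populations.
ref_pcs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
ref_profiles = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.0],
                [0.0, 1.0], [0.2, 0.8], [0.0, 1.0]]
query_pcs = [[0.05, 0.05], [5.05, 5.05]]

estimates = knn_ancestry(ref_pcs, ref_profiles, query_pcs, k=3)
print(estimates)
```

Because the reference profiles are continuous proportions rather than discrete labels, the averaged output is itself a valid ancestry profile for each query sample. For biobank-scale inputs, the dense distance matrix used here would likely need to be replaced by a spatial index or batched computation.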


Funding information in the publication
This work was supported by the Sigrid Jusélius Foundation [8047 to M.P.] and the Research Council of Finland [338507, 352795, and 336285 to M.P.]. Funding to pay the Open Access publication charges for this article was provided by the Helsinki University Library.


Last updated on 08/04/2026 11:29:30 AM