A1 Refereed original research article in a scientific journal
KANN: estimation of genetic ancestry profiles by nearest neighbor regression
Authors: Riikonen, Juha; Kerminen, Sini; Havulinna, Aki; Pirinen, Matti
Publisher: Oxford University Press (OUP)
Publication year: 2026
Journal: Nucleic Acids Research
Article number: gkag209
Volume: 54
Issue: 5
ISSN: 0305-1048
eISSN: 1362-4962
DOI: https://doi.org/10.1093/nar/gkag209
Publication's open availability at the time of reporting: Open Access
Publication channel's open availability: Open Access publication channel
Web address: https://doi.org/10.1093/nar/gkag209
Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/516225679
Self-archived copy's licence: CC BY
Self-archived copy's version: Publisher's PDF
State-of-the-art methods for inferring individual-level genetic ancestry are based on statistical models of haplotype data. Unfortunately, these methods are computationally demanding, which makes them impractical for biobank-scale analyses. In this paper, we describe KANN, an efficient k-nearest neighbor regression method that estimates individual-level ancestry with respect to predefined source populations using only the principal components of genetic structure. Unlike existing tools, which can use only reference samples with a discrete source population assignment, KANN can also use reference samples with continuous ancestry profiles across multiple source populations. KANN's ancestry estimates agree well with those of the haplotype-based method SOURCEFIND when estimating ancestry profiles across up to 10 Finnish source populations in a dataset of 18 125 Finnish samples from THL Biobank. In the 1000 Genomes Project data, which contain globally diverse genetic backgrounds, KANN produces results highly similar to those of the ADMIXTURE software. Based on these results, KANN is a promising tool for ancestry estimation in large-scale genomic studies.
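The core idea summarized above, predicting a sample's ancestry profile from the profiles of its nearest reference samples in principal-component space, can be illustrated with a minimal sketch. This is not the KANN implementation: the function name `knn_ancestry`, the use of Euclidean distance, unweighted averaging of neighbors, and the toy two-PC, two-source data are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def knn_ancestry(
    query_pcs: Sequence[float],
    ref_pcs: Sequence[Sequence[float]],
    ref_profiles: Sequence[Sequence[float]],
    k: int,
) -> list[float]:
    """Estimate one sample's ancestry profile as the mean profile of its
    k nearest reference samples in PC space.

    Hypothetical sketch: Euclidean distance and unweighted averaging are
    assumptions, not necessarily the choices made in KANN.
    """
    if not 1 <= k <= len(ref_pcs):
        raise ValueError("k must be between 1 and the number of reference samples")
    # Euclidean distance from the query to every reference sample over the PCs.
    dists = [math.dist(query_pcs, pcs) for pcs in ref_pcs]
    # Indices of the k closest reference samples.
    nearest = sorted(range(len(ref_pcs)), key=lambda i: dists[i])[:k]
    n_sources = len(ref_profiles[0])
    # Average the neighbors' profiles per source population. Each reference
    # profile sums to 1, so their mean also sums to 1.
    return [sum(ref_profiles[i][s] for i in nearest) / k for s in range(n_sources)]


# Toy reference panel (hypothetical): 2 PCs, 2 source populations.
# The second and fourth references are admixed, i.e. they carry
# continuous ancestry profiles rather than a single population label.
ref_pcs = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
ref_profiles = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]

estimate = knn_ancestry([0.05, 0.0], ref_pcs, ref_profiles, k=2)
print(estimate)
```

Because each estimate is a convex combination of reference profiles, it is automatically non-negative and sums to one. This is also why, in this sketch, admixed reference samples can be used directly without first assigning them to a single source population.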
Funding information in the publication:
This work was supported by the Sigrid Jusélius Foundation [8047 to M.P.] and the Research Council of Finland [338507, 352795, and 336285 to M.P.]. Funding to pay the Open Access publication charges for this article was provided by the Helsinki University Library.