A1 Peer-reviewed original article in a scientific journal

An efficient incremental algorithm for clustering large datasets




Authors: Lampainen, Jenni; Joki, Kaisa; Karmitsa, Napsu; Mäkelä, Marko M.

Publisher: Springer Science and Business Media LLC

Publication year: 2026

Journal: Advances in Data Analysis and Classification

ISSN: 1862-5347

eISSN: 1862-5355

DOI: https://doi.org/10.1007/s11634-025-00661-6

Open access status at the time of recording: Openly available

Open access status of the publication channel: Partially open publication channel

Web address: https://doi.org/10.1007/s11634-025-00661-6

Address of the self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/523214179

License of the self-archived copy: CC BY

Version of the self-archived publication: Publisher's version


Abstract
Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on a novel incremental approach and a nonsmooth formulation of the minimum sum-of-squares clustering problem. Specifically, the clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively in very large datasets, the limited memory bundle method (Haarala et al. in Optim Methods Softw 19(6):673–692, 2004) is applied as an underlying solver in Clust-Splitter. We test and evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental results demonstrate the efficiency of the proposed method for clustering very large datasets, as well as the high quality of its solutions, which are on par with those of the best existing methods.
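
For readers unfamiliar with the minimum sum-of-squares clustering (MSSC) problem mentioned in the abstract, the following is a minimal sketch of one common way to write its nonsmooth objective: the mean, over all data points, of the squared Euclidean distance to the nearest cluster center. The abstract does not spell out the exact formulation used in the paper, so this form, the function names `squared_distance` and `mssc_objective`, and the toy data are illustrative assumptions. The sketch does not implement Clust-Splitter or the limited memory bundle method; it only evaluates the objective for given centers.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mssc_objective(centers: Sequence[Point], data: Sequence[Point]) -> float:
    """Mean squared distance from each data point to its nearest center.

    The inner min over centers is what makes the objective nonsmooth.
    (Assumed textbook form; not taken from the paper itself.)
    """
    if not centers or not data:
        raise ValueError("centers and data must be non-empty")
    total = sum(min(squared_distance(c, a) for c in centers) for a in data)
    return total / len(data)


# Hypothetical toy data: two well-separated groups of two points each.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_center = [(5.0, 1.0)]                  # a single center at the overall mean
two_centers = [(0.0, 1.0), (10.0, 1.0)]    # one center per group

print(mssc_objective(one_center, data))
print(mssc_objective(two_centers, data))
```

On this toy data, adding a second center lowers the objective, which is the general idea behind incremental schemes that grow the number of clusters step by step.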

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
The work was financially supported by the Research Council of Finland (projects #345804 and #345805, led by Prof. Tapio Pahikkala and Prof. Antti Airola, respectively) and by the Jenny and Antti Wihuri Foundation. Open Access funding provided by University of Turku (including Turku University Central Hospital).

