A1 Refereed original research article in a scientific journal
Clustering in large data sets with the limited memory bundle method
Authors: Napsu Karmitsa, Adil M. Bagirov, Sona Taheri
Publisher: ELSEVIER SCI LTD
Publication year: 2018
Journal: Pattern Recognition
Journal name in source: PATTERN RECOGNITION
Journal acronym: PATTERN RECOGN
Volume: 83
First page: 245
Last page: 259
Number of pages: 15
ISSN: 0031-3203
eISSN: 1873-5142
DOI: https://doi.org/10.1016/j.patcog.2018.05.028
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/35725660
Abstract
The aim of this paper is to design an algorithm based on nonsmooth optimization techniques to solve minimum sum-of-squares clustering problems in very large data sets. First, the clustering problem is formulated as a nonsmooth optimization problem. Then the limited memory bundle method [Haarala et al., 2007] is modified and combined with an incremental approach to design a new clustering algorithm. The algorithm is evaluated using real-world data sets with both a large number of attributes and a large number of data points. It is also compared with some other optimization-based clustering algorithms. The numerical results demonstrate the efficiency of the proposed algorithm for clustering in very large data sets.
Downloadable publication
This is an electronic reprint of the original article.