The density peaks (DP) algorithm for cluster analysis, introduced by Rodriguez and Laio in 2014, has proven empirically competitive or superior in multiple aspects to other contemporary clustering algorithms. Yet it suffers from certain drawbacks and limitations when used for clustering high-dimensional data. We introduce SD-DP, the sparse dual version of DP. While following the DP principle and maintaining its appealing properties, we establish a sparse descriptor of local density as a robust representation. By analyzing and exploiting the consequential properties, we are able to use sparse graph-matrix expressions and operations throughout the clustering process. As a result, SD-DP has provably linear-scaling computational complexity under practical conditions. We show, with experimental results on several real-world high-dimensional datasets, that SD-DP outperforms DP in robustness, accuracy, self-governance, and efficiency.
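For readers unfamiliar with the baseline, the following is a minimal sketch of the original DP principle as it is commonly described: each point gets a local density (here, a neighbor count within a cutoff) and a distance to its nearest point of higher density; points where both are large are taken as cluster centers, and every other point inherits the label of its nearest higher-density neighbor. This is a dense, quadratic-cost illustration of DP only, not the SD-DP method from the paper. The function name `density_peaks`, the parameters `d_c` and `n_clusters`, and the rule of picking centers by the largest density-times-distance product are assumptions made for this example.

```python
import numpy as np


def density_peaks(X, d_c, n_clusters):
    """Illustrative baseline DP (dense, O(n^2)); not the SD-DP method."""
    n = X.shape[0]
    # Dense pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density: number of other points closer than the cutoff d_c.
    rho = (D < d_c).sum(axis=1) - 1
    # Visit points from highest to lowest density (ties broken by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # Convention for the densest point: use the largest distance in the data.
    delta[order[0]] = D.max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points visited earlier (density >= rho[i])
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    # Assumed selection rule: centers are the points with the largest
    # rho * delta. The densest point is always kept as a center so that
    # every chain of parents ends at a labeled point.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Remaining points inherit the label of their nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

The dense distance matrix is what makes this sketch quadratic in the number of points; the abstract's claim of linear scaling refers to SD-DP's sparse formulation, which is not reproduced here.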
Article
- Dimitris Floros*, Tiancheng Liu*, Nikos Pitsianis, and Xiaobai Sun. Sparse Dual of the Density Peaks Algorithm for Cluster Analysis of High-Dimensional Data. Proc. 2018 IEEE High Performance Extreme Computing Conference (HPEC), 2018.
PDF
[Supplementary material]
[BibTex]
* The first two authors contributed equally to this work.
Best student paper finalist, IEEE HPEC 2018
Slides
-
Presentation Nov. 2018
The presentation contains animations that require one of the following PDF viewers: Acrobat Reader (version 7 or later), PDF-XChange, or Foxit Reader.
Supplementary material
Code
To be made available soon
Update log
October 10, 2018
- Webpage went online