C++ interface to fast hierarchical clustering algorithms
========================================================

This is a simplified C++ interface to fast implementations of hierarchical
clustering by Daniel Müllner. The original library with interfaces to R
and Python is described in:

  Daniel Müllner: "fastcluster: Fast Hierarchical, Agglomerative Clustering
  Routines for R and Python." Journal of Statistical Software 53 (2013),
  no. 9, pp. 1–18, http://www.jstatsoft.org/v53/i09/


Usage of the library
--------------------

To use the library, the following source files are needed:

fastcluster_dm.cpp, fastcluster_R_dm.cpp
  original code by Daniel Müllner
  these are included by fastcluster.cpp via #include, and therefore
  need not be compiled to object code

fastcluster.[h|cpp]
  simplified C++ interface
  fastcluster.cpp is the only file that must be compiled

The library provides the clustering function *hclust_fast* for
creating the dendrogram information in the encoding used by the
R function *hclust*. For a description of the parameters, see fastcluster.h.
Its parameter *method* can be one of

HCLUST_METHOD_SINGLE
  single link with the minimum spanning tree algorithm (Rohlf, 1973)

HCLUST_METHOD_COMPLETE
  complete link with the nearest-neighbor-chain algorithm (Murtagh, 1984)

HCLUST_METHOD_AVERAGE
  average link with the nearest-neighbor-chain algorithm (Murtagh, 1984)

HCLUST_METHOD_MEDIAN
  median link with the generic algorithm (Müllner, 2011)

For splitting the dendrogram into clusters, the two functions *cutree_k*
and *cutree_cdist* are provided.

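The R-style dendrogram encoding and the idea of cutting it into a fixed
number of clusters can be illustrated with a small self-contained sketch.
This is not the library's *cutree_k*: `cut_into_k` is a hypothetical helper
that replays the first n-k merge steps with a union-find structure. It
assumes the R *hclust* convention (negative entries are 1-based observations,
positive entries refer to earlier merge steps) and a column-major layout of
the merge array; check fastcluster.h for the layout actually used.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the library): cut an R-style dendrogram
// into k clusters by replaying only the first n-k merge steps.
// Assumed layout: column-major (n-1) x 2 array, i.e. step i merges
// merge[i] and merge[(n-1) + i]. A negative value -j denotes observation j
// (1-based); a positive value s denotes the cluster created in step s (1-based).
std::vector<int> cut_into_k(int n, const std::vector<int>& merge, int k) {
  std::vector<int> parent(n);
  std::iota(parent.begin(), parent.end(), 0);  // every point is its own root

  // Union-find lookup with path halving.
  auto find = [&](int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  // rep[s] is one point belonging to the cluster created in merge step s.
  std::vector<int> rep(n > 1 ? n - 1 : 0, -1);
  auto point_of = [&](int code) {
    return code < 0 ? -code - 1 : rep[code - 1];
  };

  for (int i = 0; i < n - k; ++i) {
    int a = find(point_of(merge[i]));
    int b = find(point_of(merge[(n - 1) + i]));
    parent[b] = a;  // join the two clusters
    rep[i] = a;
  }

  // Number the clusters 0, 1, ... in order of first appearance.
  std::vector<int> labels(n), id(n, -1);
  int next = 0;
  for (int p = 0; p < n; ++p) {
    int r = find(p);
    if (id[r] < 0) id[r] = next++;
    labels[p] = id[r];
  }
  return labels;
}
```

For four points with the merge steps (-1,-2), (-3,-4), (1,2), stored as
{-1, -3, 1, -2, -4, 2}, cutting into two clusters gives the labels 0 0 1 1.
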
Note that output parameters must be allocated beforehand, e.g.

  int* merge = new int[2*(npoints-1)];

For a complete usage example, see lines 135-142 of demo.cpp.

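As an alternative to raw `new`, the output arrays can be kept in
`std::vector` objects so that they are released automatically. The sketch
below is only an illustration: `ClusterBuffers` is a hypothetical name, the
size of *merge* is taken from the example above, and the sizes of the other
arrays are assumptions based on the R *hclust* convention (one height per
merge step, one label per point, one distance per pair of points). See
fastcluster.h for the authoritative parameter description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the arrays handed to the clustering functions.
struct ClusterBuffers {
  std::vector<double> distmat;  // assumed: npoints*(npoints-1)/2 pairwise distances
  std::vector<int> merge;       // 2*(npoints-1) entries, as in the example above
  std::vector<double> height;   // assumed: one merge height per step
  std::vector<int> labels;      // assumed: one cluster label per point

  explicit ClusterBuffers(std::size_t npoints)
      : distmat(npoints > 1 ? npoints * (npoints - 1) / 2 : 0),
        merge(npoints > 1 ? 2 * (npoints - 1) : 0),
        height(npoints > 1 ? npoints - 1 : 0),
        labels(npoints) {}
};
```

The raw pointers expected by a C-style interface are then available through
`.data()`, for example `buf.merge.data()`.
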

Demonstration program
---------------------

A simple demo is implemented in demo.cpp, which can be compiled and run with

  make
  ./hclust-demo -m complete lines.csv

It creates two clusters of line segments such that the angle between
line segments of different clusters has a maximum (cosine) dissimilarity.
To visualize the result, plotresult.r can be used as follows
(requires R <https://r-project.org> to be installed):

  ./hclust-demo -m complete lines.csv | Rscript plotresult.r

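To give an idea of what an angle-based (cosine) dissimilarity between line
segments can look like, here is a minimal sketch. It is not taken from
demo.cpp: the `Segment` type, both function names, the choice of 1 - cos^2
as the measure, and the row-by-row upper-triangle layout of the distance
array are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical segment type: two end points in the plane.
struct Segment { double x1, y1, x2, y2; };

// One possible angle-based dissimilarity: 1 - cos^2 of the angle between the
// segment directions. It is 0 for parallel and 1 for perpendicular segments
// and does not depend on segment orientation. Assumes non-degenerate segments.
double angle_dissimilarity(const Segment& a, const Segment& b) {
  double ax = a.x2 - a.x1, ay = a.y2 - a.y1;
  double bx = b.x2 - b.x1, by = b.y2 - b.y1;
  double c = (ax * bx + ay * by) / (std::hypot(ax, ay) * std::hypot(bx, by));
  double d = 1.0 - c * c;
  return d < 0.0 ? 0.0 : d;  // clamp tiny negative rounding errors
}

// Pairwise dissimilarities in an assumed condensed layout: upper triangle
// without the diagonal, row by row (d01, d02, ..., d12, ...).
std::vector<double> condensed_distances(const std::vector<Segment>& s) {
  std::vector<double> d;
  for (std::size_t i = 0; i < s.size(); ++i)
    for (std::size_t j = i + 1; j < s.size(); ++j)
      d.push_back(angle_dissimilarity(s[i], s[j]));
  return d;
}
```

With two horizontal segments and one vertical segment, the horizontal pair
has dissimilarity 0 and each horizontal/vertical pair has dissimilarity 1.
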

Authors & Copyright
-------------------

Daniel Müllner, 2011, <http://danifold.net>
Christoph Dalitz, 2018, <http://www.hsnr.de/ipattern/>


License
-------

This code is provided under a BSD-style license.
See the file LICENSE for details.