# Why so many clustering algorithms: a position paper

@article{EstivillCastro2002WhySM, title={Why so many clustering algorithms: a position paper}, author={Vladimir Estivill-Castro}, journal={SIGKDD Explor.}, year={2002}, volume={4}, pages={65-75} }

We argue that there are many clustering algorithms because the notion of "cluster" cannot be precisely defined. Clustering is in the eye of the beholder, and as such, researchers have proposed many induction principles and models whose corresponding optimization problems can only be approximately solved by an even larger number of algorithms. Therefore, comparing clustering algorithms must take into account a careful understanding of the inductive principles involved.
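As a rough illustration of the abstract's point that the induction principle shapes the answer, the sketch below splits the same one-dimensional data into two clusters under two different criteria: least squares around the mean (the criterion behind k-means) and least absolute deviation around the median. The data set and the helper names `sse`, `sae`, and `best_two_way_split` are made up for this example and do not come from the paper. It relies on the assumption that, in one dimension, an optimal split under either criterion is a cut point in the sorted data, so every cut can simply be enumerated.

```python
from statistics import mean, median

def sse(cluster):
    """Least-squares cost: sum of squared distances to the cluster mean."""
    c = mean(cluster)
    return sum((x - c) ** 2 for x in cluster)

def sae(cluster):
    """Least-absolute-deviation cost: sum of distances to the cluster median."""
    c = median(cluster)
    return sum(abs(x - c) for x in cluster)

def best_two_way_split(points, cost):
    """Try every cut point of the sorted data and keep the cheapest split."""
    pts = sorted(points)
    splits = [(pts[:i], pts[i:]) for i in range(1, len(pts))]
    return min(splits, key=lambda s: cost(s[0]) + cost(s[1]))

# Hypothetical data: two tight groups plus one moderately distant point.
data = [0, 1, 2, 10, 11, 12, 30]

by_means = best_two_way_split(data, sse)
by_medians = best_two_way_split(data, sae)

print("least squares:      ", by_means)
print("least abs deviation:", by_medians)
```

On this data the least-squares criterion isolates the point 30 in its own cluster, while the median-based criterion separates the two tight groups and attaches 30 to the nearer one. Neither result is wrong; each is optimal for its own objective, which is why a comparison of algorithms has to start from the principle each one optimizes.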

#### 546 Citations

Empirical Analysis of Data Clustering Algorithms

- Computer Science
- 2018

Different clustering approaches are studied from a theoretical perspective to understand their relevance to massive data sets, and they are tested empirically on artificial benchmarks to highlight their strengths and weaknesses.

Approximation Algorithms for Clustering

- 2019

Agglomerative hierarchical clustering is an important clustering algorithm with many real-life applications, such as customer segmentation. Its biggest drawback is its large time complexity of…

Introduction to partitioning-based clustering methods with a robust example

- Computer Science
- 2006

A new robust partitioning-based method is presented, along with a review of iterative relocation clustering algorithms and some illustrative results.

Sequentially Grouping Items into Clusters of Unspecified Number

- Computer Science
- IC2IT
- 2017

It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered as outliers can be removed.

Number 4

- 2019

To date, several surveys of clustering algorithms are available. The novel approach of this paper is the use of Mind Maps to present key details about clustering algorithms in visual…

A brief study on clustering methods: Based on the k-means algorithm

- Computer Science
- 2011 International Conference on E-Business and E-Government (ICEE)
- 2011

A process model for data mining and the typical requirements of clustering methods are described, and the k-means algorithm and its advantages and disadvantages are introduced.

Common Clustering Algorithms

- Computer Science
- 2009

This chapter surveys common clustering algorithms widely used in the data mining community in light of chemometrics, and gives an overview of hybrid clustering approaches that combine partitioning and hierarchical clustering.

An optimization approach to partitional data clustering

- Computer Science
- J. Oper. Res. Soc.
- 2009

Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality, and that these results become more persuasive as the problem size grows.

Cluster Validity Using Support Vector Machines

- Computer Science
- DaWaK
- 2003

A method is presented that compares clustering results from different algorithms or from different runs of the same algorithm. It can also filter noise and outliers so that, for a fixed data set, the most robust and potentially meaningful clustering result can be identified.

A mathematical model of similarity and clustering

- Computer Science
- International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004.
- 2004

An abstract model of data similarity and clustering is introduced, and a heuristic method to search for sub-optimal clusterings for a given tolerance relation is proposed.

#### References

Showing 1-10 of 49 references

On Some Clustering Techniques

- Computer Science
- IBM J. Res. Dev.
- 1964

A number of methods which make use of IBM 7090 computer programs to do clustering are described, and a medical research problem is used to illustrate and compare these methods.

Non-crisp Clustering by Fast, Convergent, and Robust Algorithms

- Computer Science
- PKDD
- 2001

These algorithms are robust because they use medians rather than means as estimators of location, the resulting representative of a cluster is actually a data item, and it is demonstrated mathematically that the algorithms converge.

Data clustering: a review

- Computer Science
- CSUR
- 1999

An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.

Efficient and Effective Clustering Methods for Spatial Data Mining

- Computer Science
- VLDB
- 1994

The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.

A human-computer cooperative system for effective high dimensional clustering

- Computer Science
- KDD '01
- 2001

A system is proposed that performs high dimensional clustering through effective cooperation between the human and the computer, in order to create very meaningful sets of clusters in high dimensionality.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

- Computer Science
- KDD
- 1996

DBSCAN is presented, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.

Chameleon: Hierarchical Clustering Using Dynamic Modeling

- Computer Science
- Computer
- 1999

Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.

CURE: an efficient clustering algorithm for large databases

- Computer Science
- SIGMOD '98
- 1998

This work proposes a new clustering algorithm called CURE that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. It demonstrates that random sampling and partitioning enable CURE not only to outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.

Quality Scheme Assessment in the Clustering Process

- Computer Science
- PKDD
- 2000

This paper presents an approach for evaluating clustering schemes (partitions) so as to find the best number of clusters for a specific data set, selecting the best clustering scheme according to a quality index.

BIRCH: an efficient data clustering method for very large databases

- Computer Science
- SIGMOD '96
- 1996

A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.