====== Vector quantization / Clustering ======

== References ==
  - [[http://www.comp.lancs.ac.uk/~kristof/research/notes/clustr/index.html|Methods Overview [EN] ]]

===== Sequential leader =====

For every new sample:
  * if the distance between the sample and an existing cluster is smaller than a given threshold, add the sample to that cluster;
  * otherwise, create a new cluster containing the sample.

===== Pairwise Clustering =====

Initially, every sample is its own cluster.\\
Repeat until the desired number of clusters is obtained:
  * merge the two closest clusters

===== k-means =====

//French: k-moyennes//

Randomly choose k initial cluster centers.\\
For every sample (this is the online, sample-by-sample form of the algorithm):
  * move the center closest to the sample toward the sample, by a fraction set by a learning rate
  * decrease the learning rate over time

==== k-means++ ====

A variant that initializes the centers so as to give an accuracy guarantee (in expectation, the clustering cost is within an O(log k) factor of the optimum) and faster convergence:
  * choose the first center uniformly at random among the samples
  * choose each subsequent center at random among the samples, with probability proportional to the squared distance from the sample to its nearest already-chosen center.

== References ==
  * [[http://www.stanford.edu/~sergeiv/papers/kMeansPP-soda.pdf|2006, Arthur-Vassilvitskii, k-means++: The Advantages of Careful Seeding]]

==== Elbow criterion ====

A way to choose the optimal number of clusters k.

Compute, for different numbers of clusters, the ratio of the intra-cluster variance to the total variance. The optimal number of clusters is the point beyond which adding more clusters no longer brings a significant decrease in this ratio.

===== GNG (Growing Neural Gas) =====

===== Kohonen self-organizing maps =====

//French: Cartes auto-organisatrices de Kohonen//
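
===== Example sketch: k-means++ seeding with online k-means =====

The k-means and k-means++ steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a reference implementation: the function names (''sq_dist'', ''kmeanspp_seed'', ''online_kmeans'', ''assign'') are hypothetical, and the number of passes, the initial learning rate and its multiplicative decay are arbitrary choices that these notes do not specify. The seeding weights each sample by its squared distance to the nearest chosen center, as in the Arthur-Vassilvitskii paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(samples, k, rng):
    """Pick k initial centers among the samples with k-means++ seeding."""
    # First center: uniform over the samples.
    centers = [list(rng.choice(samples))]
    while len(centers) < k:
        # Weight of a sample: squared distance to its nearest chosen center.
        weights = [min(sq_dist(s, c) for c in centers) for s in samples]
        if sum(weights) == 0:
            # Every sample coincides with a center: fall back to uniform.
            centers.append(list(rng.choice(samples)))
        else:
            centers.append(list(rng.choices(samples, weights=weights, k=1)[0]))
    return centers


def online_kmeans(samples, k, epochs=20, rate=0.5, decay=0.9, seed=0):
    """Online k-means; epochs, rate and decay are assumed values."""
    rng = random.Random(seed)
    centers = kmeanspp_seed(samples, k, rng)
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for s in order:
            # Find the center closest to the sample...
            j = min(range(k), key=lambda i: sq_dist(s, centers[i]))
            # ...and move it toward the sample by a fraction `rate`.
            centers[j] = [c + rate * (x - c) for c, x in zip(centers[j], s)]
        # Decrease the learning rate over time (once per pass here).
        rate *= decay
    return centers


def assign(samples, centers):
    """Index of the closest center for each sample."""
    return [
        min(range(len(centers)), key=lambda i: sq_dist(s, centers[i]))
        for s in samples
    ]
```

A possible use: ''centers = online_kmeans(samples, k=2)'' followed by ''labels = assign(samples, centers)'', where ''samples'' is a list of equal-length numeric tuples. Decaying the rate once per pass rather than once per sample is a simplification; other schedules (for example 1/n per center) are equally valid.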