K-means clustering and Bisecting K-means in Plain English

Jul 13, 2016

K-means clustering is unsupervised machine learning algorithm.

Demonstration of the standard algorithm
1. k initial “means” (in this case k=3) are randomly generated within the data domain (shown in color).
2. k clusters are created by associating every observation with the nearest mean. The partitions here represent theVoronoi diagram generated by the means.
3. The centroid of each of thek clusters becomes the new mean. Typically, the new centroid will be the average of all points in the cluster.
4. Steps 2 and 3 are repeated until convergence has been reached.

Bisecting K-means:

Pick a cluster to split.

Find 2 sub-clusters using the basic k-Means algorithm (Bisecting step)

Repeat step 2, the bisecting step, take the split that produces the clustering with the highest overall similarity. And there are different ways to proceed, for example, you can choose the biggest cluster or the cluster with the worst quality or a combination of both.

Repeat steps 1, 2 and 3 until the desired number of clusters is reached.

References:

https://en.wikipedia.org/wiki/K-means_clustering

http://minethedata.blogspot.com/2012/08/bisecting-k-means.html