K-Means Clustering is an unsupervised learning algorithm. It works by grouping similar data points together to try to find underlying patterns.
The number of groups is pre-defined by the user as K.
How the Algorithm works
Before the iterative update starts, K centroid locations are picked at random in the data space. These centroids act as the starting points for the clusters (if K = 5, there will be 5 random centroids).
- Data Assignment Step
  - Each data point is assigned to its nearest centroid, based on the squared Euclidean distance.
- Centroid Update Step
  - Each centroid is re-calculated as the mean of the data points now assigned to it.
- Repeat until the centroids no longer change, or until a stopping criterion is met.
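To make the loop concrete, here is a minimal sketch of the steps above in plain Python. The names `squared_distance`, `k_means`, `max_iters` and `seed` are made up for this post, and two details are assumptions rather than part of the algorithm description: the starting centroids are drawn from the data points themselves, and a cluster that ends up empty simply keeps its old centroid.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid that points[i] was assigned to.
    """
    rng = random.Random(seed)
    # Initialisation (assumption): use k distinct data points as the
    # random starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):  # stopping criterion: iteration cap
        # Data assignment step: nearest centroid by squared distance.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Centroid update step: mean of the points in each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Example: two small, well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
```

Because the start is random, a different `seed` can give a different (and sometimes worse) result on harder data, so it is common to run the algorithm a few times.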
How do we choose K? Well, iteratively of course. We define a range of candidate values for K and run K-means clustering for each value in that range.
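Here is a rough sketch of what that scan could look like. This post does not say how the runs should be compared, so the score used here is an assumption: the within-cluster sum of squared distances, which is one common choice. The helper names `sq_dist`, `k_means`, `within_cluster_ss` and `scan_k` are made up for illustration, and the compact `k_means` makes the same assumptions as any simple version (starting centroids drawn from the data, empty clusters keep their old centroid).

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    """Compact K-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[j])  # empty cluster: keep old centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


def scan_k(points, k_values, seed=0):
    """Run K-means once per candidate K and record the score for each."""
    scores = {}
    for k in k_values:
        centroids, labels = k_means(points, k, seed=seed)
        scores[k] = within_cluster_ss(points, centroids, labels)
    return scores


# Example: try K = 1, 2, 3, 4 on two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
scores = scan_k(points, range(1, 5))
```

With a score like this, the usual approach is to look for the K after which the score stops dropping sharply, rather than simply picking the K with the lowest score, since adding more clusters tends to keep lowering it.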
This was a pretty short post, but it acts as a summary of how K-means clustering works!