K-Means Clustering

K-Means Clustering is an unsupervised learning algorithm. It works by grouping similar data points together to try to find underlying patterns.

The number of groups is predefined by the user as K.

How the Algorithm Works

Before the iterative update starts, K centroid locations are picked at random in the data space. These centroids act as the starting points for the clusters (if K = 5, there will be 5 random centroids).

  1. Data Assignment Step
    • Each data point is assigned to its nearest centroid, based on the squared Euclidean distance.
  2. Centroid Update Step
    • Each centroid is recalculated as the mean of the data points now assigned to it.
  3. Repeat steps 1 and 2 until the centroids no longer change, or until another stopping criterion (such as a maximum number of iterations) is met.
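
To make the steps above concrete, here is a minimal sketch in plain Python. It is only one way to write it, not a reference implementation: the function names (`k_means`, `squared_distance`, `mean_point`), the `max_iters` and `seed` parameters, and the choice to start from K randomly chosen data points (rather than arbitrary random locations) are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) for a list of points given as tuples."""
    rng = random.Random(seed)
    # Initialisation (assumption): use k distinct data points as the
    # starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # 1. Data assignment: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # 2. Centroid update: mean of the points assigned to each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # 3. Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `k_means(points, 2)` on a list of `(x, y)` tuples returns the final centroids together with one cluster index per point. Because the starting centroids are random, a different `seed` can give a different (and sometimes worse) grouping on less clearly separated data.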

Choosing K

How do we choose K? Well, iteratively of course. We define a range of values for K, run K-means clustering for each value, and compare the results.
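
As a rough sketch of what running through a range of K values could look like, the snippet below runs a compact K-means for each K and records a score. The post does not say how the runs should be compared, so the score used here (the total squared distance from each point to its nearest centroid) is an assumption, as are the names `scan_k` and `within_cluster_ss` and the compact `k_means` helper itself.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    """Compact K-means on a list of tuples; returns the final centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iters):
        # Assign every point to the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def within_cluster_ss(points, centroids):
    """Total squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def scan_k(points, k_values, seed=0):
    """Run K-means once per K and return {K: score}; lower means tighter clusters."""
    return {
        k: within_cluster_ss(points, k_means(points, k, seed=seed))
        for k in k_values
    }
```

For example, `scan_k(points, range(1, 6))` gives one score per K from 1 to 5. This score tends to shrink as K grows, so the lowest score alone is not a good selection rule; a common approach is to look for the K after which the improvement levels off.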


This was a pretty short post, but it acts as a summary of how K-means clustering works!
