# Excerpts from Big Data for Health Informatics

Now let's talk about the Rand Index, or RI. It measures how well a clustering X produced by your algorithm agrees with a ground-truth clustering Y of the same n points. For example, X1 denotes one cluster from your algorithm, and Y1 denotes a cluster from the ground truth. The two points P1 and P2 belong to the same cluster in X because they are both in X1, and the same pair, P1 and P2, also belongs to the same cluster in Y. Let a be the number of pairs like this one, which are together in both X and Y. Now consider a pair of points that falls in different clusters in X and also in different clusters in Y, and let b be the number of such pairs. The Rand Index equals a plus b divided by the number of all possible pairs, which is n(n - 1)/2, so $RI = \frac{a + b}{n(n-1)/2}$. In other words, it is the fraction of pairs on which the two clusterings agree. If RI is 0, there is no agreement between the two clustering assignments, and if RI equals 1, the two assignments match perfectly.

For agglomerative (bottom-up) hierarchical clustering, the algorithm goes as follows. First, we compute the distance matrix between all pairs of points, and we initialize each point as its own cluster. Then we check how many clusters are left. If only one is left, we're done. Otherwise, we merge the two closest clusters into one cluster, update the distance matrix to reflect the remaining clusters, and repeat until the number of clusters becomes 1.
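The loop above can be sketched in Python as follows, for illustration only. The lecture does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the smallest point-to-point distance between the two clusters); the function name `agglomerative_single_linkage` is hypothetical. It returns the merges in order, each as the sorted point indices of the newly formed cluster plus the distance at which the merge happened.

```python
import math
from itertools import combinations


def agglomerative_single_linkage(points):
    """Naive agglomerative clustering; returns the merges in order.

    Assumption (not from the lecture): cluster-to-cluster distance is
    single linkage, i.e. the minimum distance between their points.
    """
    # Initialize each point as its own cluster, keyed by a cluster id.
    clusters = {i: frozenset([i]) for i in range(len(points))}
    # Distance matrix between all pairs of clusters, keyed (smaller id, larger id).
    dist = {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }
    next_id = len(points)
    merges = []
    # Stop once only one cluster is left.
    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        merged = clusters.pop(a) | clusters.pop(b)
        merges.append((sorted(merged), d))
        # Update the distance matrix: keep entries not involving a or b ...
        new_dist = {
            (i, j): dij
            for (i, j), dij in dist.items()
            if a not in (i, j) and b not in (i, j)
        }
        # ... and add single-linkage distances from the new cluster to the rest.
        for k in clusters:
            dak = dist[(min(a, k), max(a, k))]
            dbk = dist[(min(b, k), max(b, k))]
            new_dist[(k, next_id)] = min(dak, dbk)
        clusters[next_id] = merged
        dist = new_dist
        next_id += 1
    return merges
```

For the four points (0, 0), (0, 1), (5, 0), (5, 2), this sketch first merges points 0 and 1 (distance 1.0), then points 2 and 3 (distance 2.0), and finally the two pairs (distance 5.0). Scanning every remaining pair on each pass makes it roughly O(n^3), which is fine for teaching but slow for large n.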
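The Rand Index described earlier can be computed directly by counting a and b over every pair of points. This is a minimal sketch; `rand_index` is a hypothetical helper, and both clusterings are assumed to be given as equal-length lists of cluster labels, one label per point.

```python
from itertools import combinations


def rand_index(x_labels, y_labels):
    """Rand Index between clustering X (algorithm) and Y (ground truth)."""
    if len(x_labels) != len(y_labels):
        raise ValueError("both clusterings must label the same points")
    n = len(x_labels)
    if n < 2:
        raise ValueError("need at least two points to form a pair")
    a = b = 0
    for p, q in combinations(range(n), 2):
        same_x = x_labels[p] == x_labels[q]
        same_y = y_labels[p] == y_labels[q]
        if same_x and same_y:
            a += 1  # pair is together in both X and Y
        elif not same_x and not same_y:
            b += 1  # pair is separated in both X and Y
    total_pairs = n * (n - 1) // 2
    return (a + b) / total_pairs
```

For X = [0, 0, 1, 1] and Y = [0, 0, 1, 2], a = 1 and b = 4 out of 6 pairs, so RI = 5/6. Because only pair memberships are compared, relabeling clusters does not change the score: [0, 0, 1, 1] versus [1, 1, 0, 0] gives 1.0. scikit-learn also provides `sklearn.metrics.rand_score` for the same quantity.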