K-means algorithm in the optimal initial centroids based on dissimilarity

Wang Shunye, Cui Yeqin, Jin Zu

Abstract

The k-means clustering algorithm is one of the most popular clustering algorithms and has been applied in many fields. A major problem with the basic k-means algorithm is that the clustering result depends heavily on the initial centroids, which are chosen at random. In addition, because the algorithm uses spatial distance as its similarity measure, it is not well suited to sparse spatial datasets. This paper proposes an improved k-means clustering algorithm that selects optimal initial centroids based on dissimilarity. It uses dissimilarity to reflect the degree of correlation between data objects, and then uses a Huffman tree to find the initial centroids. Many experiments confirm that the proposed algorithm is efficient and achieves better clustering accuracy at essentially the same time complexity.
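
The abstract does not define the dissimilarity measure or say how the Huffman tree is built, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes Euclidean distance as a stand-in for dissimilarity, builds the tree bottom-up by repeatedly merging the two least dissimilar nodes into their weighted mean, and stops when k nodes remain (equivalent to cutting the last k-1 merges of the full tree). The names `huffman_initial_centroids`, `weighted_mean`, and `kmeans` are hypothetical.

```python
import math


def weighted_mean(a, wa, b, wb):
    """Mean of two node vectors, weighted by how many points each covers."""
    return [(x * wa + y * wb) / (wa + wb) for x, y in zip(a, b)]


def huffman_initial_centroids(points, k):
    """Greedy Huffman-style merging; returns k initial centroids.

    Assumption: Euclidean distance stands in for the paper's dissimilarity.
    """
    # Each node is (mean vector, number of original points it covers).
    nodes = [([float(v) for v in p], 1) for p in points]
    while len(nodes) > k:
        # Find the pair of nodes with the smallest dissimilarity.
        _, i, j = min(
            (math.dist(nodes[i][0], nodes[j][0]), i, j)
            for i in range(len(nodes))
            for j in range(i + 1, len(nodes))
        )
        (a, wa), (b, wb) = nodes[i], nodes[j]
        merged = (weighted_mean(a, wa, b, wb), wa + wb)
        # Replace the two merged nodes with their parent node.
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)]
        nodes.append(merged)
    return [mean for mean, _ in nodes]


def kmeans(points, k, n_iter=100):
    """Standard Lloyd iterations, seeded with the centroids found above."""
    points = [[float(v) for v in p] for p in points]
    centroids = huffman_initial_centroids(points, k)

    def assign(cs):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, cs[c])) for p in points]

    for _ in range(n_iter):
        labels = assign(centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centroids[c]
            )
        if new == centroids:  # assignments are stable, so stop
            break
        centroids = new
    return centroids, assign(centroids)
```

For example, `kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2)` separates the two visibly distinct groups. Because the seeding is deterministic, repeated runs give the same result, in contrast to random initialization. The pairwise search in this sketch is deliberately naive; it says nothing about the time complexity claimed in the abstract.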

Relevant Publications in Journal of Chemical and Pharmaceutical Research