Acceleration of K-means Clustering by K-dijkstra method for Graph Partitioning

Zhang Anna (1351213)


Currently internet produces huge amount of data every day, the data set can be used well for big data processing or some other work. One algorithm comes from GRAPHLAB performs well on graph partitioning, it can partition a huge relationship network into several groups, we call it Traditional Partitioning. But due to the design logic, Traditional Partitioning has a low eciency and some other drawbacks.

In this paper I will introduce a new idea K-dijkstra. K-dijkstra is using advanced BFS (breadth rst search) and dijkstra, the partition idea is based on the idea of k-means, which takes important role in Traditional Partitioning. At the same time, K-dijkstra has a high eciency with a forceful partition result, it observes a 100x speed-up compared with Traditional Partitioning.

In order to satisfy more users requirement, a hybrid algorithm which called K-mix will also be introduced at the end. K-mix has both the high precision as Traditional Partitioning, when deal with same data set, K-mix may mostly save 15% process time compared with Traditional Partitioning; and thanks to K-dijkstra, it is much faster than Traditional Partitioning. So total three ideas are explained in the paper, users can select the favorite one to use for graph partitioning, and optimize it to a much better e ectiveness.