Incremental and Parallel Learning Algorithms for Data Stream Knowledge Discovery

ZHU LEI (1561211)


Incremental and parallel learning are two appealing capabilities that allow machine learning algorithms to accommodate data from real-world applications. With the rise of big data, data have become simultaneously large-scale and streaming, which motivates us to address incremental learning and parallel incremental learning in this work. We first consider incremental learning alone. An augmenting-path-based online max-flow algorithm is proposed in Chapter 3. The proposed algorithm is then applied in Chapter 4 to upgrade an existing batch semi-supervised learning algorithm, known as graph mincuts, to be incremental. In these works, the training speed is unsatisfactory when the data are huge. A straightforward solution is to combine parallel data processing with incremental learning. We solve these two learning problems in one process (i.e., PI integration) and develop parallel incremental wESVM in Chapter 6. In the track of data stream knowledge discovery, we investigated incremental machine learning and developed a wESVM-based system that integrates parallel learning and incremental learning. The limitation of our work is that PI integration applies only to models that satisfy the knowledge mergeable condition. Future work building on the presented research lies in how to relax this constraint and extend PI integration to other models such as SVM and neural networks.