Improving Automation in Bug Report Categorization and Defect Prediction

Nachai Limsettho (1361206)


Many automated software engineering techniques have been proposed to support bug report categorization and defect prediction. While the outcomes of these techniques can be rewarding, deploying them is often difficult and labor intensive. This dissertation focuses on making bug report categorization and defect prediction easier to deploy with fewer human resources. It investigates and proposes solutions for three aspects of automated software engineering techniques: nonparametric preprocessing of natural language, cross-project prediction, and unsupervised categorization. For preprocessing, a new approach is proposed to extract feature vectors from the natural language in bug reports, and experiments were conducted to evaluate it. The experimental results showed that the new features retain patterns that a classification algorithm can easily categorize. Cross-project prediction and unsupervised categorization tackle the same problem: the lack of sufficient historical training data. When a similar dataset from another project is available, a cross-project approach can be used. This dissertation proposes a technique that improves cross-project performance by taking the distribution of the unlabeled target project into account. The experimental results showed that prediction performance improved significantly compared with conventional techniques. Lastly, an unsupervised categorization framework is proposed for situations where no similar dataset is available. Using clustering and cluster labeling techniques, the proposed framework could automatically categorize bug reports with performance comparable to that of the supervised approach. The conclusion is that the proposed techniques and framework can reduce the human effort required to deploy bug report categorization and defect prediction techniques while still retaining their performance relative to conventional techniques.