The Impact of Granularity Levels in Program Elements on IR-based Bug Localization

Chakkrit Tantithamthavorn (1251205)


In modern software development, bugs are prevalent and inevitable. As software systems continue to grow in size and complexity, finding a bug among millions of instructions is labor- and cost-intensive for developers. To address this problem, several studies have proposed Information Retrieval (IR) based bug localization techniques, which rank source code entities by their textual similarity to a bug report. The ranked entities can be at the file or the method granularity level. Files are usually large and may contain many methods that are not buggy. Hence, the author conjectures that file-level bug localization requires more effort than method-level bug localization. In this thesis, the author investigates whether method-level bug localization is more practical than file-level bug localization by conducting a large-scale empirical study that compares the results of the Vector Space Model, used as a baseline model, at the file and method levels. As part of this thesis, the author developed a baseline ground-truth dataset at different granularity levels and presents a new, effort-based evaluation method. This thesis concludes that method-level bug localization is effective in practice.
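To make the ranking step concrete, the following is a minimal sketch of how a Vector Space Model could rank source code entities against a bug report. It is an illustration only and not the implementation used in this thesis: the tokenizer, the smoothed TF-IDF weighting, the function name `rank_entities`, and the sample entities are all assumptions introduced here. The same function can be applied at either granularity level by passing file contents or method bodies as the entities.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-letter characters (a simplifying assumption)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def rank_entities(bug_report, entities):
    """Rank entities (name -> text) by TF-IDF cosine similarity to the bug report.

    Hypothetical helper for illustration; the weighting scheme is an assumption.
    """
    term_counts = {name: Counter(tokenize(text)) for name, text in entities.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many entities each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    def weigh(counts):
        # Smoothed IDF so that terms unseen in the corpus do not divide by zero.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vector = weigh(Counter(tokenize(bug_report)))
    scores = {
        name: cosine(query_vector, weigh(counts))
        for name, counts in term_counts.items()
    }
    # Highest similarity first: the entities a developer would inspect first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up method-level entities and bug report, for illustration only.
sample_entities = {
    "Parser.parse": "parse input token stream and throw exception on null token",
    "Logger.write": "write log message to file",
    "Cache.evict": "evict oldest entry from cache",
}
sample_ranking = rank_entities("Crash when parsing a null token", sample_entities)
```

In this sketch, an entity that shares no terms with the bug report receives a score of zero, so only `Parser.parse` obtains a positive score for the sample report. A real study would typically add preprocessing such as identifier splitting, stop-word removal, and stemming, none of which is shown here.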