Software Document Classification Using N-gram IDF and Automated Machine Learning

Rungroj Maipradit (1751208)


Software development produces many documents related to software engineering activities. Manually classifying these documents by their content is time-consuming, so an automated tool is needed. We propose a framework that automatically classifies software documents using N-gram IDF and Auto-sklearn. In this research, we apply the framework to two problems: identifying on-hold self-admitted technical debt and sentiment classification. On the first problem, our framework achieved an F1 score of 0.73, compared with 0.31 for a naive baseline. On the second problem, our framework achieved the highest F1 score on positive and negative sentences when compared with well-known sentiment classification tools, which indicates that it performs better than traditional sentiment classification tools. In both cases, our framework outperforms the baselines.

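As a rough illustration of the two ingredients named above, the following sketch computes an inverse document frequency for word n-grams and an F1 score for a binary classifier. It is a simplification under stated assumptions, not the framework itself: it uses plain IDF, log(N / df), over unigrams and bigrams rather than the full N-gram IDF weighting scheme, it does not call Auto-sklearn, and the function names (word_ngrams, ngram_idf, f1_score) and the three example documents are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) in a document."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def ngram_idf(docs, max_n=2):
    """Plain IDF, log(N / df), for every word n-gram seen in docs.

    Assumption: this is ordinary IDF applied to n-grams, used here only
    as a stand-in for the N-gram IDF scheme mentioned in the abstract.
    """
    doc_freq = Counter()
    for doc in docs:
        # Each document contributes at most 1 to an n-gram's count.
        doc_freq.update(word_ngrams(doc, max_n))
    total = len(docs)
    return {gram: math.log(total / df) for gram, df in doc_freq.items()}


def f1_score(true_labels, predicted_labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: two debt-like comments and one opinion.
docs = [
    "TODO remove this workaround once the bug is fixed",
    "this workaround is ugly",
    "great library works well",
]
idf = ngram_idf(docs)
```

In this toy corpus, an n-gram that appears in only one document (such as "great library") receives a higher weight than one shared by two documents (such as "this workaround"). In a full pipeline, such weights would form the feature vectors passed to an automated model search.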