Investigating Code Examples: Coverage Metrics and Trait Patterns

Stevche Radevski (1551129)


Examples are arguably one of the most commonly used knowledge sources when learning the usage and best practices of a new API. Despite that, little is known about the characteristics of code examples, whether any patterns emerge across examples, and how similar examples from different sources are.

This thesis aims to improve the understanding of code examples. It is divided into two studies. The first study investigates the current state of code examples, their common problems, and potential metrics that can be used to evaluate them. The second study collects code example data for further analysis, defines traits and features for code examples, creates code example clusters, and analyzes the patterns that emerge from existing examples.

The results show that examples vary in both quantity and quality between libraries. Compilation errors, missing error handling, and mistaken naming within a single example are among the common problems. Example coverage is a potential metric for evaluating examples, but calculating it can be complicated. An analysis of the trait patterns that occur in two different datasets, containing 1762 code examples, revealed three pure patterns, each with two sub-patterns, across the two datasets. Despite their immaturity, these patterns can be considered a basis for automated code example evaluation and for deriving best-practice guidelines for writing code examples.
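
To make the idea of example coverage concrete, the sketch below shows one simple way such a metric could be computed: the fraction of an API's public members that appear in at least one example. This definition, the function name `example_coverage`, and the toy data are assumptions made for illustration only; they are not necessarily the definition used in the thesis, and a real calculation would also have to resolve which API members an example actually calls, which is part of what makes it complicated.

```python
def example_coverage(api_members, examples):
    """Return the fraction of API members used by at least one example.

    Hypothetical definition for illustration: `api_members` is an iterable
    of member names, and `examples` is an iterable where each item lists
    the names that one example uses.
    """
    api = set(api_members)
    if not api:
        return 0.0  # avoid division by zero for an empty API
    used = set()
    for members_in_example in examples:
        # Count only names that belong to the API under study.
        used |= api & set(members_in_example)
    return len(used) / len(api)


# Toy data (invented): a four-member API and two examples.
api_members = ["open", "read", "write", "close"]
examples = [
    ["open", "read", "close"],
    ["open", "close", "print"],  # "print" is not part of the API
]

coverage = example_coverage(api_members, examples)
print(f"{coverage:.2f}")  # prints 0.75
```

In this toy case, `write` is never used by any example, so three of the four members are covered.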