Learning from Data of Varying Quality for Sentence Role Identification in MEDLINE Abstracts

Natalia Aizenberg (0551138)

The abstract of a scientific paper typically consists of sentences describing the background and objective of the study as well as its experimental methods, results and conclusions.

Interest in identifying these structural roles has grown in recent years, motivated in particular by information retrieval. Previous research on MEDLINE abstracts used various combinations of sentence features to achieve good performance, but one important issue has not yet been addressed: most of the training data is unrepresentative. In this task, the training examples tend to come from different sources that differ considerably from one another, while the source distribution of the application data does not necessarily match that of the training set.

In this work we address this issue by applying "example source"-sensitive costs in the training process, so that the cost assigned to each training example depends on the source it comes from.
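As an illustration only, source-sensitive costs can be pictured as per-example weights in a cost-sensitive loss. The sketch below assumes a binary (one-vs-rest) logistic regression over precomputed sentence feature vectors, trained by batch gradient descent; the source tags, the cost values, and the names `train_source_weighted_logreg` and `predict` are hypothetical and are not taken from this work.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical example format: (sentence feature vector, binary role label, source tag).
# E.g. label 1 = "conclusion sentence" in a one-vs-rest setup (an assumption).
Example = Tuple[Sequence[float], int, str]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_source_weighted_logreg(
    examples: List[Example],
    source_cost: Dict[str, float],
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Batch gradient descent on a logistic loss in which each example's
    contribution is scaled by the (hypothetical) cost of its source."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    total_cost = sum(source_cost[src] for _, _, src in examples)
    if total_cost <= 0:
        raise ValueError("at least one example must have a positive cost")
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y, src in examples:
            c = source_cost[src]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = c * (p - y)  # cost-scaled gradient of the log loss
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Normalize by the total cost so the step size is independent of data size.
        w = [wi - lr * g / total_cost for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / total_cost
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    # Probability that the sentence has the target role.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data: the "unstructured" source (hypothetical) carries conflicting labels.
data: List[Example] = [
    ([1.0], 1, "structured"),
    ([-1.0], 0, "structured"),
    ([1.0], 0, "unstructured"),
    ([-1.0], 1, "unstructured"),
]

# Equal costs: the two sources cancel out and the model stays uninformative.
w_eq, b_eq = train_source_weighted_logreg(data, {"structured": 1.0, "unstructured": 1.0})
# Down-weighting the conflicting source lets the reliable source dominate.
w_src, b_src = train_source_weighted_logreg(data, {"structured": 1.0, "unstructured": 0.0})
```

In this toy run, equal costs leave the prediction at 0.5 because the conflicting sources cancel exactly, while a zero cost on the "unstructured" source yields a positive weight on the feature. A real setup would pick intermediate costs (for example, tuned on data resembling the application distribution) rather than discarding a source entirely.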