Perspectives on the Marking of Discourse Relations: Cognitive Models and Machine Translation

Frances Pikyu Yung (1461023)


Discourse relations are the semantic and pragmatic relations between sentences and clauses that make a discourse coherent. On the one hand, understanding these relations is key to comprehending the meaning of a text and the intention of the speaker or writer. On the other hand, producing cohesive discourse with naturally presented discourse relations facilitates communication, for humans and machines alike. Critically, discourse relations can either be explicitly marked by discourse connectives (DCs), such as "therefore" and "but", or implicitly conveyed in natural language contexts. It is not well understood how speakers choose between the two options, or how this choice impacts applications in natural language processing.

This dissertation explores the marking of discourse relations from two different perspectives. The first part of the study investigates discourse relations from the perspective of human language processing, in the monolingual dimension. A computational psycholinguistic model is proposed to predict whether a discourse relation is marked and how humans comprehend explicit and implicit discourse relations. Results are evaluated against corpus annotation as well as behavioral experiments conducted through crowdsourcing.

The second part of the dissertation investigates the marking of discourse relations from an application-oriented perspective, in the cross-lingual dimension. A bilingual resource of manually annotated discourse relations is constructed, and machine translation experiments are conducted to compare human and machine translation of explicit and implicit discourse relations.

This dissertation contributes to the field of computational linguistics by improving the state of the art in the task of predicting speakers’ choice of discourse marking, proposing an explanatory cognitive model for the marking of discourse relations based on information-theoretic approaches, building an open-source Chinese-English parallel corpus with aligned discourse relations, and advancing our understanding of explicit translation of implicit discourse relations by humans and machine translation systems.