NAIST-IS-DD1461213: Phan Duc Anh

Perspectives on the making of Multiple Emotion Detection System in Text

Phan Duc Anh (1461213)

Emotion analysis in text, also known as affective computing, refers to the use of natural language processing methods to recognize, interpret, and simulate human emotions or affects. These emotions may be the emotional state of the author or the emotional effect the author intends to produce. By interpreting human emotions, a machine can adapt itself better and produce appropriate behavior in response to those emotions. By simulating human emotions, a machine can improve its communication ability and enrich the interaction between humans and machines. Emotions in text may be expressed explicitly with emotional words, such as "happy" and "hate", or implicitly through context. To date, no method of emotion analysis in text can interpret multiple emotions simultaneously and with high accuracy without being heavily domain-dependent.

We study emotion analysis in text in two successive parts. The first part investigates the linguistic and psychological theories behind the expression of emotions in text. We discuss how emotion analysis differs from sentiment analysis and opinion mining. We also investigate the properties of emotional text: subjectivity and objectivity, explicit and implicit expressions of emotion, and the distinction between affects (the direct emotions of the author) and intended emotional communication (the emotional effect intended by the author). Lastly, we survey various psychological theories of emotion and discuss which theory to adopt in our study.

The second part approaches emotion analysis in text from an application perspective. It takes advantage of the investigated theories and uses natural language processing tools and machine learning techniques to produce emotion lexicons and models. The lexicons and models predict the multiple emotions that a piece of text may hold, implicitly or explicitly. We propose two lexicon-building methods: a bootstrapped lexicon for the general domain and a word-embedding lexicon for the target domain. We then use both supervised and semi-supervised learning techniques to generate the models from the lexicons. We evaluate the results against several baselines and existing methods, and compare the lexicons and models with one another. We verify and report the superior performance of our methods over the existing ones.

Our work contributes to the field of Computational Linguistics by improving the state of the art in Multiple Emotion Detection in text, discussing the theories behind emotion detection in text from both linguistic and psychological viewpoints, annotating a semi-supervised corpus, proposing a framework for the task, building an emotion lexicon and a prediction model, and deepening our understanding of the topic. We believe that systems which try to interpret human emotions and adjust their behavior accordingly will benefit greatly from our work.