Word Segmentation and Part-of-Speech Tagging for Lao Language

Insisiengmay Alivanh (1551123)


Word segmentation and part-of-speech (POS) tagging are fundamental steps for higher-level natural language processing tasks such as parsing, machine translation, and information extraction. However, a simultaneous implementation of both steps has not been reported for the Lao language. In this study, we built a pipeline framework that performs syllable segmentation, word segmentation, and POS tagging for Lao using neural networks. The key advantage of neural networks is that they avoid hand-designed features and language-specific resources, which are time-consuming to develop; instead, the networks learn useful features directly from the input representations. For syllable segmentation and word segmentation, we apply convolutional neural networks to identify syllable and word boundaries. We then perform POS tagging by applying a convolutional neural network with a conditional random field (CRF) layer to assign a part-of-speech tag to each word.
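
To illustrate how boundary identification chains into a pipeline, the following is a minimal sketch and not the implementation used in this study. It assumes each stage emits one binary label per input unit, where 1 means a segment ends after that unit. The function name segment, the label encoding, and the Latin placeholder input are hypothetical, and the hard-coded labels stand in for the predictions of the convolutional networks.

```python
from typing import List, Sequence


def segment(units: Sequence[str], is_boundary: Sequence[int]) -> List[str]:
    """Group units into segments.

    is_boundary[i] == 1 means a segment ends right after units[i].
    (Assumed label encoding, chosen for illustration only.)
    """
    if len(units) != len(is_boundary):
        raise ValueError("units and is_boundary must have the same length")
    segments: List[str] = []
    current: List[str] = []
    for unit, boundary in zip(units, is_boundary):
        current.append(unit)
        if boundary == 1:
            segments.append("".join(current))
            current = []
    if current:  # flush trailing units if the last label is not 1
        segments.append("".join(current))
    return segments


# Stage 1: characters -> syllables (labels are placeholders for model output).
chars = list("abcdef")
syllable_labels = [0, 1, 0, 1, 0, 1]
syllables = segment(chars, syllable_labels)

# Stage 2: syllables -> words, reusing the same grouping step.
word_labels = [0, 1, 1]
words = segment(syllables, word_labels)
```

Under these assumptions the same grouping step serves both segmentation stages, and the resulting word list is what a POS tagger would then label, one tag per word.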