Joint English Spelling Error Correction and POS Tagging for Language Learners Writing

坂口 慶祐 (1151052)


I propose an approach that corrects spelling errors and assigns part-of-speech (POS) tags simultaneously for sentences written by learners of English as a second language (ESL). ESL writing contains several types of errors, including preposition, determiner, verb, noun, and spelling errors. Spelling errors often interfere with POS tagging and syntactic parsing, which makes the detection and correction of other errors very difficult. In studies of grammatical error detection and correction in ESL writing, spelling correction has been regarded as a preprocessing step in a pipeline. However, several types of spelling errors in ESL writing are difficult to correct at the preprocessing stage because the misspelled form is itself a valid English word: for example, homophones (e.g. *there/their), confusion (*form/from), inflection (*please/pleased), and derivation (*badly/bad). Furthermore, there are other types of misspelling, such as word splitting (*now a days/nowadays) and word merging (*dresscode/dress code), which require word boundary disambiguation.
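The limitation of a lookup-based preprocessing step can be illustrated with a minimal sketch. The vocabulary and the function names below (VOCAB, flagged_by_lookup, merge_candidates, split_candidates) are hypothetical and chosen only for illustration; they are not part of the proposed system. The sketch shows that a pure dictionary lookup flags a typographical error but passes real-word errors unnoticed, and that split and merged words can only be handled by proposing candidates that cross the original token boundaries.

```python
# Toy vocabulary: a hypothetical stand-in for a real dictionary.
VOCAB = {
    "there", "their", "form", "from", "please", "pleased", "badly", "bad",
    "now", "a", "days", "nowadays", "dress", "code", "beginning",
}


def flagged_by_lookup(tokens):
    """Return the tokens that a pure dictionary lookup would flag."""
    return [t for t in tokens if t.lower() not in VOCAB]


def merge_candidates(tokens, max_span=3):
    """Propose joining 2..max_span adjacent tokens into one known word.

    Each candidate is (start, end, joined_word), with end exclusive.
    """
    found = []
    for i in range(len(tokens)):
        for j in range(i + 2, min(i + max_span, len(tokens)) + 1):
            joined = "".join(tokens[i:j]).lower()
            if joined in VOCAB:
                found.append((i, j, joined))
    return found


def split_candidates(token):
    """Propose splitting one token into two known words."""
    t = token.lower()
    return [
        (t[:k], t[k:])
        for k in range(1, len(t))
        if t[:k] in VOCAB and t[k:] in VOCAB
    ]
```

For instance, flagged_by_lookup(["form", "there", "begginning"]) reports only "begginning", because "form" and "there" are valid words regardless of the context they appear in. Deciding whether they are errors requires contextual information such as POS tags.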

To correct these spelling errors in addition to typical typographical errors (*begginning/beginning), I propose a joint analysis of POS tagging and spelling error correction with a model based on Conditional Random Fields (CRFs). I present an approach that achieves significantly higher F-values for both POS tagging and spelling correction than existing approaches: individual, exhaustive, or pipeline analysis. I also show that the joint model can deal with novel types of misspelling in ESL writing.
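The idea of joint analysis can be sketched as a search over (word, tag) pairs rather than over tags alone. The code below is a toy illustration under strong assumptions, not the model proposed here: the scores are hand-picked instead of learned CRF weights, the candidate lists are given by hand, word boundaries are fixed, and all names (joint_decode, emit, trans, change_penalty) are hypothetical. It runs a Viterbi-style search in which each state is a candidate spelling together with a POS tag, so that tag context can favor a correction even when the observed word is a valid English word.

```python
def joint_decode(observed, candidates, emit, trans, change_penalty=1.0):
    """Viterbi-style search over (word, tag) states.

    observed:   list of input tokens
    candidates: dict mapping a token to its candidate spellings
    emit:       dict word -> {tag: score} (hand-set, not learned)
    trans:      dict (prev_tag, tag) -> score (hand-set, not learned)
    """
    # best maps a state (word, tag) to (score, path ending in that state).
    best = {("<s>", "<s>"): (0.0, [])}
    for obs in observed:
        nxt = {}
        for word in candidates.get(obs, [obs]):
            # Unknown words fall back to a low-scoring placeholder tag.
            for tag, e in emit.get(word, {"UNK": -5.0}).items():
                # Changing the observed spelling incurs a penalty.
                local = e - (change_penalty if word != obs else 0.0)
                score, path = max(
                    (
                        (s + trans.get((prev[1], tag), -5.0) + local, p)
                        for prev, (s, p) in best.items()
                    ),
                    key=lambda x: x[0],
                )
                nxt[(word, tag)] = (score, path + [(word, tag)])
        best = nxt
    return max(best.values(), key=lambda x: x[0])[1]


# Hand-picked toy scores for the sentence "*I came form Japan".
observed = ["I", "came", "form", "Japan"]
candidates = {"form": ["form", "from"]}
emit = {
    "I": {"PRP": 0.0},
    "came": {"VBD": 0.0},
    "form": {"NN": -1.0, "VB": -1.5},
    "from": {"IN": 0.0},
    "Japan": {"NNP": 0.0},
}
trans = {
    ("<s>", "PRP"): 0.0, ("PRP", "VBD"): 0.0, ("VBD", "IN"): 0.0,
    ("VBD", "NN"): -2.0, ("VBD", "VB"): -4.0, ("IN", "NNP"): 0.0,
    ("NN", "NNP"): -2.0, ("VB", "NNP"): -1.0,
}

result = joint_decode(observed, candidates, emit, trans)
# result == [("I", "PRP"), ("came", "VBD"), ("from", "IN"), ("Japan", "NNP")]
```

With these toy scores, the tag sequence PRP VBD IN NNP outweighs the penalty for changing the spelling, so the search selects "from" tagged IN. A pipeline that fixes the spelling first would have no reason to change "form", and a tagger run on the uncorrected text would have to assign it a noun or verb tag.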