

Asia-Pacific Journal of Information Technology and Multimedia, Volume 4, Issue 1, 2015, pp. 11-23

MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS

Abstract:

The Malay language is an agglutinative language with rich morphology. Affixation to a root word is the most common morphological process used to derive a new word with a different meaning, and it can change the word's part of speech (POS). Because no Malay annotated corpus is freely available, no published report has compared the performance of POS tagging using a Hidden Markov Model (HMM), Maximum Entropy (ME) and a Support Vector Machine (SVM), particularly with regard to the effect of Malay morphology on tagging unknown words. This paper presents an evaluation of TnT (HMM), MaxEnt (ME) and SVMTool (SVM). To train and test these methods on Malay, a Malay corpus in the health domain was annotated. TnT was modified to incorporate prefix and circumfix features. The experimental results show that SVMTool outperforms TnT and MaxEnt in both overall accuracy (99.23% for SVMTool, 94% for TnT and 96% for MaxEnt) and accuracy on unknown words (96.78% for SVMTool, 67% for TnT and 86.23% for MaxEnt). MaxEnt outperforms TnT in both overall accuracy and unknown-word accuracy. Since SVMTool reaches 96.78% accuracy on unknown words, it would be the best tool for tagging Malay text in a specific domain.

Keywords: Malay POS tagger, Malay morphemes, unknown words
Subject Area: Computer Science (all); Health Professions (all); Medicine (all); Business, Management and Accounting (all); Decision Sciences (all)
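The abstract says TnT was modified to use prefix and circumfix features for unknown words but does not describe how. As a minimal sketch, assuming a simple ordered first-match affix stripper, the Python below extracts prefix, suffix and circumfix features from a Malay word form. The affix lists, the `affix_features` function and the `min_root` threshold are hypothetical illustrations, not the authors' implementation and not part of TnT, MaxEnt or SVMTool.

```python
from typing import Dict, Optional

# Hypothetical, NON-exhaustive Malay affix lists (illustrative assumption).
# Within each family the longer form comes first, so "meng" is tried
# before "me" and "peng"/"per" before "pe".
PREFIXES = ("meng", "meny", "mem", "men", "me", "ber", "ter", "di",
            "peng", "per", "pem", "pen", "pe", "ke", "se")
SUFFIXES = ("kan", "an", "i")
CIRCUMFIXES = (("peng", "an"), ("pem", "an"), ("pen", "an"), ("per", "an"),
               ("ber", "an"), ("ke", "an"), ("pe", "an"))


def affix_features(word: str, min_root: int = 3) -> Dict[str, Optional[str]]:
    """Return the first matching prefix, suffix and circumfix of `word`.

    An affix only counts if at least `min_root` characters remain for the
    root, so very short words are not stripped down to almost nothing.
    """
    w = word.lower()
    prefix = next((p for p in PREFIXES
                   if w.startswith(p) and len(w) - len(p) >= min_root), None)
    suffix = next((s for s in SUFFIXES
                   if w.endswith(s) and len(w) - len(s) >= min_root), None)
    circumfix = next((f"{p}-{s}" for p, s in CIRCUMFIXES
                      if w.startswith(p) and w.endswith(s)
                      and len(w) - len(p) - len(s) >= min_root), None)
    return {"prefix": prefix, "suffix": suffix, "circumfix": circumfix}


# A tagger could use these as features for unknown (out-of-vocabulary) tokens.
for token in ("pemakanan", "kesihatan", "dimakan", "rumah"):
    print(token, affix_features(token))
```

Naive stripping overgenerates: the root makan ("eat") is reported with an -an suffix. This is one reason a statistical tagger such as an ME or SVM model would weigh these features against context rather than trust them outright.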
