TY - GEN
T1 - Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation
AU - Silfverberg, Miikka
AU - Lindén, Krister
AU - Hyvärinen, Hissu
PY - 2012
Y1 - 2012
N2 - Systems for predictive text entry on ambiguous keyboards typically rely on dictionaries with word frequencies, which are used to suggest the most likely words matching user input. This approach is insufficient for agglutinative languages, where morphological phenomena increase the rate of out-of-vocabulary words. We propose a method for text entry that circumvents the problem of out-of-vocabulary words by replacing the dictionary with a Markov chain on morph sequences, combined with a third-order hidden Markov model (HMM) mapping key sequences to letter sequences and phonological constraints for pruning suggestion lists. We evaluate our method by constructing text entry systems for Finnish and Turkish and comparing them with published text entry systems and the text entry systems of three commercially available mobile phones. Measured by the keystrokes per character ratio (KPC) [8], our systems achieve superior results. For training, we use corpora segmented using unsupervised morphological segmentation.
KW - Predictive text entry
KW - Agglutinative languages
KW - Markov chains
KW - Hidden Markov model (HMM)
KW - Morphological segmentation
U2 - 10.1007/978-3-642-28601-8_40
DO - 10.1007/978-3-642-28601-8_40
M3 - Article in proceedings
SN - 978-3-642-28600-1
T3 - Lecture Notes in Computer Science
SP - 478
EP - 489
BT - International Conference on Intelligent Text Processing and Computational Linguistics
PB - Springer
ER -