
Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation

Miikka Silfverberg, Krister Lindén, Hissu Hyvärinen

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

Systems for predictive text entry on ambiguous keyboards, such as the 12-key keypad of a mobile phone where each key stands for several letters, typically rely on dictionaries with word frequencies to suggest the most likely words matching the user's input. This approach is insufficient for agglutinative languages, where productive morphology increases the rate of out-of-vocabulary words. We propose a text entry method that circumvents the out-of-vocabulary problem by replacing the dictionary with a Markov chain on morph sequences, combined with a third-order hidden Markov model (HMM) that maps key sequences to letter sequences and with phonological constraints that prune suggestion lists. We evaluate the method by building text entry systems for Finnish and Turkish and comparing them with published text entry systems and with the text entry systems of three commercially available mobile phones. Measured by the keystrokes per character ratio (KPC) [8], our systems achieve better (lower) KPC than the systems they are compared with. The training corpora are segmented into morphs using unsupervised morphological segmentation.
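To make the key-to-letter decoding and the KPC measure concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It replaces the morph-level Markov chain with a plain letter n-gram model (add-alpha smoothing, trained on a toy Finnish word list). It treats the hidden state as the previous three letters, which is one possible reading of "third-order". It uses the standard ITU E.161 keypad without language-specific letters such as ä, ö or ş. As an example of a phonological pruning constraint, it uses a simplified Finnish vowel-harmony check. All names (`LetterModel`, `decode`, `suggest`, `keystrokes_per_character`), the smoothing, and the KPC counting convention are hypothetical.

```python
import math
from collections import defaultdict

# Standard 12-key keypad letter layout (ITU E.161). Language-specific letters
# (Finnish ä/ö, Turkish ş/ğ, ...) are omitted here: a simplifying assumption.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}
ALPHABET_SIZE = len(LETTER_TO_KEY) + 1  # 26 letters plus the end marker "$"


class LetterModel:
    """Hypothetical order-k letter Markov chain with add-alpha smoothing.

    Stands in for the paper's morph-sequence model; order must be >= 1.
    """

    def __init__(self, words, order=3, alpha=0.1):
        self.order, self.alpha = order, alpha
        self.counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            padded = "^" * order + word + "$"  # "^" pads the start context
            for i in range(order, len(padded)):
                self.counts[padded[i - order:i]][padded[i]] += 1

    def logprob(self, ctx, ch):
        row = self.counts.get(ctx, {})  # .get does not create new entries
        total = sum(row.values())
        return math.log((row.get(ch, 0) + self.alpha)
                        / (total + self.alpha * ALPHABET_SIZE))


def decode(model, keys, n_best=3):
    """Return the n_best letter strings whose key sequence equals `keys`.

    Each letter emits exactly one key, so emissions are deterministic and the
    hidden state is just the last `order` letters. Keeping the n_best partial
    strings per state yields an exact n-best list (an n-best Viterbi search).
    """
    k = model.order
    hyps = {"^" * k: [(0.0, "")]}  # state -> [(log prob, prefix), ...]
    for key in keys:
        extended = defaultdict(list)
        for ctx, paths in hyps.items():
            for ch in KEYPAD[key]:
                lp = model.logprob(ctx, ch)
                for score, prefix in paths:
                    extended[(ctx + ch)[-k:]].append((score + lp, prefix + ch))
        hyps = {c: sorted(p, reverse=True)[:n_best] for c, p in extended.items()}
    finals = [(score + model.logprob(ctx, "$"), prefix)
              for ctx, paths in hyps.items() for score, prefix in paths]
    return [word for _, word in sorted(finals, reverse=True)[:n_best]]


BACK_VOWELS, FRONT_VOWELS = set("aou"), set("äöy")


def obeys_vowel_harmony(word):
    """Simplified Finnish vowel harmony: a word should not mix back vowels
    (a, o, u) with front vowels (ä, ö, y). Compound words break this rule."""
    letters = set(word)
    return not (letters & BACK_VOWELS and letters & FRONT_VOWELS)


def suggest(model, keys, n_best=10):
    """Decode, then prune candidates that fail the phonological filter."""
    return [w for w in decode(model, keys, n_best) if obeys_vowel_harmony(w)]


def keystrokes_per_character(model, text_words, n_best=50):
    """Simplified KPC: per word, one press per letter, one 'next' press per
    suggestion ranked above the intended word, and one space press; the
    character count includes the space. Raises ValueError if the intended
    word is not in the suggestion list."""
    keystrokes = characters = 0
    for word in text_words:
        keys = "".join(LETTER_TO_KEY[ch] for ch in word)
        rank = suggest(model, keys, n_best).index(word)
        keystrokes += len(word) + rank + 1
        characters += len(word) + 1
    return keystrokes / characters


if __name__ == "__main__":
    corpus = ["talo", "talossa", "talon", "kala", "kalassa", "sana", "sanassa"]
    model = LetterModel(corpus, order=3)
    print(decode(model, "8256")[0])  # talo
    print(keystrokes_per_character(model, ["talo", "kala", "sana"]))  # 1.0
```

Because the keypad emission is deterministic, the only uncertainty lies in which letter each key stands for, so the search reduces to ranking letter strings by the language model. A KPC of 1.0 means every word was the top suggestion. Each misranked word adds one "next" press and pushes KPC above 1.0.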
Original language: English
Title of host publication: International Conference on Intelligent Text Processing and Computational Linguistics: CICLing 2012: Computational Linguistics and Intelligent Text Processing
Publisher: Springer
Publication date: 2012
Pages: 478-489
ISBN (Print): 978-3-642-28600-1
ISBN (Electronic): 978-3-642-28601-8
Publication status: Published - 2012
Externally published: Yes
Series: Lecture Notes in Computer Science
Volume: 7182
ISSN: 0302-9743

Keywords

  • Predictive text entry
  • Agglutinative languages
  • Markov chains
  • Hidden Markov model (HMM)
  • Morphological segmentation
