A structural query system for Han characters

Matthew Skala

Research output: Journal Article or Conference Article in Journal > Journal article > Research > peer-review

Abstract

The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance.
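
The Bloom-filter-inspired indexing mentioned in the abstract can be illustrated with a minimal sketch. It is not IDSgrep's actual index or query language: the vector width, the hash construction, the flat component sets standing in for EIDS trees, and every name in the code (`feature_bits`, `signature`, `may_match`, `search`, `DICTIONARY`) are assumptions made for illustration. It shows only the general idea that a cheap bit vector test can reject entries that cannot match, so the exact (expensive) match runs on fewer candidates.

```python
import hashlib

VECTOR_BITS = 128  # assumed width for this sketch, not taken from the paper


def feature_bits(feature: str, k: int = 3) -> int:
    """Map one feature to a mask with up to k bits set, Bloom-filter style."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    mask = 0
    for i in range(k):
        # Two bytes of the digest select each bit position.
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % VECTOR_BITS
        mask |= 1 << pos
    return mask


def signature(components) -> int:
    """OR together the masks of all components of an entry or a query."""
    sig = 0
    for component in components:
        sig |= feature_bits(component)
    return sig


def may_match(query_sig: int, entry_sig: int) -> bool:
    """False means the entry definitely lacks a queried component.

    True means "possibly": hash collisions can cause false positives,
    but never false negatives.
    """
    return query_sig & ~entry_sig == 0


# Toy data: character -> immediate components (a flat stand-in for a tree).
DICTIONARY = {
    "語": {"言", "吾"},
    "話": {"言", "舌"},
    "明": {"日", "月"},
}

# The index is built once, ahead of any query.
INDEX = {ch: signature(parts) for ch, parts in DICTIONARY.items()}


def search(required):
    """Return characters containing every required component."""
    required = set(required)
    query_sig = signature(required)
    hits = []
    for ch, parts in DICTIONARY.items():
        if not may_match(query_sig, INDEX[ch]):
            continue  # rejected by the bit vector test, exact check skipped
        if required <= parts:  # exact check, standing in for tree matching
            hits.append(ch)
    return hits
```

Because the filter can only produce false positives, the exact check after it keeps results correct; the filter affects only how many entries reach that check.
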
Original language: English
Journal: International Journal of Asian Language Processing
Volume: 23
Issue number: 2
Pages (from-to): 127-159
ISSN: 0219-5968
Publication status: Published - Jan 2016

Keywords

  • tree matching
  • grep
  • radical
  • font
  • character description
  • Han script
  • Theory
  • Languages
  • Algorithms
