A structural query system for Han characters

Matthew Skala

Publication: Article in journal and conference article in journal › Journal article › Research › peer-reviewed

Abstract

The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance.
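
The abstract mentions a bit vector index inspired by Bloom filters. The following is a minimal sketch of that general idea only, not IDSgrep's actual algorithm or data format: the tree representation, the names `symbol_mask`, `tree_mask`, and `search`, the 64-bit vector width, and the use of two hash bits per symbol are all assumptions made for illustration. Each dictionary entry gets a precomputed bit vector summarizing the symbols in its tree, and a query is first tested against that vector so that entries which cannot match are skipped before the slower exact check.

```python
import hashlib

BITS = 64     # assumed width of each bit vector
HASHES = 2    # assumed number of bits set per symbol

def symbol_mask(symbol):
    """Map one symbol to a small set of bits, as a Bloom filter would."""
    digest = hashlib.sha256(symbol.encode("utf-8")).digest()
    mask = 0
    for i in range(HASHES):
        mask |= 1 << (digest[i] % BITS)
    return mask

def tree_symbols(tree):
    """Yield every symbol in a (head, children) tree, depth first."""
    head, children = tree
    yield head
    for child in children:
        yield from tree_symbols(child)

def tree_mask(tree):
    """OR together the masks of all symbols appearing in the tree."""
    mask = 0
    for symbol in tree_symbols(tree):
        mask |= symbol_mask(symbol)
    return mask

def search(entries, index, required):
    """Return names of entries whose trees contain every required symbol."""
    query_mask = 0
    for symbol in required:
        query_mask |= symbol_mask(symbol)
    results = []
    for name, tree in entries.items():
        # Fast rejection: a missing query bit proves a symbol is absent.
        if (index[name] & query_mask) != query_mask:
            continue
        # The filter can give false positives, so confirm exactly.
        if set(required) <= set(tree_symbols(tree)):
            results.append(name)
    return results

# Hypothetical toy dictionary; decompositions written as nested tuples.
entries = {
    "明": ("⿰", [("日", []), ("月", [])]),
    "林": ("⿰", [("木", []), ("木", [])]),
    "森": ("⿱", [("木", []), ("⿰", [("木", []), ("木", [])])]),
}
index = {name: tree_mask(tree) for name, tree in entries.items()}

print(search(entries, index, ["木"]))
print(search(entries, index, ["日", "月"]))
```

Because the bit test can only produce false positives and never false negatives, the exact check afterwards keeps results correct while the index merely reduces how many entries need that check.
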
Original language: English
Journal: International Journal of Asian Language Processing
Volume: 23
Issue number: 2
Pages (from-to): 127-159
ISSN: 0219-5968
Status: Published - Jan. 2016

Keywords

  • tree matching
  • grep
  • radical
  • font
  • character description
  • Han script
  • Theory
  • Languages
  • Algorithms
