Abstract
The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance.
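
As a rough illustration of the indexing idea mentioned in the abstract, the sketch below shows a generic Bloom-filter-style bit vector used as a prefilter in front of an exact check. It is not IDSgrep's actual algorithm or data format. The toy dictionary, the names `symbol_bits`, `signature`, and `search`, the 64-bit width, and the use of a plain substring test in place of real tree matching are all assumptions made for this example. The entries use standard Unicode IDS notation (for example, `⿰日月` describes 明 as 日 to the left of 月).

```python
import hashlib

BITS = 64    # signature width; an arbitrary choice for this sketch
HASHES = 2   # number of bit positions derived from each symbol


def symbol_bits(symbol):
    """Map one symbol to a mask with up to HASHES bits set."""
    digest = hashlib.sha256(symbol.encode("utf-8")).digest()
    mask = 0
    for i in range(HASHES):
        mask |= 1 << (digest[i] % BITS)
    return mask


def signature(ids):
    """OR together the masks of every symbol in an IDS string."""
    sig = 0
    for symbol in ids:
        sig |= symbol_bits(symbol)
    return sig


# Hypothetical toy dictionary: character -> Unicode IDS string.
TOY_DICTIONARY = {
    "明": "⿰日月",
    "林": "⿰木木",
    "早": "⿱日十",
    "森": "⿱木⿰木木",
}

# Precomputed index: one bit vector per dictionary entry.
INDEX = {ch: signature(ids) for ch, ids in TOY_DICTIONARY.items()}


def search(component):
    """Return the characters whose IDS contains `component`."""
    needle = symbol_bits(component)
    hits = []
    for ch, ids in TOY_DICTIONARY.items():
        # Cheap test: if any needle bit is missing from the entry's
        # signature, the component cannot occur in this entry.
        if (INDEX[ch] & needle) != needle:
            continue
        # Exact test (a substring check standing in for tree matching)
        # discards any false positives the bit vectors let through.
        if component in ids:
            hits.append(ch)
    return hits


print(search("日"))
print(search("木"))
```

The bit test can report false positives but never false negatives, so entries it rejects can be skipped safely, and only the remaining candidates need the more expensive exact match.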
Original language | English |
---|---|
Journal | International Journal of Asian Language Processing |
Volume | 23 |
Issue number | 2 |
Pages (from-to) | 127-159 |
ISSN | 0219-5968 |
Publication status | Published - Jan 2016 |
Keywords
- tree matching
- grep
- radical
- font
- character description
- Han script
- Theory
- Languages
- Algorithms