Abstract
The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance.
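The abstract mentions a bit vector index inspired by Bloom filters. As a rough illustration of that general idea only, and not of IDSgrep's actual index layout, the Python sketch below hashes each entry's components into a fixed-width bit vector and uses it to skip entries that cannot match before running an exact check. The 64-bit width, the `signature`, `may_match`, and `search` names, the toy database, and the use of a plain subset test in place of real EIDS tree matching are all assumptions made for this example.

```python
import hashlib

BITS = 64  # assumed signature width, chosen only for illustration


def signature(features):
    """OR together one hashed bit per feature into an integer bit vector."""
    sig = 0
    for f in features:
        digest = hashlib.sha256(f.encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return sig


def may_match(query_sig, entry_sig):
    """Return False only if the query needs a bit the entry lacks.

    Like a Bloom filter, this can give false positives but never
    false negatives, so it is safe to use as a prefilter.
    """
    return query_sig & ~entry_sig == 0


def search(query_features, database):
    """Filter by signature first, then confirm with an exact subset test."""
    q = set(query_features)
    q_sig = signature(q)
    return [
        key
        for key, (feats, sig) in database.items()
        if may_match(q_sig, sig) and q <= feats
    ]


# Hypothetical toy database: each character mapped to a set of components.
raw = {"明": {"日", "月"}, "林": {"木"}, "時": {"日", "寺"}}
database = {k: (v, signature(v)) for k, v in raw.items()}
```

Under these assumptions, `search(["日"], database)` consults the cheap bitwise test for every entry and runs the exact comparison only on the survivors. Entries rejected by the signature check are never examined in full, which is where a speed-up of this kind would come from.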
Original language | English |
---|---|
Journal | International Journal of Asian Language Processing |
Volume | 23 |
Issue number | 2 |
Pages (from-to) | 127-159 |
ISSN | 0219-5968 |
Status | Published - Jan. 2016 |
Keywords
- tree matching
- grep
- radical
- font
- character description
- Han script
- Theory
- Languages
- Algorithms