Abstract
The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance.
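
To illustrate the general idea behind a Bloom-filter-inspired bit vector index, here is a minimal Python sketch. It is not IDSgrep's actual algorithm or data layout: the feature sets (lists of character components), the names `signature`, `build_index`, and `search`, and the constants `BITS` and `HASHES` are all assumptions made for illustration. Each entry gets a fixed-width bit signature; a query is first tested against the signature with a cheap bitwise check, and only the surviving candidates go through the exact (slower) match.

```python
import hashlib

BITS = 64    # signature width in bits (hypothetical choice)
HASHES = 2   # bits set per feature (hypothetical choice)


def signature(features):
    """Fold a collection of feature strings into one bit vector (a Python int)."""
    sig = 0
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        for i in range(HASHES):
            # Use two digest bytes per hash function to pick a bit position.
            pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % BITS
            sig |= 1 << pos
    return sig


def build_index(entries):
    """entries maps a key (e.g. a character) to its list of features."""
    return [(signature(feats), key, frozenset(feats))
            for key, feats in entries.items()]


def search(index, required):
    """Return keys whose feature set contains every required feature."""
    query_sig = signature(required)
    required = frozenset(required)
    hits = []
    for sig, key, feats in index:
        # Fast filter: a missing query bit proves the entry cannot match.
        if (sig & query_sig) != query_sig:
            continue
        # The filter can give false positives, so confirm with the exact test.
        if required <= feats:
            hits.append(key)
    return hits


# Toy data: characters mapped to assumed component lists.
entries = {"明": ["日", "月"], "林": ["木", "木"], "休": ["亻", "木"]}
index = build_index(entries)
result = search(index, ["木"])
print(result)
```

Because the bitwise filter can only produce false positives, never false negatives, the exact check at the end keeps the results correct while letting most non-matching entries be skipped cheaply.
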
Original language | English |
---|---|
Journal | International Journal of Asian Language Processing |
Volume | 23 |
Issue number | 2 |
Pages (from-to) | 127-159 |
Status | Published - Jan. 2016 |