TY - GEN

T1 - Bloom Filters, Adaptivity, and the Dictionary Problem

AU - Bender, Michael

AU - Farach-Colton, Martin

AU - Goswami, Mayank

AU - Johnson, Rob

AU - McCauley, Samuel

AU - Singh, Shinkha

PY - 2018

Y1 - 2018

N2 - An approximate membership query data structure (AMQ)—such as a Bloom, quotient, or cuckoo filter—maintains a compact, probabilistic representation of a set S of keys from a universe U. It supports lookups and inserts. Some AMQs also support deletes. A query for x in S returns "present." A query for x not in S returns "present" with a tunable false positive probability epsilon, and otherwise returns "absent." AMQs are widely used to speed up dictionaries that are stored remotely (e.g., on disk or across a network). The AMQ is stored locally (e.g., in memory). The remote dictionary is only accessed when the AMQ returns "present." Thus, the primary performance metric of an AMQ is how often it returns "absent" for negative queries. Existing AMQs offer weak guarantees on the number of false positives in a sequence of queries. The false-positive probability epsilon holds only for a single query. It is easy for an adversary to drive an AMQ's false-positive rate towards 1 by simply repeating false positives. This paper shows what it takes to get strong guarantees on the number of false positives. We say that an AMQ is adaptive if it guarantees a false-positive probability of epsilon for every query, regardless of answers to previous queries. We establish upper and lower bounds for adaptive AMQs. Our lower bound shows that it is impossible to build a small adaptive AMQ, even when the AMQ is immediately told whenever a query is a false positive. On the other hand, we show that it is possible to maintain an AMQ that uses the same amount of local space as a non-adaptive AMQ (up to lower order terms), performs all queries and updates in constant time, and guarantees that each negative query to the dictionary accesses remote storage with probability epsilon, independent of the results of past queries. Thus, we show that adaptivity can be achieved effectively for free.

AB - An approximate membership query data structure (AMQ)—such as a Bloom, quotient, or cuckoo filter—maintains a compact, probabilistic representation of a set S of keys from a universe U. It supports lookups and inserts. Some AMQs also support deletes. A query for x in S returns "present." A query for x not in S returns "present" with a tunable false positive probability epsilon, and otherwise returns "absent." AMQs are widely used to speed up dictionaries that are stored remotely (e.g., on disk or across a network). The AMQ is stored locally (e.g., in memory). The remote dictionary is only accessed when the AMQ returns "present." Thus, the primary performance metric of an AMQ is how often it returns "absent" for negative queries. Existing AMQs offer weak guarantees on the number of false positives in a sequence of queries. The false-positive probability epsilon holds only for a single query. It is easy for an adversary to drive an AMQ's false-positive rate towards 1 by simply repeating false positives. This paper shows what it takes to get strong guarantees on the number of false positives. We say that an AMQ is adaptive if it guarantees a false-positive probability of epsilon for every query, regardless of answers to previous queries. We establish upper and lower bounds for adaptive AMQs. Our lower bound shows that it is impossible to build a small adaptive AMQ, even when the AMQ is immediately told whenever a query is a false positive. On the other hand, we show that it is possible to maintain an AMQ that uses the same amount of local space as a non-adaptive AMQ (up to lower order terms), performs all queries and updates in constant time, and guarantees that each negative query to the dictionary accesses remote storage with probability epsilon, independent of the results of past queries. Thus, we show that adaptivity can be achieved effectively for free.

KW - Approximate Membership Query

KW - False Positive Probability

KW - Adaptive Data Structures

KW - Bloom Filter

KW - Remote Dictionary Access

KW - Approximate Membership Query

KW - False Positive Probability

KW - Adaptive Data Structures

KW - Bloom Filter

KW - Remote Dictionary Access

U2 - 10.1109/FOCS.2018.00026

DO - 10.1109/FOCS.2018.00026

M3 - Article in proceedings

SN - 978-1-5386-4230-6

BT - 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS)

PB - IEEE

ER -