Abstract
Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how “random” a hash function or a random number generator is, is its independence: a sequence of random variables is said to be k-independent if every variable is uniform and every size k subset is independent.
In this paper we consider three classic algorithms under limited independence. Besides the theoretical interest in removing the unrealistic assumption of full independence, the work is motivated by lower independence being more practical. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.
Randomized Quicksort. When pivot elements are computed using a 5-independent hash function, Karloff and Raghavan, J.ACM’93 showed O(nlogn) expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only 4-independence.
Min-Wise Hashing. For a set A, consider the probability of a particular element being mapped to the smallest hash value. It is known that 5-independence implies the optimal probability O(1/n). Broder et al., STOC’98 showed that 2-independence implies it is O(1/|A|−−−√). We show a matching lower bound as well as new tight bounds for 3- and 4-independent hash functions.
Largest Bucket. We consider the case where n balls are distributed to n buckets using a k-independent hash function and analyze the largest bucket size. Alon et. al, STOC’97 showed that there exists a 2-independent hash function implying a bucket of size Ω( n1/2). We generalize the bound, providing a k-independent family of functions that imply size Ω( n1/k).
In this paper we consider three classic algorithms under limited independence. Besides the theoretical interest in removing the unrealistic assumption of full independence, the work is motivated by lower independence being more practical. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.
Randomized Quicksort. When pivot elements are computed using a 5-independent hash function, Karloff and Raghavan, J.ACM’93 showed O(nlogn) expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only 4-independence.
Min-Wise Hashing. For a set A, consider the probability of a particular element being mapped to the smallest hash value. It is known that 5-independence implies the optimal probability O(1/n). Broder et al., STOC’98 showed that 2-independence implies it is O(1/|A|−−−√). We show a matching lower bound as well as new tight bounds for 3- and 4-independent hash functions.
Largest Bucket. We consider the case where n balls are distributed to n buckets using a k-independent hash function and analyze the largest bucket size. Alon et. al, STOC’97 showed that there exists a 2-independent hash function implying a bucket of size Ω( n1/2). We generalize the bound, providing a k-independent family of functions that imply size Ω( n1/k).
Original language | English |
---|---|
Title of host publication | Algorithms – ESA 2015 : 23rd Annual European Symposium, Patras, Greece, September 14–16, 2015, Proceedings |
Volume | 9294 |
Publisher | Springer |
Publication date | 2015 |
Pages | 828-839 |
ISBN (Print) | 978-3-662-48349-7 |
ISBN (Electronic) | 978-3-662-48350-3 |
DOIs | |
Publication status | Published - 2015 |
Series | Lecture Notes in Computer Science |
---|---|
ISSN | 0302-9743 |
Keywords
- Randomized Algorithms
- Probabilistic Analysis
- k-Independent Hash Functions
- Randomized Quicksort
- Min-Wise Hashing