Abstract
Regarding the complexity of computation, randomness is a significant resource besides time and space.
Particularly from a theoretical viewpoint, it is a fundamental question whether the availability of
random numbers gives any additional power. Most randomized algorithms are analyzed under the
assumption that independent and unbiased random bits are accessible. However, truly random
bits are scarce in reality. In practice, pseudorandom generators are used in place of random
numbers; usually, even the seed of the generator does not come from a source of true random-
ness. While things mostly work well in practice, there are occasional problems with the use of weak
pseudorandom generators. Further, randomized algorithms are not suited for applications where
reliability is a key concern.
Derandomization is the process of minimizing the use of random bits, either reducing them to small
amounts or removing them altogether. We may identify two lines of work in this direction. There has
been a lot of work on designing general tools for simulating randomness and making deterministic
versions of randomized algorithms, with some loss in time and space performance. These methods
are not tied to particular algorithms, but work on large classes of problems. The central question
in this area of computational complexity is "P = BPP?".
Instead of derandomizing whole complexity classes, one may work on derandomizing concrete
problems. This approach trades generality for the possibility of much better performance
bounds. There are a few common techniques for derandomizing concrete problems, but often one
needs to specifically design a new method that is "friendlier" to deterministic computation. Solutions
of this kind prevail in this thesis.
A central part of the thesis consists of algorithms for deterministic selection of hash functions
whose image of a given set is "nicely spread". The main application is the design of efficient dictionary
data structures. A dictionary stores a set of keys from some universe, and allows the user to
search through the set, looking for any value from the universe. Additional information may be
associated with each key and retrieved on a successful lookup. In a static dictionary the stored
set remains fixed after initialization, while in a dynamic dictionary it may change over time.
Our static dictionaries attain worst-case performance that is very close to the expected performance
of the best randomized dictionaries. In the dynamic case the gap is larger; it is a significant open
question to establish whether a gap between deterministic and randomized dynamic dictionaries is
necessary.
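For concreteness, the sketch below shows the two dictionary interfaces being compared; the class names and the use of Python's built-in hash table are purely illustrative and are not the constructions of the thesis.

```python
from typing import Dict, Iterable, Optional, Tuple

# Illustrative interfaces only; names, types, and the use of Python's built-in
# dict are placeholders, not the thesis's constructions.

class StaticDictionary:
    """Built once over a fixed key set; afterwards only lookups are allowed."""

    def __init__(self, items: Iterable[Tuple[int, object]]):
        # A static dictionary may spend its effort here, e.g. deterministically
        # selecting hash functions, so that worst-case lookup time is guaranteed.
        self._table: Dict[int, object] = dict(items)

    def lookup(self, key: int) -> Optional[object]:
        # Returns the associated value on a successful lookup, None otherwise.
        return self._table.get(key)


class DynamicDictionary(StaticDictionary):
    """Additionally supports insertions and deletions after initialization."""

    def insert(self, key: int, value: object) -> None:
        self._table[key] = value

    def delete(self, key: int) -> None:
        self._table.pop(key, None)
```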
We also give a new analysis of classical linear probing hash tables, showing that they work
well with simple and efficiently computable hash functions. Here we have a randomized structure
in which the randomness requirements are cut down to a reasonable level. Traditionally, linear
probing was analyzed under the unrealistic uniform hashing assumption that the hash function
employed behaves like a truly random function. This was later improved to explicit, but cum-
bersome and inefficient, families of functions. Our analysis shows that practically usable hash
functions suffice, but the simplest kinds of functions do not.
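As a minimal sketch of the structure under analysis, here is linear probing with a pluggable hash function. The multiplicative hash below is only a placeholder for "simple and efficiently computable"; which hash families actually guarantee the expected bounds is exactly what the analysis addresses, and no endorsement of this particular family is intended.

```python
class LinearProbingTable:
    """Open addressing with linear probing; the hash family is a parameter.
    Sketch only: no resizing or deletion, and it assumes the table never fills up."""

    def __init__(self, capacity: int, hash_fn):
        self._cap = capacity
        self._hash = hash_fn                # maps a key to a slot in [0, capacity)
        self._slots = [None] * capacity     # each slot holds (key, value) or None

    def insert(self, key: int, value: object) -> None:
        i = self._hash(key)
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) % self._cap         # scan forward to the next free slot
        self._slots[i] = (key, value)

    def lookup(self, key: int):
        i = self._hash(key)
        while self._slots[i] is not None:
            if self._slots[i][0] == key:
                return self._slots[i][1]
            i = (i + 1) % self._cap
        return None


def make_multiply_shift(bits: int, a: int = 0x9E3779B97F4A7C15):
    """Placeholder multiplicative hash mapping 64-bit keys to 2**bits slots."""
    return lambda key: ((a * key) & (2**64 - 1)) >> (64 - bits)


if __name__ == "__main__":
    table = LinearProbingTable(1 << 10, make_multiply_shift(10))
    table.insert(42, "value")
    assert table.lookup(42) == "value"
```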
Apart from dictionaries, we look at the problem of sparse approximation of vectors, which
has applications in different areas such as data stream computation and compressed sensing. We
present a method that achieves close to optimal performance on virtually all attributes. It is
deterministic in the sense that a single measurement matrix works for all inputs.
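For orientation, one standard way to state such a recovery guarantee (given here for context, not quoted from the thesis) is the following: a single $m \times n$ measurement matrix $\Phi$ is fixed in advance, and for every input $x \in \mathbb{R}^n$ the algorithm, given only $y = \Phi x$, outputs an approximation $\hat{x}$ satisfying

```latex
% Illustrative l1/l1 sparse recovery guarantee (standard formulation):
% the error of the output is within a constant factor of the error of the
% best k-term approximation of x.
\[
  \| x - \hat{x} \|_{1} \;\le\; C \cdot \min_{\|x'\|_{0} \le k} \| x - x' \|_{1}.
\]
```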
One of our dictionary results and the result on sparse recovery of vectors share an impor-
tant tool, although the problems are unrelated. The shared tool is a type of expander graph:
we employ bipartite expander graphs with unbalanced sides. For some algorithms, expander
graphs capture all the required "random-like" properties. In such cases they can replace the use of
randomness, while maintaining roughly the same algorithmic performance.
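For reference, the notion involved is the standard unbalanced bipartite expander; the parameter names below are the conventional ones and are not quoted from the thesis.

```latex
% Standard definition: a bipartite graph G = (U, V, E) in which every left
% vertex has degree d is a (k, eps)-expander if every small left set has
% many distinct right neighbors:
\[
  \forall S \subseteq U,\quad |S| \le k \;\Longrightarrow\;
  |\Gamma(S)| \ge (1 - \varepsilon)\, d\, |S|.
\]
% "Unbalanced" refers to the right side V being much smaller than the left side U.
```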
The problems that we study both require and allow fast solutions. The algorithms involved have
linear or near-linear running times. Even sublogarithmic factors in performance bounds are
meaningful. With such high demands, one has to look for specific deterministic solutions that
are efficient for particular problems; the general derandomization tools would be of no use.
| Original language | English |
|---|---|
| Publisher | IT-Universitetet i København |
| Status | Published - 2009 |