## Abstract

Regarding the complexity of computation, randomness is a significant resource besides time and space. Particularly from a theoretical viewpoint, it is a fundamental question whether the availability of random numbers gives any additional power. Most randomized algorithms are analyzed under the assumption that independent and unbiased random bits are accessible. However, truly random bits are scarce in reality. In practice, pseudorandom generators are used in place of random numbers; usually, even the seed of the generator does not come from a source of true randomness. While things mostly work well in practice, there are occasional problems with the use of weak pseudorandom generators. Further, randomized algorithms are not suited for applications where reliability is a key concern.

Derandomization is the process of minimizing the use of random bits, either reducing them to small amounts or removing them altogether. We may identify two lines of work in this direction. There has been a lot of work in designing general tools for simulating randomness and making deterministic versions of randomized algorithms, with some loss in time and space performance. These methods are not tied to particular algorithms, but work on large classes of problems. The central question in this area of computational complexity is "P = BPP?".

Instead of derandomizing whole complexity classes, one may work on derandomizing concrete problems. This approach trades generality for the possibility of much better performance bounds. There are a few common techniques for derandomizing concrete problems, but often one needs to specifically design a new method that is "friendlier" to deterministic computation. This kind of solution prevails in this thesis.

A central part of the thesis is algorithms for deterministic selection of hash functions that have a "nicely spread" image of a given set. The main application is the design of efficient dictionary data structures. A dictionary stores a set of keys from some universe, and allows the user to search through the set, looking for any value from the universe. Additional information may be associated with each key and retrieved on successful lookup. In a static dictionary the stored set remains fixed after initialization, while in a dynamic dictionary it may change over time. Our static dictionaries attain worst-case performance that is very close to the expected performance of the best randomized dictionaries. In the dynamic case the gap is larger; it is a significant open question to establish whether a gap between deterministic and randomized dynamic dictionaries is necessary.
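The static/dynamic distinction can be pinned down with a minimal interface sketch. The names and the trivial internals below are ours, not the thesis's constructions; a plain built-in dict stands in for the actual hash-based lookup structure.

```python
# Hypothetical illustration of the dictionary interface described above.

class StaticDictionary:
    """Stored key set is fixed after initialization."""
    def __init__(self, pairs):
        # A real construction would deterministically select a hash
        # function with a "nicely spread" image of the key set; here a
        # built-in dict stands in for that lookup structure.
        self._table = dict(pairs)

    def lookup(self, key):
        # Returns the associated value on successful lookup, None otherwise.
        return self._table.get(key)


class DynamicDictionary(StaticDictionary):
    """The stored set may change over time."""
    def insert(self, key, value):
        self._table[key] = value

    def delete(self, key):
        self._table.pop(key, None)


d = DynamicDictionary([(17, "a"), (42, "b")])
d.insert(5, "c")
d.delete(17)
```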

We also give a new analysis of classical linear probing hash tables, showing that they work well with simple and efficiently computable hash functions. Here we have a randomized structure in which the randomness requirements are cut down to a reasonable level. Traditionally, linear probing was analyzed under the unrealistic uniform hashing assumption that the hash function employed behaves like a truly random function. This was later improved to explicit, but cumbersome and inefficient families of functions. Our analysis shows that practically usable hash functions suffice, but the simplest kinds of functions do not.
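The mechanics of linear probing are easy to state: on a collision, scan consecutive cells until a free one (for insertion) or an empty one (for an unsuccessful lookup) is found. The sketch below uses a multiplicative hash with an arbitrarily chosen odd constant purely for illustration; which hash families make this scheme provably fast is exactly what the thesis analyzes.

```python
# A minimal sketch of a linear probing hash table.

SIZE = 16                      # table size, a power of two
A = 0x9E3779B97F4A7C15         # an arbitrary odd multiplier (assumed, not from the thesis)

def h(key):
    # Multiplicative hashing: multiply mod 2^64, keep the top 4 bits,
    # giving a slot in [0, SIZE).
    return ((A * key) & 0xFFFFFFFFFFFFFFFF) >> (64 - 4)

table = [None] * SIZE

def insert(key, value):
    i = h(key)
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % SIZE     # probe the next cell on collision
    table[i] = (key, value)

def lookup(key):
    i = h(key)
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % SIZE     # follow the same probe sequence
    return None                # reached an empty cell: key is absent

insert(17, "seventeen")
insert(42, "forty-two")
```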

Apart from dictionaries, we look at the problem of sparse approximation of vectors, which has applications in different areas such as data stream computation and compressed sensing. We present a method that achieves close to optimal performance on virtually all attributes. It is deterministic in the sense that a single measurement matrix works for all inputs.
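To fix terminology: the best k-sparse approximation of a vector keeps its k largest-magnitude entries and zeroes out the rest; recovery schemes are judged against this benchmark. A minimal illustration of the benchmark itself (ours, not the thesis's recovery algorithm):

```python
def best_k_sparse(x, k):
    # Indices of the k entries with largest absolute value.
    top = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    keep = set(top)
    # Zero out everything outside the top-k support.
    return [x[i] if i in keep else 0 for i in range(len(x))]

print(best_k_sparse([0.1, -5.0, 2.0, 0.0, 3.5], 2))  # → [0, -5.0, 0, 0, 3.5]
```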

One of our dictionary results and the result on sparse recovery of vectors share an important tool, although the problems are unrelated. The shared tool is a type of expander graph. We employ bipartite expander graphs with unbalanced sides. For some algorithms, expander graphs capture all the required "random-like" properties. In such cases they can replace the use of randomness, while maintaining about the same performance of the algorithms.
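The "random-like" property in question is vertex expansion: every small set of left vertices must have many distinct right neighbors. A brute-force check of this definition on a toy graph (our own definition sketch; the thesis works with unbalanced bipartite expanders of far larger size, where such enumeration is infeasible):

```python
from itertools import combinations

# neighbors[u] is the set of right-side vertices adjacent to left vertex u.

def is_expander(neighbors, max_set_size, alpha):
    # Every left set S with |S| <= max_set_size must satisfy
    # |N(S)| >= alpha * |S|.
    left = range(len(neighbors))
    for s in range(1, max_set_size + 1):
        for S in combinations(left, s):
            nbhd = set().union(*(neighbors[u] for u in S))
            if len(nbhd) < alpha * s:
                return False
    return True

# A tiny bipartite graph: 4 left vertices, 6 right vertices, left degree 3.
G = [{0, 1, 2}, {1, 3, 4}, {0, 4, 5}, {2, 3, 5}]
```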

The problems that we study both require and allow fast solutions. The algorithms involved have linear or near-linear running times. Even sublogarithmic factors in performance bounds are meaningful. With such high demands, one has to look for specific deterministic solutions that are efficient for particular problems; the general derandomization tools would be of no use.


Original language | English
---|---
Publisher | IT-Universitetet i København
Publication status | Published - 2009