Abstract
Regarding the complexity of computation, randomness is a significant resource besides time and space. Particularly from a theoretical viewpoint, it is a fundamental question whether the availability of random numbers gives any additional power. Most randomized algorithms are analyzed under the assumption that independent and unbiased random bits are accessible. However, truly random bits are scarce in reality. In practice, pseudorandom generators are used in place of random numbers; usually, even the seed of the generator does not come from a source of true randomness. While things mostly work well in practice, there are occasional problems caused by the use of weak pseudorandom generators. Further, randomized algorithms are not suited for applications where reliability is a key concern. Derandomization is the process of minimizing the use of random bits, either to small amounts or by removing them altogether.

We may identify two lines of work in this direction. There has been a lot of work on designing general tools for simulating randomness and obtaining deterministic versions of randomized algorithms, with some loss in time and space performance. These methods are not tied to particular algorithms, but work on large classes of problems. The central question in this area of computational complexity is "P = BPP?". Instead of derandomizing whole complexity classes, one may work on derandomizing concrete problems. This approach trades generality for the possibility of much better performance bounds. There are a few common techniques for derandomizing concrete problems, but often one needs to specifically design a new method that is "friendlier" to deterministic computation. This kind of solution prevails in this thesis.

A central part of the thesis is a set of algorithms for deterministic selection of hash functions that have a "nicely spread" image of a given set (a small sketch of this goal is given below the abstract). The main application is the design of efficient dictionary data structures. A dictionary stores a set of keys from some universe and allows the user to search through the set, looking up any value from the universe. Additional information may be associated with each key and retrieved on successful lookup. In a static dictionary the stored set remains fixed after initialization, while in a dynamic dictionary it may change over time. Our static dictionaries attain worst-case performance that is very close to the expected performance of the best randomized dictionaries. In the dynamic case the gap is larger; it is a significant open question to establish whether a gap between deterministic and randomized dynamic dictionaries is necessary.

We also give a new analysis of the classical linear probing hash table, showing that it works well with simple and efficiently computable hash functions (a sketch of such a table is given below). Here we have a randomized structure in which the randomness requirements are cut down to a reasonable level. Traditionally, linear probing was analyzed under the unrealistic uniform hashing assumption that the hash function employed behaves like a truly random function. This was later improved to explicit, but cumbersome and inefficient, families of functions. Our analysis shows that practically usable hash functions suffice, but the simplest kinds of functions do not.

Apart from dictionaries, we look at the problem of sparse approximations of vectors, which has applications in different areas such as data stream computation and compressed sensing. We present a method that achieves close to optimal performance on virtually all attributes. It is deterministic in the sense that a single measurement matrix works for all inputs.

One of our dictionary results and the result on sparse recovery of vectors share an important tool, although the problems are unrelated. The shared tool is a type of expander graph. We employ bipartite expander graphs with unbalanced sides. For some algorithms, expander graphs capture all the required "random-like" properties (the defining property is sketched below); in such cases they can replace the use of randomness while maintaining roughly the same algorithmic performance.

The problems that we study require and allow fast solutions. The algorithms involved have linear or near-linear running times, and even sublogarithmic factors in performance bounds are meaningful. With such high demands, one has to look for specific deterministic solutions that are efficient for particular problems; the general derandomization tools would be of no use.
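To make the notion of a "nicely spread" image concrete, the following hypothetical Python sketch searches a simple hash family h_a(x) = ((a*x) mod p) mod m for a multiplier a that maps a given key set without collisions. The brute-force loop, the prime p, and the quadratic range m = n² are assumptions made only for this illustration; the thesis obtains such selections deterministically and efficiently, not by exhaustive search.

```python
# Hypothetical illustration of choosing a hash function whose image of a
# given key set is collision-free.  The family ((a*x) mod p) mod m and the
# exhaustive search over multipliers are assumptions of this sketch only.

def find_injective_multiplier(keys, p=10007, m=None):
    """Return (a, m) such that x -> ((a*x) % p) % m is injective on `keys`."""
    if m is None:
        m = len(keys) ** 2              # a quadratic range guarantees some a works
    for a in range(1, p):
        image = {((a * x) % p) % m for x in keys}
        if len(image) == len(keys):     # no two keys collide under this choice of a
            return a, m
    raise ValueError("no collision-free multiplier found")


keys = [5, 19, 77, 104, 2023]           # keys are assumed smaller than p
a, m = find_injective_multiplier(keys)
print(f"h(x) = (({a}*x) mod 10007) mod {m} is injective on the set")
```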
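The linear probing result concerns tables of the following kind. This is a minimal sketch of a linear probing table using a multiply-shift style hash function as an example of a "simple and efficiently computable" function; the power-of-two table size, the 64-bit multiplier, and the absence of resizing are simplifying assumptions of the sketch, and the hash function families actually analyzed in the thesis are not reproduced here.

```python
# Minimal linear probing table with a multiply-shift style hash function.
# Illustrative only: no resizing, and the table is assumed never to fill up.
import random


class LinearProbingTable:
    def __init__(self, capacity_bits=10):
        self.bits = capacity_bits
        self.capacity = 1 << capacity_bits          # table size, a power of two
        self.keys = [None] * self.capacity
        self.values = [None] * self.capacity
        self.a = random.getrandbits(64) | 1         # random odd 64-bit multiplier

    def _hash(self, key):
        # Multiply-shift: multiply by the odd constant, keep the top `bits` bits.
        return ((self.a * key) & 0xFFFFFFFFFFFFFFFF) >> (64 - self.bits)

    def insert(self, key, value):
        i = self._hash(key)
        # Scan forward (wrapping around) until an empty slot or the key is found.
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) & (self.capacity - 1)
        self.keys[i] = key
        self.values[i] = value

    def lookup(self, key):
        i = self._hash(key)
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) & (self.capacity - 1)
        return None


# Usage: store a few keys and look them up.
t = LinearProbingTable()
for k in [3, 17, 1025, 3 + (1 << 40)]:
    t.insert(k, str(k))
assert t.lookup(1025) == "1025"
assert t.lookup(999) is None
```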
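The "random-like" property of an unbalanced bipartite expander can be stated simply: every set of at most k left vertices must have close to the maximum possible number d·|S| of distinct right neighbours. The sketch below checks this property by brute force on a tiny random left-regular graph; the parameters and the random construction are purely illustrative and are unrelated to the graphs actually used in the thesis.

```python
# Brute-force check of the (k, eps) expansion property on a small random
# left-regular bipartite graph.  Whether the check succeeds depends on the
# random draw; the point is only to illustrate the definition.
from itertools import combinations
import random


def random_left_regular_graph(n_left, n_right, d, seed=0):
    rng = random.Random(seed)
    # Each left vertex gets d distinct right neighbours.
    return [rng.sample(range(n_right), d) for _ in range(n_left)]


def is_expander(neighbours, k, eps):
    d = len(neighbours[0])
    for size in range(1, k + 1):
        for subset in combinations(range(len(neighbours)), size):
            nbrs = set()
            for v in subset:
                nbrs.update(neighbours[v])
            if len(nbrs) < (1 - eps) * d * size:   # too few distinct neighbours
                return False
    return True


g = random_left_regular_graph(n_left=30, n_right=100, d=5)
print("expander for k=3, eps=0.25:", is_expander(g, k=3, eps=0.25))
```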
| Original language | English |
|---|---|
| Qualification | Ph.D. |
| Supervisor(s) | |
| Award date | 8 Oct 2009 |
| Publisher | |
| Status | Published - 2009 |