Randomized Primitives for Big Data Processing

Morten Stöckel

Research output: Ph.D. thesis


The growth in information technology during the last decade has greatly increased the number of users with access to computers or mobile phones, as well as the number of data-based services offered to them. For instance, the number of web servers almost doubled, from 70 to 135 million, between 2005 and 2007. The growth in users, combined with the growth in services, means that the total amount of data to manage is exploding.

An important query in the field of algorithms asks how much two data sets intersect, that is, the size of the "overlap" between them. Such a query is fundamental in applications such as recommender systems, where the answer measures the similarity between shopping patterns and is used to recommend items to the user.

In this dissertation we examine the problem of computing intersection sizes among data sets in several applications, in the context of this information explosion. That is, we consider settings where the data is too large to be stored entirely, or too large to fit in the main memory of the computer. The main contribution of the dissertation is the improvement of several fundamental applications of such data intersection computations, such as approximating the set intersection size and multiplying two matrices. The improvements over state-of-the-art methods take the form of either less space required to store the data or less time needed to process it when answering a query.
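The thesis's own techniques are more involved, but the basic idea of estimating an intersection size from small randomized summaries can be sketched with MinHash. This is an illustrative example, not the method from the dissertation; the function names and parameters below are assumptions made for the sketch. Each set is reduced to a short signature of minimum hash values, the fraction of matching signature coordinates estimates the Jaccard similarity J = |A ∩ B| / |A ∪ B|, and the identity |A ∩ B| = J / (1 + J) · (|A| + |B|) recovers an intersection-size estimate:

```python
import random

def minhash_signature(items, seeds):
    # One coordinate per seed: the minimum hash value over the whole set.
    # Two sets agree on a coordinate with probability equal to their
    # Jaccard similarity.
    return [min(hash((seed, x)) for x in items) for seed in seeds]

def estimate_intersection(a, b, num_hashes=256, rng_seed=0):
    # Illustrative sketch (not the dissertation's algorithm): estimate
    # |a ∩ b| from short MinHash signatures instead of the full sets.
    rng = random.Random(rng_seed)
    seeds = [rng.getrandbits(32) for _ in range(num_hashes)]
    sig_a = minhash_signature(a, seeds)
    sig_b = minhash_signature(b, seeds)
    # Fraction of matching coordinates estimates the Jaccard similarity J.
    j = sum(x == y for x, y in zip(sig_a, sig_b)) / num_hashes
    # |A ∩ B| = J * |A ∪ B| = J / (1 + J) * (|A| + |B|)
    return j / (1 + j) * (len(a) + len(b))
```

The point of such a summary is that the signatures use O(num_hashes) space per set, independent of the set sizes, which is exactly the regime the dissertation targets: data too large to store or hold in main memory.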
Original language: English
Publisher: IT-Universitetet i København
Number of pages: 155
ISBN (Print): 978-87-7949-303-2
Publication status: Published - 2015


