Randomized Primitives for Big Data Processing

Morten Stöckel

Publication: Book / Anthology / Report / Ph.D. thesis › Ph.D. thesis

Abstract

The growth in information technology during the last decade has brought a great increase in the number of users who have access to computers or mobile phones, as well as an increase in the number of data-based services offered to users. For instance, the number of web servers almost doubled from 70 to 135 million during 2005-2007. The growth in users, combined with the growth in services, means that the total amount of data to manage is exploding.

An important query in the field of algorithms asks how much two data sets intersect, that is, the "overlap" between the pieces of data. Such a query is fundamental in applications such as recommender systems, where the answer would be used to measure similarity over shopping patterns and, based on that, recommend items to the user.

In this dissertation we examine the problem of computing intersection sizes among data sets in several applications and in the context of the information explosion. That is, we assume that the data in our applications is too large to be stored entirely or too large to fit in the main memory of the computer. The main contribution of the dissertation is the improvement of several fundamental applications of such data intersection computations, such as approximating the set intersection size and multiplying two matrices. The improvements over the current state-of-the-art methods are either in the form of less space required or less time needed to process the data to compute the answer to the query.
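
To make the idea of approximating a set intersection size in small space concrete, the following is a minimal sketch of one well-known technique, MinHash. It is offered only as an illustration and is not claimed to be the method developed in the dissertation. The function names, the signature length `k`, and the use of salted BLAKE2b as a stand-in for random hash functions are all assumptions made for this example.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`; `seed` selects the hash function.

    Salted BLAKE2b is an assumption of this sketch, used as a stand-in
    for a family of independent random hash functions.
    """
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, k=256):
    """Summarize a non-empty collection of strings by k minimum hash values.

    The signature uses O(k) space regardless of how large `items` is.
    """
    items = list(items)
    return [min(_hash(seed, x) for x in items) for seed in range(k)]


def estimate_intersection_size(sig_a, size_a, sig_b, size_b):
    """Estimate |A ∩ B| from two signatures and the two set sizes.

    The fraction of matching positions estimates the Jaccard similarity
    J = |A ∩ B| / |A ∪ B|. Since |A ∪ B| = |A| + |B| - |A ∩ B|, it follows
    that |A ∩ B| = J * (|A| + |B|) / (1 + J).
    """
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    jaccard = matches / len(sig_a)
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


if __name__ == "__main__":
    # Hypothetical data: two sets of 1000 items sharing 500 items.
    a = {str(i) for i in range(0, 1000)}
    b = {str(i) for i in range(500, 1500)}
    est = estimate_intersection_size(
        minhash_signature(a), len(a), minhash_signature(b), len(b)
    )
    print(f"estimated intersection size: {est:.0f} (true size: 500)")
```

Each set is reduced to `k` numbers, so two sets can be compared without storing either one in full. The estimate becomes more accurate as `k` grows, at the cost of more space and hashing time.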

Original language: English
Publisher: IT-Universitetet i København
Number of pages: 155
ISBN (Print): 978-87-7949-303-2
Status: Published - 2015
Series name: ITU-DS
Number: 108
ISSN: 1602-3536
