## Abstract

As a result of the variety of digital devices available and fast and cheap storage, huge amounts of data are constantly collected and shared for various purposes. The mere extent of the data creation and collection introduces challenges within various research fields such as access control, cryptography and privacy protec- tion. While security is a complementary field of research, we will in this thesis focus on privacy protection.

We study differential privacy, the state-of-the-art privacy technique, due to the stringent definition and the fact that we make no assumptions about the background knowledge of the analyst or their computational power. Differential privacy lets us explicitly quantify the privacy loss and establishes transparency in an application’s privacy guarantees. Specifically, we will primarily focus on a distributed setting, where data is split among many curators as is often the case in practice, such as, for example, data collected from mobile devices or country-specific data about, say, a pandemic.

The main contributions of this dissertation are as follows:

• We introduce a compact summary of a dataset, a sketch, for efficiently and accurately estimating the cardinality of the dataset while preserving differential privacy. An important application is that two such sketches over sets A and B can be combined into a sketch for the symmetric difference, A△B, allowing for privately estimating the cardinality of the symmetric difference between input sets held by different curators, and so cannot be exchanged.

• We introduce a differentially private sketch for estimating the Euclidean distance between real input vectors held by different curators and prove that our sketch achieves better privacy, efficiency and accuracy guarantees than previous work.

• We introduce a new noise distribution, the Arete distribution, which permits a differentially private mechanism, the Arete mechanism. This mechanism incurs error exponentially decreasing in the privacy parameter ε, improving over the Laplace mechanism in the low privacy regime (for large ε). Further- more, the noise distribution has a continuous density function and is infinitely divisible, meaning that we can distribute the noise necessary to ensure differential privacy among several data curators to allow for private, distributed analysis with high accuracy.

We formally define differential privacy along with the two central problems of weighted cardinality esti- mation and Euclidean distance approximation and give stringent proofs for each of the results mentioned. Furthermore, we present and discuss several open problems extending the contributions of this thesis.

We study differential privacy, the state-of-the-art privacy technique, due to the stringent definition and the fact that we make no assumptions about the background knowledge of the analyst or their computational power. Differential privacy lets us explicitly quantify the privacy loss and establishes transparency in an application’s privacy guarantees. Specifically, we will primarily focus on a distributed setting, where data is split among many curators as is often the case in practice, such as, for example, data collected from mobile devices or country-specific data about, say, a pandemic.

The main contributions of this dissertation are as follows:

• We introduce a compact summary of a dataset, a sketch, for efficiently and accurately estimating the cardinality of the dataset while preserving differential privacy. An important application is that two such sketches over sets A and B can be combined into a sketch for the symmetric difference, A△B, allowing for privately estimating the cardinality of the symmetric difference between input sets held by different curators, and so cannot be exchanged.

• We introduce a differentially private sketch for estimating the Euclidean distance between real input vectors held by different curators and prove that our sketch achieves better privacy, efficiency and accuracy guarantees than previous work.

• We introduce a new noise distribution, the Arete distribution, which permits a differentially private mechanism, the Arete mechanism. This mechanism incurs error exponentially decreasing in the privacy parameter ε, improving over the Laplace mechanism in the low privacy regime (for large ε). Further- more, the noise distribution has a continuous density function and is infinitely divisible, meaning that we can distribute the noise necessary to ensure differential privacy among several data curators to allow for private, distributed analysis with high accuracy.

We formally define differential privacy along with the two central problems of weighted cardinality esti- mation and Euclidean distance approximation and give stringent proofs for each of the results mentioned. Furthermore, we present and discuss several open problems extending the contributions of this thesis.

Original language | English |
---|

Publisher | IT-Universitetet i København |
---|---|

Number of pages | 102 |

ISBN (Print) | 978-87-7949-059-8 |

Publication status | Published - 2021 |

Series | ITU-DS |
---|---|

Number | 185 |

ISSN | 1602-3536 |