TY - JOUR
T1 - Quantifying Privacy Risk with Gaussian Mixtures
AU - Ronneberg, Rasmus
AU - Randone, Francesca
AU - Pardo, Raúl
AU - Wasowski, Andrzej
N1 - Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
PY - 2025/6/23
Y1 - 2025/6/23
N2 - Data anonymization methods gain legal importance as data collection and analysis are expanding dramatically in data management and statistical research. Yet applying anonymization, or understanding how well a given analytics program hides sensitive information, is non-trivial. Privug is a method to quantify privacy risks of data analytics programs by analyzing their source code. The method uses probability distributions to model attacker knowledge and Bayesian inference to update said knowledge based on observable outputs. Currently, Privug is equipped with approximate Bayesian inference methods (such as Markov Chain Monte Carlo), and an exact Bayesian inference method based on multivariate Gaussian distributions. This paper introduces a privacy risk analysis engine based on Gaussian mixture models that combines exact and approximate inference. It extends the multivariate Gaussian engine by supporting exact inference in programs with continuous and discrete distributions as well as if-statements. Furthermore, the engine allows for approximating attacker knowledge that is not normally distributed. We evaluate the method by analyzing privacy risks in programs to release public statistics, differential privacy mechanisms, randomized response and attribute generalization. Finally, we show that our engine can be used to analyze programs involving thousands of sensitive records.
KW - Privacy risk analysis
KW - Bayesian inference
KW - Probabilistic programming
KW - Data analytics programs
U2 - 10.1007/s10270-025-01298-x
DO - 10.1007/s10270-025-01298-x
M3 - Journal article
SP - 1
EP - 22
JO - Software and Systems Modeling (SoSyM)
JF - Software and Systems Modeling (SoSyM)
ER -