
Browsing by Subject "Differential privacy"


  • Räisä, Ossi (2021)
    Differential privacy has over the past decade become a widely used framework for privacy-preserving machine learning. At the same time, Markov chain Monte Carlo (MCMC) algorithms, particularly Metropolis-Hastings (MH) algorithms, have become an increasingly popular method of performing Bayesian inference. Surprisingly, their combination has not received much attention in the literature. This thesis reviews the existing research on differentially private MH algorithms, proves tighter privacy bounds for them using recent developments in differential privacy, and develops two new differentially private MH algorithms: an algorithm that uses subsampling to lower privacy costs, and a differentially private variant of the Hamiltonian Monte Carlo algorithm. The privacy bounds of both new algorithms are proved, and convergence to the exact posterior is proven for the latter. The performance of both the old and the new algorithms is compared on several Bayesian inference problems, revealing that none of the algorithms is clearly better than the others, and that subsampling is likely only useful for lowering computational costs.
  • Suihkonen, Sini (2023)
    The importance of protecting sensitive data from information breaches has increased in recent years, as companies and other institutions gather massive datasets about their customers, including personally identifiable information. Differential privacy is one of the state-of-the-art methods for providing provable privacy for these datasets, protecting them from adversarial attacks. This thesis focuses on studying existing differentially private random forest (DPRF) algorithms, comparing them, and constructing a version of the DPRF algorithm based on them. Twelve articles from the late 2000s to 2022, each implementing a version of the DPRF algorithm, are included in the review of previous work. The created algorithm, called DPRF_thesis, uses a privatized median as the method for splitting internal nodes of the decision trees, and the class labels of the leaf nodes are chosen with the exponential mechanism. The DPRF_thesis algorithm was tested on three binary classification UCI datasets, and its accuracy was mostly comparable with that of the two existing DPRF algorithms it was compared against. ACM Computing Classification System (CCS): Computing methodologies → Machine learning → Machine learning approaches → Classification and regression trees; Security and privacy → Database and storage security → Data anonymization and sanitization
  • Jälkö, Joonas (2017)
    This thesis focuses on privacy-preserving statistical inference using a probabilistic notion of privacy called differential privacy. Differential privacy ensures that replacing one individual in the dataset with another does not affect the results drastically. There are several variants of differential privacy: this thesis considers ε-differential privacy, also known as pure differential privacy, as well as a relaxation known as (ε, δ)-differential privacy. We state several important definitions and theorems of DP and give proofs for most of the theorems. Our goal is to build a general framework for privacy-preserving posterior inference. To achieve this, we use an approximate approach to posterior inference called variational Bayesian (VB) methods. We present the basic concepts of variational inference in some detail and show examples of how to apply it. After giving the prerequisites on both DP and VB, we state our main result, the differentially private variational inference (DPVI) method. We combine the recently proposed doubly stochastic variational inference (DSVI) with the Gaussian mechanism to build a privacy-preserving method for posterior inference. We give the algorithm definition and explain its parameters. The DPVI method is compared against the state-of-the-art method for DP posterior inference, differentially private stochastic gradient Langevin dynamics (DP-SGLD). We compare the performance on two different models, a logistic regression model and a Gaussian mixture model; DPVI outperforms DP-SGLD in both tasks.
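
The subsampling idea in Räisä (2021) rests on privacy amplification by subsampling: running an ε-DP mechanism on a random fraction q of the data yields a smaller effective ε. A minimal sketch of the standard amplification bound (the function name is mine, not the thesis's; this illustrates the general principle, not the thesis's exact analysis):

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Standard amplification-by-subsampling bound: an eps-DP mechanism
    run on a uniformly random fraction q of the records satisfies
    eps' = ln(1 + q * (exp(eps) - 1))-DP."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

# With a 1% subsample, a full-data cost of eps = 1.0 shrinks to about 0.017.
print(amplified_epsilon(1.0, 0.01))
```

Note that for q = 1 the bound collapses back to the original ε, and for small q it behaves roughly like q·ε, which is why subsampling is attractive for lowering privacy costs.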
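
Suihkonen (2023) names the exponential mechanism as the tool for labeling leaf nodes. A generic sketch of the mechanism applied to class counts (not the thesis's exact implementation; the sensitivity of a count is 1 because one record changes one count by one):

```python
import math
import random

def exponential_mechanism(counts, eps, sensitivity=1.0, rng=random):
    """Pick a class index with probability proportional to
    exp(eps * count / (2 * sensitivity)).  Subtracting the max count
    before exponentiating avoids overflow without changing the
    selection probabilities."""
    m = max(counts)
    weights = [math.exp(eps * (c - m) / (2.0 * sensitivity)) for c in counts]
    return rng.choices(range(len(counts)), weights=weights, k=1)[0]
```

With a large ε the mechanism almost always returns the majority class, while a small ε flattens the distribution toward a uniform choice; that trade-off is the privacy/accuracy dial the thesis's experiments measure.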
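
Jälkö (2017) builds DPVI on the Gaussian mechanism under (ε, δ)-DP. A generic sketch of the classical noise calibration (a scalar release for illustration only; the thesis applies the mechanism inside the DSVI gradient updates, which this sketch does not reproduce):

```python
import math
import random

def gaussian_sigma(sensitivity: float, eps: float, delta: float) -> float:
    """Classical Gaussian-mechanism calibration: adding N(0, sigma^2)
    noise with this sigma gives (eps, delta)-DP for a query with the
    given L2 sensitivity (the bound is valid for eps <= 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def gaussian_mechanism(value: float, sensitivity: float, eps: float,
                       delta: float, rng=random) -> float:
    """Release value plus Gaussian noise calibrated to (eps, delta)-DP."""
    return value + rng.gauss(0.0, gaussian_sigma(sensitivity, eps, delta))
```

The key design point is that δ > 0 permits Gaussian rather than Laplace noise, whose lighter tails are what make gradient-perturbation methods such as DPVI and DP-SGLD practical.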