Browsing by Author "Yu, Hanlin"
Now showing items 1-1 of 1

Yu, Hanlin (2023) Bayesian inference tells us how to incorporate information from the data into the parameters of a model. In practice, this can be carried out using Markov Chain Monte Carlo (MCMC) methods, which draw approximate samples from the posterior distribution, but applying them to complex models such as neural networks remains challenging. The most commonly used methods in these cases are minibatch-based Stochastic Gradient Markov Chain Monte Carlo (SGMCMC) methods. This thesis presents improvements to this family of algorithms, focusing on the specific algorithm of Stochastic Gradient Riemannian Langevin Dynamics (SGRLD). Its core idea is to perform sampling on a suitably defined Riemannian manifold, characterized by a Riemannian metric, which allows exploiting information about the curvature of the target distribution for improved efficiency. While SGRLD has nice theoretical properties, for arbitrary Riemannian geometries the algorithm is slow because it must repeatedly compute the inverse and inverse square root of the metric, which is high-dimensional for large neural networks. For the task of efficiently sampling from an arbitrary neural network with a large number of parameters, current algorithms sidestep this issue by using diagonal metrics, which enable fast computations but forfeit some of the advantages of the Riemannian formulation. This thesis proposes the first computationally efficient algorithms with non-diagonal metrics applicable to training arbitrarily large neural networks for this task. We propose two alternative metrics and name the resulting algorithms MongeSGLD and ShampooSGLD, respectively. We demonstrate that both achieve improvements over existing algorithms in terms of convergence, especially when using neural networks with priors that result in good performance. We also propose using the average curvature of the obtained samples as an evaluation measure.
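To make the SGMCMC family concrete, here is a minimal sketch of the kind of sampler the abstract builds on: a vanilla SGLD step and a diagonally preconditioned variant in the spirit of pSGLD (the thesis's MongeSGLD and ShampooSGLD instead use non-diagonal metrics, which are not reproduced here). The curvature-correction term of the full Riemannian update is omitted, and the toy Gaussian target, function names, and hyperparameters are illustrative, not taken from the thesis.

```python
import numpy as np

def sgld_step(theta, grad_log_post, eps, rng):
    """One vanilla SGLD step: half a gradient step on the log-posterior
    plus Gaussian noise scaled by the square root of the step size."""
    noise = rng.normal(size=theta.shape)
    return theta + 0.5 * eps * grad_log_post(theta) + np.sqrt(eps) * noise

def psgld_step(theta, grad_log_post, eps, v, rng, alpha=0.99, lam=1e-5):
    """Preconditioned SGLD step with a diagonal, RMSprop-style metric.
    The small curvature-correction (Gamma) term is omitted for brevity."""
    g = grad_log_post(theta)
    v = alpha * v + (1 - alpha) * g ** 2       # running second-moment estimate
    g_inv = 1.0 / (lam + np.sqrt(v))           # diagonal inverse metric
    noise = rng.normal(size=theta.shape)
    theta = theta + 0.5 * eps * g_inv * g + np.sqrt(eps * g_inv) * noise
    return theta, v

# Toy target: standard 2D Gaussian, so grad log p(theta) = -theta.
grad_log_post = lambda th: -th

rng = np.random.default_rng(0)
theta, v = np.array([3.0, -3.0]), np.zeros(2)
samples = []
for t in range(20000):
    theta, v = psgld_step(theta, grad_log_post, eps=0.05, v=v, rng=rng)
    if t >= 2000:                              # discard burn-in
        samples.append(theta.copy())
samples = np.asarray(samples)
print("mean:", samples.mean(axis=0), "var:", samples.var(axis=0))
```

For this target the chain should settle near zero mean and unit variance; the non-diagonal metrics proposed in the thesis replace the elementwise `g_inv` with a full (or structured) matrix, which is exactly where the inverse and inverse-square-root costs discussed above arise.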