
Training Algorithms for Multilingual Latent Dirichlet Allocation


Title: Training Algorithms for Multilingual Latent Dirichlet Allocation
Author(s): Jin, Haibo
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Language: English
Acceptance year: 2016
Abstract:
Multilingual Latent Dirichlet Allocation (MLDA) is an extension of Latent Dirichlet Allocation (LDA) to a multilingual setting, which aims to discover aligned latent topic structures in a parallel corpus. Although the two popular training algorithms for LDA, collapsed Gibbs sampling and variational inference, can be naturally adapted to MLDA, both algorithms become time-inefficient for MLDA due to its special structure. To address this problem, we propose an approximate training framework for MLDA, which works with both collapsed Gibbs sampling and variational inference. Through experiments, we show that the proposed training framework reduces the training time of MLDA considerably, especially when many languages are involved. We also summarize the scenarios in which the approximate framework gives model accuracy comparable to that of the standard framework. Finally, we discuss several possible directions for future work.
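For readers unfamiliar with the building block the abstract refers to, the sketch below illustrates collapsed Gibbs sampling for standard, monolingual LDA; it is not the thesis's approximate MLDA framework, and all function names, hyperparameter values, and the count-matrix layout are illustrative assumptions rather than the author's implementation.

    import numpy as np

    def collapsed_gibbs_lda(docs, n_topics, n_vocab, n_iter=200, alpha=0.1, beta=0.01):
        """Collapsed Gibbs sampling for standard (monolingual) LDA.

        docs: list of documents, each a list of word ids in [0, n_vocab).
        Returns document-topic counts and topic-word counts (illustrative sketch).
        """
        rng = np.random.default_rng(0)
        # Count matrices: document-topic, topic-word, and per-topic totals.
        ndk = np.zeros((len(docs), n_topics))
        nkw = np.zeros((n_topics, n_vocab))
        nk = np.zeros(n_topics)
        # Random initial topic assignment for every token.
        z = [rng.integers(n_topics, size=len(d)) for d in docs]
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

        for _ in range(n_iter):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]
                    # Remove the token's current assignment from the counts.
                    ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                    # Full conditional p(z = k | everything else), up to a constant.
                    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                    k = rng.choice(n_topics, p=p / p.sum())
                    z[d][i] = k
                    ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        return ndk, nkw

In MLDA, each topic has one word distribution per language while documents in a parallel group share a single topic distribution, which is why per-token resampling of this kind grows costly as the number of languages increases; the thesis's approximate framework targets exactly that cost.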


Files in this item

File: MSc_Thesis_HaiboJin.pdf (1.255 MB, PDF)