Training Algorithms for Multilingual Latent Dirichlet Allocation

dc.date.accessioned	2016-08-17T10:00:30Z	und
dc.date.accessioned	2017-10-24T12:24:12Z
dc.date.available	2016-08-17T10:00:30Z	und
dc.date.available	2017-10-24T12:24:12Z
dc.date.issued	2016-08-17T10:00:30Z
dc.identifier.uri	http://radr.hulib.helsinki.fi/handle/10138.1/5708	und
dc.identifier.uri	http://hdl.handle.net/10138.1/5708
dc.title	Training Algorithms for Multilingual Latent Dirichlet Allocation	en
ethesis.department.URI	http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department	Institutionen för datavetenskap	sv
ethesis.department	Department of Computer Science	en
ethesis.department	Tietojenkäsittelytieteen laitos	fi
ethesis.faculty	Matematisk-naturvetenskapliga fakulteten	sv
ethesis.faculty	Matemaattis-luonnontieteellinen tiedekunta	fi
ethesis.faculty	Faculty of Science	en
ethesis.faculty.URI	http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI	http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university	Helsingfors universitet	sv
ethesis.university	University of Helsinki	en
ethesis.university	Helsingin yliopisto	fi
dct.creator	Jin, Haibo
dct.issued	2016
dct.language.ISO639-2	eng
dct.abstract	Multilingual Latent Dirichlet Allocation (MLDA) is an extension of Latent Dirichlet Allocation (LDA) to a multilingual setting, which aims to discover aligned latent topic structures in a parallel corpus. Although the two popular training algorithms for LDA, collapsed Gibbs sampling and variational inference, can be naturally adapted to MLDA, both become time-inefficient with MLDA because of its special structure. To address this problem, we propose an approximate training framework for MLDA that works with both collapsed Gibbs sampling and variational inference. Through experiments, we show that the proposed training framework reduces the training time of MLDA considerably, especially when there are many languages. We also summarize the scenarios in which the approximate framework gives model accuracy comparable to that of the standard framework. Finally, we discuss several possible directions for future work.	en
dct.language	en
ethesis.language.URI	http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language	English	en
ethesis.language	englanti	fi
ethesis.language	engelska	sv
ethesis.thesistype	pro gradu-avhandlingar	sv
ethesis.thesistype	pro gradu -tutkielmat	fi
ethesis.thesistype	master's thesis	en
ethesis.thesistype.URI	http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram	Algorithms and Machine Learning	en
dct.identifier.urn	URN:NBN:fi-fe2017112251760
dc.type.dcmitype	Text

Files in this item

Files	Size	Format	View
MSc_Thesis_HaiboJin.pdf	1.255Mb	PDF

This item appears in the following Collection(s)

Faculty of Science [4203]
