Training Algorithms for Multilingual Latent Dirichlet Allocation

dc.date.accessioned 2016-08-17T10:00:30Z und
dc.date.accessioned 2017-10-24T12:24:12Z
dc.date.available 2016-08-17T10:00:30Z und
dc.date.available 2017-10-24T12:24:12Z
dc.date.issued 2016-08-17T10:00:30Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/5708 und
dc.identifier.uri http://hdl.handle.net/10138.1/5708
dc.title Training Algorithms for Multilingual Latent Dirichlet Allocation en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Jin, Haibo
dct.issued 2016
dct.language.ISO639-2 eng
dct.abstract Multilingual Latent Dirichlet Allocation (MLDA) extends Latent Dirichlet Allocation (LDA) to a multilingual setting and aims to discover aligned latent topic structures in a parallel corpus. Although the two popular training algorithms for LDA, collapsed Gibbs sampling and variational inference, can be adapted naturally to MLDA, both become time-inefficient because of MLDA's special structure. To address this problem, we propose an approximate training framework for MLDA that works with both collapsed Gibbs sampling and variational inference. Our experiments show that the proposed framework reduces the training time of MLDA considerably, especially when there are many languages. We also summarize the scenarios in which the approximate framework achieves model accuracy comparable to that of the standard framework. Finally, we discuss several directions for future work. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Algorithms and Machine Learning en
dct.identifier.urn URN:NBN:fi-fe2017112251760
dc.type.dcmitype Text

Files in this item

File: MSc_Thesis_HaiboJin.pdf
Size: 1.255Mb
Format: PDF
