Novel dynamic topic models for modelling sequential document collections

dc.date.accessioned	2014-05-08T09:36:44Z	und
dc.date.accessioned	2017-10-24T12:23:41Z
dc.date.available	2014-05-08T09:36:44Z	und
dc.date.available	2017-10-24T12:23:41Z
dc.date.issued	2014-05-08T09:36:44Z
dc.identifier.uri	http://radr.hulib.helsinki.fi/handle/10138.1/3703	und
dc.identifier.uri	http://hdl.handle.net/10138.1/3703
dc.title	Novel dynamic topic models for modelling sequential document collections	en
ethesis.department.URI	http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department	Institutionen för datavetenskap	sv
ethesis.department	Department of Computer Science	en
ethesis.department	Tietojenkäsittelytieteen laitos	fi
ethesis.faculty	Matematisk-naturvetenskapliga fakulteten	sv
ethesis.faculty	Matemaattis-luonnontieteellinen tiedekunta	fi
ethesis.faculty	Faculty of Science	en
ethesis.faculty.URI	http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI	http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university	Helsingfors universitet	sv
ethesis.university	University of Helsinki	en
ethesis.university	Helsingin yliopisto	fi
dct.creator	Liye, He
dct.issued	2014
dct.language.ISO639-2	eng
dct.abstract	In this thesis, we concentrate on the problem of modelling real document collections, especially sequential document collections. The goal is to discover important hidden topics in a collection automatically by statistical modelling of its content. For sequential document collections, we also want to capture how the topics change over time. To date, several computational tools, such as latent Dirichlet allocation (LDA), have been developed for modelling document collections. In this thesis, we develop new topic models for modelling the dynamic characteristics of a sequential document collection such as a news archive. We are, for example, interested in splitting the topics into long-term topics such as 'Eurozone crisis' that are discussed over years, and short-term topics such as 'Winter Olympics in 2014' that are popular for only several weeks. We first review the popular models for detecting hidden topics and their evolution, and then propose two new approaches to detect these two kinds of topics. To provide real-world data for evaluating the new approaches, we additionally design a pipeline that constructs sequential document collections by collecting documents from the Web. To investigate the performance of the new approaches from different aspects, we conduct qualitative and quantitative experiments on two different kinds of datasets respectively: news documents collected by the pipeline and 17 years of documents from the Neural Information Processing Systems (NIPS) conferences. The qualitative experiments evaluate the quality of the discovered topics, whereas the quantitative experiments measure the models' ability to predict new words in unseen documents.	en
dct.language	en
ethesis.language.URI	http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language	English	en
ethesis.language	englanti	fi
ethesis.language	engelska	sv
ethesis.thesistype	pro gradu-avhandlingar	sv
ethesis.thesistype	pro gradu -tutkielmat	fi
ethesis.thesistype	master's thesis	en
ethesis.thesistype.URI	http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram	Algorithms and Machine Learning	en
dct.identifier.urn	URN:NBN:fi-fe2017112251881
dc.type.dcmitype	Text

Files in this item

Files	Size	Format
Masterthesis_LiyeHe.pdf	1.289Mb	PDF

This item appears in the following Collection(s)

Faculty of Science [4203]