Applying Component Models on Theme-based News Tracking and Detection

dc.date.accessioned 2013-01-29T13:47:36Z und
dc.date.accessioned 2017-10-24T12:24:31Z
dc.date.available 2013-01-29T13:47:36Z und
dc.date.available 2017-10-24T12:24:31Z
dc.date.issued 2013-01-29T13:47:36Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/2326 und
dc.identifier.uri http://hdl.handle.net/10138.1/2326
dc.title Applying Component Models on Theme-based News Tracking and Detection en
ethesis.discipline Computer science en
ethesis.discipline Tietojenkäsittelytiede fi
ethesis.discipline Datavetenskap sv
ethesis.discipline.URI http://data.hulib.helsinki.fi/id/1dcabbeb-f422-4eec-aaff-bb11d7501348
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Wang, Ziran
dct.issued 2013
dct.language.ISO639-2 eng
dct.abstract This thesis considers the problem of finding a process that, given a collection of news, can detect significant dates and breaking news related to different themes. The themes are learned without supervision from training corpora, and most of them have intuitive meanings, such as 'finance', 'disaster', 'wars' and so on. They are constructed solely from the textual information in the corpora, without any human intervention. To conduct this learning, the thesis uses several types of component models, specifically Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM). In addition, to enrich the experiment, Latent Semantic Indexing (LSI) and Multinomial Principal Component Analysis (MPCA) are adopted for comparison. The learning assigns every news article a relevance weight for each theme, which can be viewed as a theme distribution from a statistical perspective. Using the time stamps of the news, one can sum and normalize these distributions over all news published on the same day, and then plot how the accumulated relevance weight of a theme moves along the timeline. It is natural to treat these curves as describing the strength of media attention paid to different themes, and one can assume that behind every peak there are striking events whose associated news can be detected. The approach is of value to Media Studies research, and it could be further connected to stock or currency markets to create real value. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
dct.identifier.urn URN:NBN:fi-fe2017112252262
dc.type.dcmitype Text
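
Illustrative note on the abstract: the daily aggregation and peak detection it describes can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes each article already has a date and a list of per-theme relevance weights from some component model such as LDA. The function names `daily_theme_attention` and `peak_days` are hypothetical, and treating a peak as a simple local maximum is an assumption, since the abstract does not say how peaks are identified.

```python
from collections import defaultdict
from datetime import date


def daily_theme_attention(articles, n_themes):
    """Sum per-article theme weights by day, then normalize each day to sum to 1.

    `articles` is an iterable of (day, weights) pairs, where `weights` is a
    list of length `n_themes` (assumed to come from a fitted topic model).
    """
    totals = defaultdict(lambda: [0.0] * n_themes)
    for day, weights in articles:
        for k, w in enumerate(weights):
            totals[day][k] += w

    curves = {}
    for day, sums in totals.items():
        s = sum(sums)
        # Leave all-zero days untouched to avoid dividing by zero.
        curves[day] = [x / s for x in sums] if s > 0 else sums
    return curves


def peak_days(curves, theme):
    """Return days where the theme's daily weight is a strict local maximum.

    Assumption: a 'peak' is a day whose value exceeds both neighbouring days.
    """
    days = sorted(curves)
    values = [curves[d][theme] for d in days]
    return [
        days[i]
        for i in range(1, len(days) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]
```

News published on a returned peak day and weighted heavily toward that theme would then be the candidates for the "striking events" the abstract mentions.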

Files in this item

Files Size Format View
thesis.pdf 1.038Mb PDF
