
Applying Component Models on Theme-based News Tracking and Detection


Title: Applying Component Models on Theme-based News Tracking and Detection
Author(s): Wang, Ziran
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Discipline: Computer science
Language: English
Acceptance year: 2013
Abstract:
This thesis considers the problem of designing a process that, given a collection of news articles, detects significant dates and breaking news related to different themes. The themes are learned without supervision from training corpora, and most of them have intuitive meanings such as 'finance', 'disaster' and 'wars'. They are constructed solely from the textual information in the corpora, without any human intervention. To perform this learning, the thesis uses component models, specifically Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM). For comparison, Latent Semantic Indexing (LSI) and Multinomial Principal Component Analysis (MPCA) are also applied. The learning assigns each news article a relevance weight for every theme, which from a statistical perspective can be viewed as the article's theme distribution. Using the articles' time stamps, these distributions can be summed and normalized over all news published on each day, and the accumulated relevance weight of each theme can then be plotted along the timeline. These curves can naturally be read as the strength of media attention paid to each theme, and one can assume that behind every peak there is a striking event whose associated news can be detected. The approach is relevant to media studies research and could be further connected to stock or currency market analysis to create practical value.
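The daily aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes each article already has a non-negative theme-weight vector (for example an LDA or CTM document-topic distribution; LSI projections can be negative and would need a different normalization), and it assumes that "normalize" means rescaling each day's summed weights so they sum to one. The function names `daily_theme_curves`, `theme_series` and `find_peaks`, and the peak rule (an interior local maximum above the series mean plus `k` standard deviations), are hypothetical choices for illustration; the abstract does not specify how peaks are located.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_theme_curves(articles, num_themes):
    """Sum per-article theme weights by day, then normalize each day to sum to 1.

    `articles` yields (day, weights) pairs: `day` is any sortable key
    (e.g. a datetime.date or an ISO date string) and `weights` is a
    non-negative sequence of length `num_themes`, such as an LDA
    document-topic distribution (assumed input format).
    Returns a dict {day: [normalized weight per theme]} ordered by day.
    """
    totals = defaultdict(lambda: [0.0] * num_themes)
    for day, weights in articles:
        acc = totals[day]
        for k, w in enumerate(weights):
            acc[k] += w
    curves = {}
    for day in sorted(totals):
        day_sum = sum(totals[day])
        # Assumed normalization: each day's theme weights sum to 1.
        curves[day] = [w / day_sum if day_sum > 0 else 0.0 for w in totals[day]]
    return curves


def theme_series(curves, theme):
    """Extract one theme's curve as a list of weights in day order."""
    return [weights[theme] for weights in curves.values()]


def find_peaks(series, k=2.0):
    """Hypothetical peak rule: interior local maxima above mean + k * std.

    The first and last points are never reported, since they lack a
    neighbour on one side.
    """
    if len(series) < 3:
        return []
    threshold = mean(series) + k * pstdev(series)
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i] > threshold
        and series[i] >= series[i - 1]
        and series[i] >= series[i + 1]
    ]
```

For example, `find_peaks([0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1])` returns `[4]`, flagging the fifth day as a candidate event date whose articles could then be inspected. Normalizing per day removes differences in daily news volume; if volume itself is taken as a signal of attention, the raw daily sums could be kept instead.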


Files in this item

File: thesis.pdf | Size: 1.038 MB | Format: PDF
