
Browsing by Author "Wang, Ziran"


  • Wang, Ziran (2013)
    This thesis considers the problem of finding a process that, given a collection of news articles, can detect significant dates and breaking news related to different themes. The themes are learned without supervision from training corpora, and most have intuitive meanings, such as 'finance', 'disaster', and 'wars'. They are constructed solely from the textual information in the corpora, without human intervention. To conduct this learning, the thesis uses several component models, specifically Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM). In addition, Latent Semantic Indexing (LSI, also known as Latent Semantic Analysis, LSA) and Multinomial Principal Component Analysis (MPCA) are adopted for comparison. The learning assigns each news article a relevance weight for each theme, which can be viewed statistically as a distribution over themes. Using the time stamps of the articles, one can sum and normalize these distributions over all articles published on the same day, and then plot the accumulated relevance weight of a theme over time. It is natural to treat these curves as describing the strength of media attention paid to each theme, and one can assume that behind every peak there are striking events whose associated news can be detected. The thesis is relevant to Media Studies research, and could be further connected to stock or currency markets to create practical value.
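
The daily aggregation and peak detection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the per-article theme weights have already been produced by some topic model (they are hard-coded toy values here), that "normalizing" means making each day's summed weights add up to 1 across themes, and that a peak is a local maximum above a fixed threshold. The function names `daily_theme_attention` and `find_peaks`, the threshold value, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_theme_attention(docs, num_themes):
    """Sum per-article theme weights by day, then normalize each day to sum to 1.

    docs: iterable of (day, weights), where weights holds one relevance
    weight per theme (assumed to come from a topic model such as LDA).
    """
    totals = defaultdict(lambda: [0.0] * num_themes)
    for day, weights in docs:
        for k, w in enumerate(weights):
            totals[day][k] += w

    attention = {}
    for day, sums in totals.items():
        day_total = sum(sums)
        # Assumption: normalize across themes within a day.
        attention[day] = [s / day_total for s in sums] if day_total > 0 else sums
    return attention


def find_peaks(series, threshold):
    """Return days whose value is a local maximum above the threshold.

    series: list of (day, value) pairs sorted by day.
    The threshold rule is an illustrative assumption, not the thesis's criterion.
    """
    peaks = []
    for i in range(1, len(series) - 1):
        prev_v, v, next_v = series[i - 1][1], series[i][1], series[i + 1][1]
        if v > threshold and v > prev_v and v >= next_v:
            peaks.append(series[i][0])
    return peaks


# Toy data: two themes (index 0 and 1), five articles over three days.
docs = [
    (date(2013, 1, 1), [0.8, 0.2]),
    (date(2013, 1, 1), [0.6, 0.4]),
    (date(2013, 1, 2), [0.1, 0.9]),
    (date(2013, 1, 2), [0.2, 0.8]),
    (date(2013, 1, 3), [0.7, 0.3]),
]

attention = daily_theme_attention(docs, num_themes=2)

# Attention curve for theme 1, ordered by day.
theme1_curve = sorted((day, w[1]) for day, w in attention.items())
peaks = find_peaks(theme1_curve, threshold=0.5)
```

In this toy data, theme 1 takes a much larger share of the normalized weight on the second day than on the days around it, so that day is reported as a peak. The articles published on it would be the candidates for breaking news on that theme.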