Topic Visualization and Survival Analysis

dc.date.accessioned 2017-06-20T11:14:37Z und
dc.date.accessioned 2017-10-24T12:24:28Z
dc.date.available 2017-06-20T11:14:37Z und
dc.date.available 2017-10-24T12:24:28Z
dc.date.issued 2017-06-20T11:14:37Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/6126 und
dc.identifier.uri http://hdl.handle.net/10138.1/6126
dc.title Topic Visualization and Survival Analysis en
ethesis.discipline Computer science en
ethesis.discipline Tietojenkäsittelytiede fi
ethesis.discipline Datavetenskap sv
ethesis.discipline.URI http://data.hulib.helsinki.fi/id/1dcabbeb-f422-4eec-aaff-bb11d7501348
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Wang, Ping Jr
dct.issued 2017
dct.language.ISO639-2 eng
dct.abstract A latent semantic structure in a text collection is called a topic. In this thesis, we aim to visualize topics in the scientific literature and to detect active and inactive research areas based on topic lifetime. Topics were extracted from over 1 million abstracts in the arXiv.org database using Latent Dirichlet Allocation (LDA). The Hellinger distance measures the similarity between two topics, and two topics are considered related if their Hellinger distance is smaller than a threshold set beforehand. The dynamic topic graph displays the evolution of topics over time. Hierarchical relationships between topics are shown in a tree, in which topics near the leaves are subtopics of those closer to the root. The dynamic topic graph of a category focuses on topics associated with a particular category in the arXiv classification system. Logistic regression was used to predict topic lifetime and to discover which factors have a positive effect on lifetime and which ones lead to the death of topics. In particular, we are interested in the effects of time, category, the number of documents, topic variety, and their interactions. In the experiments, we investigated topics in the dynamic topic graphs of categories under Hellinger distance thresholds of 0.4, 0.5 and 0.6. Categories whose coefficients were negative for all datasets are defined as popular, since topics in these fields are more likely to survive. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
dct.identifier.urn URN:NBN:fi-fe2017112252257
dc.type.dcmitype Text
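The abstract does not spell out how topics are compared. The following is a minimal sketch, assuming each LDA topic is a word distribution over a shared vocabulary, that the common normalized Hellinger distance (bounded in [0, 1]) is used, and that topics from two consecutive time slices are linked in the dynamic topic graph when their distance is below the threshold. The names `hellinger` and `link_topics` and the toy distributions are hypothetical illustrations, not the thesis's implementation.

```python
import math
from typing import List, Sequence, Tuple


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalized Hellinger distance between two discrete distributions.

    H(p, q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) / sqrt(2), which lies in [0, 1]:
    0 for identical distributions, 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must be over the same vocabulary")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def link_topics(
    earlier: List[Sequence[float]],
    later: List[Sequence[float]],
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for every topic pair closer than the threshold.

    Hypothetical helper: each returned pair would become an edge from topic i
    in the earlier time slice to topic j in the later one.
    """
    edges = []
    for i, p in enumerate(earlier):
        for j, q in enumerate(later):
            d = hellinger(p, q)
            if d < threshold:
                edges.append((i, j, d))
    return edges


# Toy topics over a three-word vocabulary (made-up numbers for illustration).
earlier = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
later = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]

for t in (0.4, 0.5, 0.6):
    print(t, [(i, j) for i, j, _ in link_topics(earlier, later, t)])
```

With these toy topics, raising the threshold from 0.4 to 0.6 adds more edges between the two time slices, which is why the choice of threshold changes the shape of the resulting graph.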

Files in this item

Files Size Format
engl_malli.pdf 2.516Mb PDF