Browsing by Author "Wang, Ping Jr"

  • Wang, Ping Jr (2017)
    In this thesis, a topic is a latent semantic structure in a text collection. We aim to visualize topics in the scientific literature and to detect active and inactive research areas based on topic lifetime. Topics were extracted from over 1 million abstracts in the arXiv.org database using Latent Dirichlet Allocation (LDA). The Hellinger distance is used to measure the similarity between two topics: two topics are considered related if their pairwise distance is smaller than a threshold set beforehand. The dynamic topic graph displays the evolution of topics over time. Hierarchical relationships between topics are shown in a tree, where topics near the leaves are subtopics of those nearer the root. The dynamic topic graph of a category focuses on topics associated with a particular category in the arXiv classification system. Logistic regression was used to predict topic lifetime and to discover which factors have a positive effect on lifetime and which induce the death of topics. In particular, we are interested in the effects of time, category, the number of documents, topic variety, and their interactions. In the experiment, we investigated topics in the dynamic topic graph of a category under Hellinger distance thresholds of 0.4, 0.5 and 0.6. Categories whose coefficients were negative for all datasets are defined as popular, since topics in these fields are more likely to survive.
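
The thresholding step described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function names `hellinger` and `related_pairs` and the toy word distributions are assumptions made here for illustration, and it assumes each topic is a discrete probability distribution over the same vocabulary, as LDA topics are.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same vocabulary")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def related_pairs(topics, threshold):
    """Return index pairs (i, j), i < j, whose distance is below the threshold."""
    pairs = []
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            if hellinger(topics[i], topics[j]) < threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical topics over a four-word vocabulary (illustrative values only).
topics = [
    [0.5, 0.3, 0.2, 0.0],
    [0.4, 0.4, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.9],
]

# Thresholds 0.4, 0.5 and 0.6 are the ones named in the abstract.
for threshold in (0.4, 0.5, 0.6):
    print(threshold, related_pairs(topics, threshold))
```

A larger threshold admits more pairs as related, so the resulting topic graph becomes denser as the threshold moves from 0.4 to 0.6.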