Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by Author "Aalto, Iiro"

Sort by: Order: Results:

  • Aalto, Iiro (2020)
    Slack is an instant messaging platform intended for the internal communications of companies and other organizations. For organizations that use Slack extensively it may provide an interesting source of insight, but as such the data is difficult to analyze. Topic modeling, primarily latent Dirichlet allocation (LDA), is commonly used to summarize textual data in a meaningful way. Instant messages tend to be very short, which causes problems for conventional topic modeling methods such as LDA. The data sparsity problem can be tackled with data expansion and data combination techniques. For instant messages, data combination is particularly attractive as the messages are not independent of each other, but form implicit, and sometimes expicit, threads as the participants reply to each other. Most of the threads in the Slack data are not explicit, but must be ’untangled’ from the message stream if they are to be used as a basis for a data combination scheme. In this thesis we study the possibility of detecting implicit threads from a slack message stream and leveraging the threads as a data combination scheme in topic modeling. The threads are detected using a hierarchical clustering algorithm which uses word mover’s distance, latent semantic analysis, and metadata to compute the distances between messages. The clusters are then concatenated and used as the input for LDA. It is shown that on a dataset gathered from the Gofore Oyj Slack workspace, the cluster-based model improves on the message-based model, but falls short of being practical.