Browsing by Author "Tulijoki, Juha-Pekka"

  • Tulijoki, Juha-Pekka (2024)
    A tag is a freely chosen keyword that a user attaches to an item. Because it offers a simple, cheap, and natural way to describe content, tagging has become popular in contemporary web applications. The tag genome is a data structure that contains item-tag relevance scores, i.e., continuous values between 0 and 1 that indicate how relevant a tag is for an item. For example, the tag "romantic comedy" has a relevance score of 0.97 for the movie Love Actually. With sufficient data, a tag genome dataset can be constructed for any domain. To the best of our knowledge, tag genome datasets currently exist for movies and books. The tag genome for movies is used in a movie recommender and for various purposes in recommender systems research, such as detecting filter bubbles and serendipity. Creating a diverse tag genome dataset requires an effective machine learning solution, because manually assessing item-tag relevance scores is impractical. The current state-of-the-art solution, called TagDL, feeds features extracted from user-generated tags, reviews, and ratings into a multilayer perceptron that predicts the item-tag relevance scores. This study aims to enhance TagDL by extracting additional features from the embeddings of textual content, namely tags, user reviews, and item titles, using Bidirectional Encoder Representations from Transformers (BERT). The results show that features based on BERT embeddings can have a positive impact on item-tag relevance score prediction. However, the improvement does not generalize across both tag genome datasets: it appears only for the movie dataset. This may indicate that the new features have a stronger impact when less training data is available, as is the case with the movie dataset. The thesis also discusses ideas for future work and implementation possibilities.
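
To make the tag genome data structure described in the abstract concrete, the following is a minimal Python sketch, not the thesis implementation. The class name `TagGenome` and its methods are hypothetical, and the only score taken from the abstract is 0.97 for "romantic comedy" on Love Actually; the other score is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class TagGenome:
    """Hypothetical container for item-tag relevance scores in [0, 1]."""

    scores: dict[tuple[str, str], float] = field(default_factory=dict)

    def set_relevance(self, item: str, tag: str, score: float) -> None:
        # Relevance scores are continuous values between 0 and 1.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {score}")
        self.scores[(item, tag)] = score

    def relevance(self, item: str, tag: str) -> float:
        # Raises KeyError if the (item, tag) pair has no score.
        return self.scores[(item, tag)]

    def top_tags(self, item: str, n: int = 3) -> list[tuple[str, float]]:
        # Collect every tag scored for this item, most relevant first.
        tagged = [(t, s) for (i, t), s in self.scores.items() if i == item]
        return sorted(tagged, key=lambda pair: -pair[1])[:n]


genome = TagGenome()
# Score quoted in the abstract.
genome.set_relevance("Love Actually", "romantic comedy", 0.97)
# Placeholder value for illustration only, not from any dataset.
genome.set_relevance("Love Actually", "space travel", 0.02)

print(genome.top_tags("Love Actually", n=1))
```

In this sketch the scores are entered by hand. In the setting the abstract describes, they would instead be predicted by a model such as TagDL from tag, review, and rating features.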