Browsing by Subject "FinBERT"

  • Nurmi, Akseli (2024)
    Deep neural networks are widely used in natural language processing. Large language models trained on large corpora enable improved information extraction from data sets too large for human processing. This thesis reviews the performance of a deep learning natural language processing pipeline in detecting and removing (anonymising) personal information. Fast and accurate methods for anonymising or pseudonymising data that contain sensitive information are vital to research and development in science and industry, as legislation requires extensive procedures for handling data with direct or indirect personal information. We propose a method that achieves state-of-the-art results on noisy data and good performance on a contemporary benchmark. Our comparison of anonymisation performance is one of the first for Finnish free text.
  • Huhtilainen, Heli (2023)
    This thesis studies the application of language models to improve search in an online shop specialising in wholesale builder–trade product and service sales. The first aim was to determine whether a Finnish language model could capture the meaning behind search query words and thereby improve the match between queries and product descriptions. Second, it was investigated whether the model could be trained to recognise which products users wanted to find with the search terms they used. Finally, it was investigated whether the model could be used for search ranking. Three models were fine-tuned from a FinBERT checkpoint using domain-specific product and click-through data. The first two models were trained to classify online store products into product categories: the model with 55 target categories achieved an F measure of 0.98, and the model with 762 target categories an F measure of 0.81. The third model was trained to estimate relevance probabilities for search query–product pairs. It generally classified more products as relevant than the current search engine solution. Its F measure was 0.90, and in qualitative evaluation its predictions made semantic sense. The main restriction on using the third model for search ranking in practice is inference speed: ranking requires predictions for many products, so inference would need to be faster.
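The anonymisation thesis listed above describes a pipeline that first detects personal information and then removes or replaces it. The following is a minimal sketch of only the replacement step. It assumes that a detector, for example a fine-tuned named-entity recognition model (not shown), has already returned character spans with labels. The `Span` class, the `pseudonymise` function and the `PER`/`LOC` labels are hypothetical illustrations, not the thesis's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical detector output: character offsets [start, end) and a label."""
    start: int
    end: int
    label: str


def pseudonymise(text: str, spans: list[Span]) -> str:
    """Replace each detected span with a consistent placeholder such as PER_1.

    Identical surface strings with the same label receive the same placeholder,
    so repeated mentions of one person remain linkable after pseudonymisation.
    This is an illustrative sketch, not the method from the thesis.
    """
    placeholders: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    pieces: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # skip a span that overlaps one already replaced
        surface = text[span.start:span.end]
        key = (span.label, surface)
        if key not in placeholders:
            counters[span.label] = counters.get(span.label, 0) + 1
            placeholders[key] = f"{span.label}_{counters[span.label]}"
        pieces.append(text[cursor:span.start])  # keep the text before the span
        pieces.append(placeholders[key])
        cursor = span.end
    pieces.append(text[cursor:])  # keep the tail after the last span
    return "".join(pieces)


if __name__ == "__main__":
    text = "Matti Virtanen met Anna Korhonen in Helsinki. Matti Virtanen left."
    spans = [
        Span(0, len("Matti Virtanen"), "PER"),
        Span(text.find("Anna"), text.find("Anna") + len("Anna Korhonen"), "PER"),
        Span(text.find("Helsinki"), text.find("Helsinki") + len("Helsinki"), "LOC"),
        Span(text.rfind("Matti"), text.rfind("Matti") + len("Matti Virtanen"), "PER"),
    ]
    print(pseudonymise(text, spans))  # PER_1 met PER_2 in LOC_1. PER_1 left.
```

Numbered placeholders, rather than a single generic mask such as `[NAME]`, are one way to keep the data useful for downstream analysis while still removing the direct identifiers. Full anonymisation, as opposed to pseudonymisation, would typically also need to handle indirect identifiers.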
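The FinBERT search thesis reports its results as F measure scores (0.98, 0.81 and 0.90). The abstract does not say how scores were averaged across the 55 or 762 categories. The sketch below uses hypothetical function names to illustrate two things: the standard F1 definition, which is the harmonic mean of precision and recall, and one common multi-class variant, macro-averaging. The macro-averaging is an assumption for illustration, not the thesis's evaluation code.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_measure(precision, recall)


def macro_f1(per_class: dict[str, tuple[int, int, int]]) -> float:
    """Hypothetical helper: unweighted mean of per-class F1 scores.

    Macro-averaging is an assumption here; the thesis abstract does not
    state which averaging scheme produced its reported scores.
    """
    if not per_class:
        return 0.0
    scores = [f1_from_counts(tp, fp, fn) for tp, fp, fn in per_class.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f1_from_counts(90, 10, 10))  # precision = recall = 0.9, so F1 is about 0.9
    print(macro_f1({"tools": (90, 10, 10), "paint": (0, 5, 5)}))  # about 0.45
```

With many categories, macro-averaging weights rare categories as heavily as common ones. That can pull the score down relative to micro-averaging, which is one reason the averaging scheme matters when comparing a 55-category model with a 762-category one.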