Browsing by Title

  • Kallio, Jarmo (2021)
    Despite the benefits and importance of ERP systems, they suffer from many usability problems. Their user interfaces are complex and suffer from "daunting usability problems". In addition, their implementation success rate is relatively low, and usability significantly influences implementation success. As a company offering an ERP system to ferry operators was planning to renew the user interface of this system, we investigated the usability of the current system so that the results could guide the future implementation of the new user interface. We studied new and long-time users by conducting sessions in which the users described their experiences, performed tasks with the system, and filled in a usability questionnaire (the System Usability Scale). Many novice and long-time users reported problems. The questionnaire scores show that all but two participants perceived the usability of the system as below average and, on the adjective rating scale, "not acceptable"; two users rated the usability as "excellent". We reasoned that there could be a group of users who use the system in such a way and in such a context that they do not experience these problems. The results indicate that novices have trouble, for example, navigating and completing tasks, and some long-time users also reported navigation issues. The system seems to require its users to remember many things in order to use it well. The interviews and tasks indicate that the system is complex and hard to use and that both novices and experts face problems; this is supported by the perceived usability scores. While experts could in most cases finish all tasks, during the interviews some of them reported problems such as finding the products the customers needed, unclear error reporting, tedious configuration, and the need for a lot of manual typing. We gave recommendations on what to consider when implementing the new user interface for this ERP system: for example, navigation should be improved and users should be provided with powerful search tools. ERP usability has not been studied much. Our study supports the use of existing heuristics in classifying usability problems. Our recommendations on how to improve the usability of the studied ERP system should give some guidelines on what could be done, although not much of this is backed by laboratory studies. More work is needed in this field to find and test solutions to the usability problems users face.
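    The perceived-usability numbers in this study come from the standard System Usability Scale. As an illustration only (the thesis's own questionnaire data are not reproduced here), the usual SUS scoring rule can be sketched as follows: odd, positively worded items contribute (answer - 1) points, even, negatively worded items contribute (5 - answer) points, and the sum is scaled by 2.5 to a 0-100 range, where roughly 68 is commonly treated as average.
```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100) from ten
    Likert-scale answers (1 = strongly disagree ... 5 = strongly agree)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered (positively worded) items contribute r - 1,
        # even-numbered (negatively worded) items contribute 5 - r.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a fairly negative set of answers yields a clearly below-average score
# (scores under ~68 are commonly interpreted as below average).
print(sus_score([3, 4, 2, 4, 3, 4, 2, 4, 2, 3]))  # -> 32.5
```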
  • Tulijoki, Juha-Pekka (2024)
    A tag is a freely chosen keyword that a user attaches to an item. Offering a simple, cheap, and natural way to describe content, tagging has become popular in contemporary web applications. The tag genome is a data structure that contains item-tag relevance scores, i.e., continuous-scale numbers from 0 to 1 indicating how relevant a tag is for an item. For example, the tag romantic comedy has a relevance score of 0.97 for the movie Love Actually. With sufficient data, a tag genome dataset can be constructed for any domain. To the best of our knowledge, tag genome datasets exist for movies and books. The tag genome for movies is used in a movie recommender and for various purposes in recommender systems research, such as detecting filter bubbles and serendipity. Creating a diverse tag genome dataset requires an effective machine learning solution, as manual assessment of item-tag relevance scores is impractical. The current state-of-the-art solution, called TagDL, feeds features extracted from user-generated tags, reviews, and ratings into a multilayer perceptron architecture to predict the item-tag relevance scores. This study aims to enhance TagDL by extracting further features from the embeddings of textual content, namely tags, user reviews, and item titles, using Bidirectional Encoder Representations from Transformers (BERT). The results show that features based on BERT embeddings have a potential positive impact on item-tag relevance score prediction. However, the results do not generalize to both tag genome datasets, improving the results only for the movie dataset. This may indicate that the new features have a stronger impact if the amount of available training data is smaller, as with the movie dataset. Moreover, this thesis discusses future work ideas and implementation possibilities.
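    As a rough illustration of the kind of enhancement described above (not the actual TagDL architecture or feature set), the following sketch mean-pools BERT embeddings of a tag and an item title, concatenates them with a couple of hypothetical hand-crafted features, and fits a small multilayer perceptron to predict a relevance score in [0, 1].
```python
# A minimal sketch, assuming illustrative feature names and a toy training set;
# it is not the TagDL model itself.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.neural_network import MLPRegressor

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    """Mean-pooled BERT embedding of a short text."""
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# One hypothetical training example: tag, item title, two made-up tag-usage
# features, and a relevance label in [0, 1].
x = np.concatenate([embed("romantic comedy"),
                    embed("Love Actually"),
                    [0.42, 0.87]])             # e.g. tag frequency, rating signal
y = [0.97]

mlp = MLPRegressor(hidden_layer_sizes=(256, 64), max_iter=200)
mlp.fit([x], y)                                 # toy fit on a single example
print(float(mlp.predict([x])[0]))
```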
  • Du, Mian (2012)
    The Pattern-based Understanding and Learning System (PULS) can be considered a key component of a large distributed news surveillance system. It consists of three parts: (1) an Information Extraction (IE) system running on the back end, which receives news articles as plain text in RSS feeds arriving continuously in real time from several partner systems, processes these feeds, extracts information from them, and stores the information in the database; (2) a web-based decision support (DS) system running on the front end, which visualizes the information for decision-making and user evaluation; and (3) the central database shared by both, which stores the structured information extracted by the IE system and visualized by the decision support system. In the IE system, there is an increasing need to extend the capability of extracting information from only English articles in the medical and business domains to handling articles in other languages, such as French and Russian, and in other domains. In the decision support system, users require several new forms of information visualization and new user evaluation interfaces for better decision support. To achieve these new features, a number of approaches have been investigated and adopted, including information extraction, machine learning, information visualization, an evolutionary delivery model, requirements elicitation, modelling, database design, and a variety of evaluation approaches. In addition, various programming languages such as Lisp, Java, Python, and JavaScript/jQuery have been used. More importantly, an appropriate development process has been followed. This thesis reports on the whole process followed to achieve the required improvements to PULS.
  • Trangcasanchai, Sathianpong (2024)
    Large language models (LLMs) have proven to be state-of-the-art solutions for many NLP benchmarks. However, LLMs in real applications face many limitations. Although such models are seen to contain real-world knowledge, it is kept implicitly in their parameters and cannot be revised or extended unless expensive additional training is performed. These models can hallucinate by confidently producing human-like text that might contain misleading information. The knowledge limitation and the tendency to hallucinate cause LLMs to struggle with out-of-domain settings. Furthermore, LLMs lack transparency in that their responses are products of big black-box models. While fine-tuning can mitigate some of these issues, it requires substantial computing resources. On the other hand, retrieval augmentation has been used to tackle knowledge-intensive tasks and has been shown by recent studies to be effective when coupled with LLMs. In this thesis, we explore Retrieval-Augmented Generation (RAG), a framework that augments generative LLMs with a neural retriever component, in a domain-specific question answering (QA) task. Empirically, we study how RAG helps LLMs in knowledge-intensive situations and explore design decisions in building a RAG pipeline. Our findings underscore the benefits of RAG in the studied setting by showing that leveraging retrieval augmentation yields significant improvement in QA performance over using a pre-trained LLM alone. Furthermore, incorporating RAG in an LLM-driven QA pipeline results in a QA system that accompanies its predictions with evidence documents, leading to more trustworthy and grounded AI applications.
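    The sketch below illustrates the basic RAG loop in a self-contained way: retrieve the documents most relevant to a question and prepend them to the prompt of a generative model. The thesis uses a neural retriever and a real LLM; here a TF-IDF retriever and a placeholder generate function stand in only to keep the example runnable, and the documents and question are made up.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days with a receipt.",
    "Support is available on weekdays from 9 to 17.",
]

vectorizer = TfidfVectorizer().fit(documents)
doc_vectors = vectorizer.transform(documents)

def retrieve(question, k=2):
    """Return the k documents most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def generate(prompt):
    # Placeholder: in a real pipeline this would call the LLM (API or local model).
    return f"[LLM answer grounded in the retrieved context]\n{prompt[:80]}..."

question = "How long is the warranty?"
context = "\n".join(retrieve(question))
answer = generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer)
```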
  • Aldana, Miguel Francisco (2021)
    The accuracy and general performance of weather radar measurements are of great importance to society because of their use in quantitative precipitation estimation and its role in flood hazard risk prevention, agriculture, and urban planning, among others. However, radars normally suffer from systematic errors, such as attenuation, miscalibration of the Z field, or bias in the Zdr field, and from random errors, such as clutter, beam blockage, noise, non-meteorological echoes, and non-uniform beam filling, which directly affect the rain rate estimates and any other products relevant to meteorologists. The impact of random errors is reduced by exploiting the polarimetric properties of polarimetric radars, identifying and classifying measurements according to their signature with a classification scheme based on the available polarimetric variables. Systematic errors are more difficult to address, as they require a "true" or reference value in order to be corrected. The reference value can either be absolute or obtained from another radar variable. In practice, an absolute reference value is not feasible because we normally do not know what we are observing with the radar. One way to address this issue is therefore to derive theoretical relations between radar variables based on their consistency when measuring a volume containing hydrometeors of known characteristics, such as size and concentration. This procedure is known as self-consistency theory, and it is a powerful tool for checking the quality of radar measurements and correcting offsets caused by bias, miscalibration, or attenuation. The theoretical radar variables themselves can be simulated using available T-matrix scattering algorithms, which estimate the scattered phase and amplitude for a given distribution of drops of a given size. Information on these drop size distributions can be obtained, for instance, from gauge or disdrometer measurements. Once the theoretical relations among radar variables are established, it is possible to check the consistency of, for instance, measured differential reflectivity with respect to differential reflectivity calculated as a function of measured reflectivity, assuming the latter has been filtered properly; any discrepancy between the observed and theoretical differential reflectivity can thus be attributed to offsets in the radar. This work presents a methodology for reviewing radar measurement filtering and quality and for improving the measurements by correcting bias and calibration, using theoretical relations between radar variables through self-consistency theory. Furthermore, as the aforementioned issues are easier to track and resolve in the liquid-rain regime of precipitation, this work presents a detailed description of methodologies to exclude ice-phase hydrometeors, such as the melting layer detection algorithm and its operational implementation, along with other complementary filters suggested in the literature. Examples of melting layer detection and filtering, as well as self-consistency curves for evaluating radar measurement performance, are also provided.
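    The following sketch illustrates the self-consistency idea in its simplest form: a theoretical Zdr = f(Z) relation is fitted to simulated data, and the systematic Zdr offset of the radar is estimated as the median discrepancy between measured and theoretical values in rain. All coefficients and numbers are synthetic placeholders rather than results from this thesis.
```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "simulated" consistency data (e.g. from T-matrix runs on
# disdrometer drop size distributions): reflectivity Z (dBZ) vs. Zdr (dB).
z_sim = np.linspace(10, 50, 200)
zdr_sim = 0.001 * z_sim**2 + 0.01 * z_sim - 0.2    # illustrative curve only

# Fit the self-consistency relation Zdr = f(Z).
coeffs = np.polyfit(z_sim, zdr_sim, deg=2)
zdr_theory = np.poly1d(coeffs)

# Synthetic rain-only radar measurements with a +0.4 dB Zdr bias.
z_obs = rng.uniform(20, 45, 1000)
zdr_obs = zdr_theory(z_obs) + 0.4 + rng.normal(0, 0.15, z_obs.size)

# The median discrepancy is attributed to a systematic Zdr offset.
zdr_bias = np.median(zdr_obs - zdr_theory(z_obs))
print(f"Estimated Zdr bias: {zdr_bias:.2f} dB")
```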
  • Lauha, Patrik (2021)
    Automatic bird sound recognition has been studied by computer scientists since the late 1990s. Various techniques have been exploited, but no general method that comes even close to matching the performance of a human expert has been developed yet. In this thesis, the subject is approached by reviewing alternative methods for cross-correlation as a similarity measure between two signals in template-based bird sound recognition models. Template-specific binary classification models are fitted with different methods and their performance is compared. The methods considered are template averaging and processing before applying cross-correlation, the use of texture features as additional predictors, and feature extraction through transfer learning with convolutional neural networks. It is shown that the classification performance of template-specific models can be improved by template refinement and by utilizing neural networks' ability to automatically extract relevant features from bird sound spectrograms.
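    As a minimal illustration of the template-matching baseline discussed above, the sketch below slides a template spectrogram along the time axis of a recording's spectrogram and takes the maximum normalised cross-correlation as the similarity score; a template-specific classifier could then threshold such scores. The spectrograms are synthetic.
```python
import numpy as np

def max_normalised_xcorr(spectrogram, template):
    """Maximum zero-normalised cross-correlation over time shifts."""
    t = (template - template.mean()) / (template.std() + 1e-12)
    n_freq, n_time = template.shape
    best = -np.inf
    for start in range(spectrogram.shape[1] - n_time + 1):
        window = spectrogram[:, start:start + n_time]
        w = (window - window.mean()) / (window.std() + 1e-12)
        best = max(best, float((w * t).mean()))
    return best

rng = np.random.default_rng(1)
template = rng.random((64, 20))                     # template spectrogram
recording = rng.random((64, 200))                   # recording spectrogram
recording[:, 100:120] = template + 0.05 * rng.random((64, 20))  # embedded call

print(max_normalised_xcorr(recording, template))    # high score near 1.0
```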
  • Vänskä, Risto (2020)
    Blood glucose monitoring (BGM) is an essential part of diabetes management. Currently available BGM technologies are invasive by nature and require either sampling blood or inserting an invasive needle-like sensor under the skin. Extraction of interstitial fluid using magnetohydrodynamics is proposed as a technology to enable truly non-invasive glucose monitoring. To date, only direct current (DC) has been investigated for magnetohydrodynamic extraction of interstitial fluid. In this study, we used a skin model to investigate whether the rate of magnetohydrodynamic extraction can be increased by using a unipolar square wave with the same time-averaged current as the DC extraction used as a control. The results indicate a 2.3-fold increase in the average extraction rate when a unipolar square wave of double intensity is applied, compared to DC. The results serve towards reducing the measurement time in non-invasive glucose monitoring systems utilizing magnetohydrodynamic extraction.
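    The waveform comparison rests on a simple equivalence: a unipolar square wave of double amplitude and a 50 % duty cycle delivers the same time-averaged current as the constant DC control. A small numerical check of this equivalence, with an arbitrary, illustrative current level, is sketched below.
```python
import numpy as np

i_dc = 0.5                                    # mA, illustrative DC level
t = np.linspace(0, 10, 10_000)                # seconds
square = np.where((t % 2) < 1, 2 * i_dc, 0)   # 1 s on / 1 s off at double amplitude

print(np.full_like(t, i_dc).mean())   # 0.5 mA (constant DC)
print(square.mean())                  # ~0.5 mA: same charge delivered per unit time
```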
  • Sokka, Iris (2019)
    Cancer is a worldwide health problem; in 2018, 9.6 million people died of cancer, meaning that about 1 in 6 deaths was caused by it. The challenge in cancer drug therapy has been developing drugs that are effective against cancer but not harmful to healthy cells. One of the solutions to this has been antibody-drug conjugates (ADCs), in which a cytotoxic drug is bound to an antibody. The antibody binds to a specific antigen present on the surface of the cancer cell, thus working as a vessel that carries the drug specifically to the cancer cells. Monomethyl auristatin E (MMAE) and monomethyl auristatin F (MMAF) are mitosis-preventing cancer drugs. The auristatins are pentapeptides that were developed from dolastatin 10. MMAE consists of monomethyl valine (MeVal), valine (Val), dolaisoleuine (Dil), dolaproine (Dap), and norephedrine (PPA). MMAF has an otherwise similar structure, but the norephedrine is replaced by phenylalanine (Phe). The auristatins prevent cell division and cancer cell proliferation by binding to microtubules and are thus able to kill any kind of cell; by attaching an auristatin to an antibody that targets cancer cells, they can be used effectively in the treatment of cancer. MMAE and MMAF exist as two conformers in solution, namely the cis- and trans-conformers, of which the trans-conformer resembles the biologically active conformer. It was recently noted that in solution 50-60% of the MMAE and MMAF molecules exist in the biologically inactive cis-conformer. The molecule changes from one conformer to the other by the rotation of an amide bond, but this takes several hours at body temperature. As the amount of the cis-conformer is significant, the efficacy of the drug is decreased and the possibility of side effects is increased: the molecule may leave the cancer cell in its inactive form, migrate to healthy cells and tissue, and transform into the active form there, damaging the healthy cell. The goal of this study was to modify the structure of the auristatins so that the cis/trans-equilibrium shifts to favor the biologically active trans-conformer. The modifications were done virtually, and the relative energies were computed using high-level quantum chemical methods at the density functional theory (DFT), second-order perturbation theory (MP2), and coupled cluster levels. Intramolecular interactions were analyzed computationally, employing symmetry-adapted perturbation theory and non-covalent interactions (NCI) analysis. The results suggest that simple halogenation of the benzene ring para-position is able to significantly shift the cis/trans-equilibrium to favor the trans-conformer. This is due to changes in intramolecular interactions that favor the trans-conformer after halogenation. For example, the NCI analysis shows that the halogen atom invokes stabilizing intramolecular interactions with the Dil amino acid; there is no such interaction between the para-position hydrogen and Dil in the original molecules. We also performed docking studies showing that the halogenated molecules can bind to microtubules, thus confirming that the modified structures have potential to be developed into new, more efficient and safer cancer drugs. The most promising drug candidates are Cl-MMAF, F-MMAF, and F-MMAE, for which 94, 90, and 79% of the molecules, respectively, are predicted to exist in the biologically active trans-conformer.
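    The reported conformer percentages follow from the computed relative energies through the Boltzmann distribution of a two-state equilibrium. The sketch below shows this relationship at body temperature; the free-energy values used are illustrative magnitudes, not numbers taken from the thesis.
```python
import math

R = 8.314          # J/(mol K)
T = 310.15         # K, body temperature

def trans_fraction(delta_g_kj_per_mol):
    """Equilibrium fraction of the trans conformer for a free-energy
    difference G(cis) - G(trans) > 0 favouring trans."""
    k = math.exp(-delta_g_kj_per_mol * 1000 / (R * T))  # cis/trans population ratio
    return 1 / (1 + k)

# Illustrative values: ~7 kJ/mol already corresponds to roughly 94 % trans.
for dg in (0.0, 1.0, 3.0, 7.1):
    print(f"dG = {dg:.1f} kJ/mol -> {100 * trans_fraction(dg):.0f} % trans")
```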
  • Björklund, Otso (2018)
    Methods for discovering repeated patterns in music are important tools in computational music analysis. Repeated pattern discovery can be used in applications such as song classification and music generation in computational creativity. Multiple approaches to repeated pattern discovery have been developed, but many of them do not work well with polyphonic music, that is, music where multiple notes occur at the same time. Music can be represented as a multidimensional dataset, where notes are represented as multidimensional points. Moving patterns in time and transposing their pitch can then be expressed as translation. Multidimensional representations of music enable the use of algorithms that can effectively find repeated patterns in polyphonic music. The research on methods for repeated pattern discovery in multidimensional representations of music is largely based on the SIA and SIATEC algorithms. Multiple variants of both algorithms have been developed. Most of the variants use SIA or SIATEC directly and then use heuristic functions to identify the musically most important patterns. The variants thus do not typically provide improvements in running time. However, the running time of SIA and SIATEC can be impractical on large inputs. This thesis focuses on improving the running time of pattern discovery in multidimensional representations of music. The algorithms developed in this thesis are based on SIA and SIATEC. Two approaches to improving running time are investigated: the first involves the use of hashing, and the second is based on using filtering to avoid the computation of unimportant patterns altogether. Three novel algorithms are presented: SIAH, SIATECH, and SIATECHF. The SIAH and SIATECH algorithms, which use hashing, were found to provide great improvements in running time over the corresponding SIA and SIATEC algorithms. The use of filtering in SIATECHF was not found to significantly improve the running time of repeated pattern discovery.
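    The hashing idea behind the SIA-style algorithms can be illustrated with a few lines of code: notes are treated as (onset, pitch) points, and every pair of points is bucketed by its difference vector in a hash table, so that each bucket collects the points translatable by the same vector (a maximal translatable pattern). This is only a toy sketch of the core idea, not the full SIAH algorithm.
```python
from collections import defaultdict

notes = [(0, 60), (1, 64), (2, 67), (4, 62), (5, 66), (6, 69)]  # (onset, pitch)

patterns = defaultdict(list)
for i, p in enumerate(notes):
    for q in notes[i + 1:]:                      # lexicographically later points
        vector = (q[0] - p[0], q[1] - p[1])      # translation taking p to q
        patterns[vector].append(p)

# The bucket for vector (4, 2) contains the three-note motif that repeats
# a major second higher four beats later.
print(patterns[(4, 2)])   # [(0, 60), (1, 64), (2, 67)]
```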
  • Nordman, Kristian (2017)
    Profit Software's Profit Life and Pension (PLP) is an investment insurance management system: PLP handles investment insurances from the moment they are sold until they eventually expire. For a system that handles money, it is important that it can be trusted; therefore, testing is a required part of PLP's development. This thesis is an investigation into PLP's testing strategy. We analyse PLP's current testing strategy to find flaws and impediments, then offer improvement suggestions for the identified problem areas and suggest additions that we found could be beneficial.
  • Kammonen, Juhana (2013)
    Biological populations arise, develop, and evolve under a series of well-studied laws and fairly regular mechanisms. Population genetics is a field of science that aims to study and model these laws and the genetic composition and diversity of populations of various types of species and life. At best, population genetic models can be of use in verifying past events of a population and eventually reconstructing unknown population histories in light of multidisciplinary evidence; an example is the research concerning the human population prehistory of Finland. Population simulations are a sub-branch of the rapidly developing field of bioinformatics and can be divided into two pipelines: forward-in-time and backward-in-time (coalescent). These methodologies enable in silico testing of the development of the genetic composition of individuals in a well-defined population. This thesis focuses on the forward-in-time approach. Multiple pieces of software exist today for forward population simulations, and simuPOP [http://simupop.sourceforge.net] is probably the single most flexible of them. Being able to incorporate transmission of genomes and arbitrary individual information between generations, simuPOP has potential applications even beyond population genetics. However, simuPOP tends to use an enormous amount of computer random access memory when simulating large population sizes. This thesis introduces three approaches to improve the throughput of simuPOP: (i) introducing scripting guidelines, (ii) approximating a complex simulation using the inbuilt biallelic mode of simuPOP, and (iii) changes to the source code of simuPOP that would enable improved throughput. A previous simuPOP script designed to simulate past demographic events of Finnish population history is used as an example, and a batch of 100 simulation runs is run on three versions of the script: standard, modified, and biallelic. Compared to the standard mode, the modified simulation script performs marginally faster. Despite doubling the user time of a single simulation run, the biallelic approximation method consumes a third of the random access memory while remaining compatible from the population genetic point of view. This suggests that built-in support for the biallelic approximation could be a valuable supplement to simuPOP. Evidently, simuPOP can be applied to very complex forward population simulations. The use of individual information fields enables the user to set up arbitrary simulation scenarios, and data structure changes at the source code level are likely to improve throughput even further. Besides introducing improvements and guidelines to the simulation workflow, this thesis is a standalone case study concerning the use and development of bioinformatics software. Furthermore, an individual development version of simuPOP called simuPOP-rev is founded with the goal of implementing the source code changes suggested in this thesis. ACM Computing Classification System (CCS): D.1 [Programming techniques], G.1.6 [Optimization], H.3 [Information storage and retrieval]
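    The biallelic approximation builds on the standard forward-in-time picture of a biallelic locus drifting from generation to generation. The sketch below is not simuPOP code but a plain-numpy Wright-Fisher illustration of that principle, with arbitrary parameters.
```python
import numpy as np

rng = np.random.default_rng(42)

def wright_fisher(pop_size, generations, init_freq=0.5):
    """Track the frequency of one biallelic locus under pure genetic drift."""
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # 2N allele copies of the next generation are resampled binomially
        # from the current allele frequency.
        copies = rng.binomial(2 * pop_size, freq)
        freq = copies / (2 * pop_size)
        trajectory.append(freq)
    return trajectory

print(wright_fisher(pop_size=1000, generations=10)[-1])
```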
  • Kuparinen, Simo (2023)
    Web development is in great demand these days. Constantly developing technologies enable the creation of impressive websites and mitigate the amount of development work. However, it is useful to consider the performance aspect, which directly affects the user experience; performance in this context means the website's load times. Front-end web development typically involves using Cascading Style Sheets (CSS), a style sheet language and web technology used to describe the visual presentation of a website. This research consists of a literature review part, which covers background knowledge about how web browsers work, performance in general, and performance metrics, along with CSS performance optimization, and an empirical part, which includes benchmarks presented at major software industry conferences for testing the performance of particular CSS features that have the potential to improve the performance of a website. The loading times obtained from the benchmarks are reviewed and compared with each other. In addition, a few techniques are presented that do not have their own benchmark but which may have an effect on performance. To summarize the results, CSS performance is usually not the biggest performance bottleneck on a website, since the overall style calculation takes about a quarter of the total runtime calculation on average. However, utilizing certain techniques and managing to shrink the style calculation costs can be valuable. Based on the benchmarks in this research, using shadow DOM and scoped styles has a positive effect on style performance. For layout, performance benefits can be achieved by utilizing CSS containment and concurrent rendering. Regarding other practices, better results in terms of performance can be achieved by removing unused CSS, avoiding reflow and repaint as well as complex selectors, and considering the usage of web fonts.
  • Konyushkova, Ksenia (2013)
    Imagine a journalist looking for an illustration for an article about patriotism in a database of unannotated images. The idea of a suitable image is very vague, and the best way to navigate through the database is to provide feedback on the images proposed by an image retrieval system, so that the system can learn what the ideal target image of the user is. Thus, at each search iteration a set of n images is displayed and the user must indicate how relevant they are to his/her target. When considering real-life problems, we must also take into account the system's time complexity and scalability to work with big data. To tackle this issue, we utilize hierarchical Gaussian Process bandits with a visual self-organizing map as a preprocessing technique. A prototype system called ImSe was developed and tested in experiments with real users in different types of tasks. The experiments show favorable results and indicate the benefits of the proposed algorithms in different types of tasks.
  • Kemppainen, Esa (2020)
    NP-hard optimization problems can be found in various real-world settings, such as scheduling, planning, and data analysis. Coming up with algorithms that can efficiently solve these problems can save various resources. Instead of developing problem-domain-specific algorithms, we can encode a problem instance as an instance of maximum satisfiability (MaxSAT), an optimization extension of Boolean satisfiability (SAT), and then solve the instances resulting from this encoding using MaxSAT-specific algorithms. This way we can solve instances in various different problem domains by focusing on developing algorithms that solve MaxSAT instances. Computing an optimal solution and proving the optimality of the found solution can be time-consuming in real-world settings, and finding an optimal solution is often not feasible; instead, we are only interested in finding a good-quality solution fast. Incomplete solvers trade guaranteed optimality for better scalability. In this thesis, we study an incomplete solution approach for solving MaxSAT based on linear programming relaxation and rounding. Linear programming (LP) relaxation and rounding has been used to obtain approximation algorithms for various NP-hard optimization problems; as such, we are interested in investigating the effectiveness of this approach on MaxSAT. We describe multiple rounding heuristics that are empirically evaluated on random, crafted, and industrial MaxSAT instances from the yearly MaxSAT Evaluations. We compare the rounding approaches against each other and against the state-of-the-art incomplete solvers SATLike and Loandra. The LP relaxation based rounding approaches are in general not competitive against either SATLike or Loandra. However, for some problem domains our approach manages to be competitive against SATLike and Loandra.
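    The core of the studied approach can be sketched on a toy weighted MaxSAT instance: Boolean variables are relaxed to the interval [0, 1], a linear program maximises the total weight of clauses that can be (fractionally) satisfied, and the fractional solution is rounded to a Boolean assignment. The instance below and the simple 0.5-threshold rounding rule are illustrative; the thesis evaluates several more elaborate rounding heuristics.
```python
import numpy as np
from scipy.optimize import linprog

n_vars = 3
# Each clause: (weight, positive literal indices, negative literal indices).
clauses = [(3, [0, 1], []), (2, [2], [0]), (1, [], [1, 2]), (1, [0], [])]

n_cls = len(clauses)
c = np.zeros(n_vars + n_cls)
c[n_vars:] = [-w for w, _, _ in clauses]          # maximise sum of w_j * z_j

A_ub, b_ub = [], []
for j, (w, pos, neg) in enumerate(clauses):
    # Clause satisfaction variable: z_j <= sum_{i in pos} x_i + sum_{i in neg} (1 - x_i)
    row = np.zeros(n_vars + n_cls)
    row[n_vars + j] = 1.0
    for i in pos:
        row[i] -= 1.0
    for i in neg:
        row[i] += 1.0
    A_ub.append(row)
    b_ub.append(len(neg))

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 1)] * (n_vars + n_cls))
assignment = [v >= 0.5 for v in res.x[:n_vars]]   # simple threshold rounding

# Weight of clauses satisfied by the rounded Boolean assignment.
satisfied = sum(w for w, pos, neg in clauses
                if any(assignment[i] for i in pos)
                or any(not assignment[i] for i in neg))
print(assignment, satisfied)
```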
  • Warro, Olli (2023)
    In many real-world problems, the task is to find an optimal solution within a finite set of solutions. Many of these problems, also known as combinatorial optimization problems, are NP-hard; in other words, finding an optimal solution for them is computationally difficult. However, because they are important for many real-world applications, there is demand for efficient ways to solve them. One approach is the declarative approach, where the problems are first encoded into a mathematical constraint language, and the encoded problem instance is then solved by an algorithm developed for that constraint language. In this thesis, we focus on declarative pseudo-Boolean optimization (PBO). PBO is the set of integer programs (IP) in which the variables can only be assigned 0 or 1. For many real-world applications, finding an optimal solution is too time-consuming. Instead of finding an optimal solution, incomplete methods attempt to find good enough solutions within a given time limit. To the best of our knowledge, there are not many incomplete algorithms developed specifically for PBO. In this thesis, we adapt an incomplete method developed for the maximum satisfiability problem to PBO. In the adapted algorithm, which we call LS-ORACLE-PBO, a given PBO instance is solved using a form of local search that utilizes a pseudo-Boolean decision oracle when moving from one solution to another. We implement LS-ORACLE-PBO and empirically compare it to another recent incomplete PBO algorithm called LS-PBO. The results show that, in general, our implementation is not competitive against LS-PBO. However, for some problem instances, our implementation provides better results than LS-PBO.
  • Löflund, Jan-Erik (2013)
    The use of the XML data model has become common in, for example, structured documents, the implementation of web applications, and data transfer over the Internet. As a result, the need for persistent storage of XML data has grown. For this purpose, XML-based databases have been developed that use the XML data model as their storage and processing format. XML documents are often structurally complex and large in size. An XML database management system must therefore be designed and implemented to be efficient, so that even large numbers of database queries and updates, which may also be complex, concurrent, and touch large amounts of data at a time, can be executed with reasonable hardware resources and short response times. This thesis shows how the performance of an XML database management system can be significantly improved by indexing the documents. In indexing, unique identifiers are created for the elements of XML documents, and various index structures are built on top of these identifiers. With indexing, data can be efficiently located on the data pages of the database and transferred between the data pages and the buffer of the database management system, which speeds up the operation of the system and increases its ability to handle concurrent read and write requests. Indexing can also be used to make the query processing algorithms of the database management system more efficient by enabling an efficient implementation of the set join operations they use. BaseX and eXist are XML-based database management systems that provide several different kinds of indexes. The implementation of the indexes in these systems is described, and the efficiency of these systems in executing database queries on XML documents is measured and evaluated using the XMark benchmark developed for this purpose.
  • Wahlroos, Mika (2013)
    In information management, keywords describing the content of a document are often used as metadata to improve the manageability or findability of information. Describing content with natural-language terms or concepts is called indexing. For the sake of consistency, a controlled vocabulary compiled for this purpose, covering the central terminology of the domain, can be used. The ontologies used in the Semantic Web and linked data take the idea further by defining terms as concepts and the semantic relations between them. To make the production of metadata easier and more efficient, various methods have been developed for automatically producing content-describing terms from text material. This thesis focuses on the automatic extraction of key terms from text, as well as on the quality of metadata and methods for evaluating it. As an example case, the use of the ontology-based Maui indexing tool for automatic subject indexing of records is examined. The quality of the automatically extracted metadata is compared with the original human-assigned subject headings using precision and recall measurements. In addition, the evaluation is complemented with subjective quality assessments by domain experts. Based on the results, the significance of text preprocessing and vocabulary hierarchy for the quality of automatic subject indexing is examined, and ways to further develop the annotation method are discussed.
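    The precision and recall comparison described above can be sketched for a single document as follows; the term sets are invented for illustration and are not from the evaluation data.
```python
def precision_recall(extracted, reference):
    """Precision and recall of extracted terms against human-assigned terms."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

auto = ["archiving", "metadata", "ontologies", "records management"]   # extracted
human = ["metadata", "records management", "indexing"]                 # reference

p, r = precision_recall(auto, human)
print(f"precision {p:.2f}, recall {r:.2f}")   # precision 0.50, recall 0.67
```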
  • Barin Pacela, Vitória (2021)
    Independent Component Analysis (ICA) aims to separate observed signals into the underlying independent components responsible for generating the observations. Most research in ICA has focused on continuous signals, while the methodology for binary and discrete signals is less developed. Yet, binary observations are equally present in various fields and applications, such as causal discovery, signal processing, and bioinformatics. In the last decade, Boolean OR and XOR mixtures have been shown to be identifiable by ICA, but such models suffer from limited expressivity, calling for new methods to solve the problem. In this thesis, "Independent Component Analysis for Binary Data", we estimate the mixing matrix of ICA from binary observations and an additionally observed auxiliary variable by employing a linear model inspired by the Identifiable Variational Autoencoder (iVAE), which exploits the non-stationarity of the data. The model is optimized with a gradient-based algorithm that uses second-order optimization with limited memory, resulting in training times on the order of seconds for the particular study cases. We investigate which conditions can lead to the reconstruction of the mixing matrix, concluding that the method is able to identify the mixing matrix when the number of observed variables is greater than the number of sources. In such cases, the linear binary iVAE can reconstruct the mixing matrix up to order and scale indeterminacies, which are taken into account in the evaluation with the Mean Cosine Similarity Score. Furthermore, the model can reconstruct the mixing matrix even under a limited sample size. Therefore, this work demonstrates the potential for applications to real-world data and also offers a possibility to study and formalize identifiability in future work. In summary, the most important contributions of this thesis are the empirical study of the conditions that enable mixing matrix reconstruction using the binary iVAE, and the empirical results on the performance and efficiency of the model. The latter was achieved through a new combination of existing methods, including modifications and simplifications of a linear binary iVAE model and the optimization of such a model under limited computational resources.
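    A mean-cosine-similarity style evaluation that tolerates the order and scale indeterminacies can be sketched as follows: estimated mixing-matrix columns are matched one-to-one to the true columns with the Hungarian algorithm, and the absolute cosine similarities of the matched pairs are averaged. The exact metric used in the thesis may differ in its details.
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_cosine_similarity(a_true, a_est):
    """Mean absolute cosine similarity between matched mixing-matrix columns."""
    a_true = a_true / np.linalg.norm(a_true, axis=0)
    a_est = a_est / np.linalg.norm(a_est, axis=0)
    sims = np.abs(a_true.T @ a_est)             # |cosine| between all column pairs
    rows, cols = linear_sum_assignment(-sims)   # best one-to-one column matching
    return sims[rows, cols].mean()

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 3))                           # "true" mixing matrix
a_hat = a[:, [2, 0, 1]] * np.array([2.0, -1.0, 0.5])  # permuted and rescaled estimate
print(mean_cosine_similarity(a, a_hat))               # -> 1.0 up to rounding
```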
  • Lång, Jone (2022)
    Test automation has a crucial role in modern software development. Automated tests are immensely helpful in quality assurance, catching bugs, and giving information on the state of the software. There are many existing frameworks designed to assist in creating automated tests, and these frameworks can have widely varying purposes and target applications and technologies. In this thesis, we study a selected group of Behavior Driven Development (BDD) testing frameworks, compare them, identify their strengths and shortcomings, and implement our own testing framework to address the discovered challenges. Finally, we evaluate the resulting framework and see whether it meets its requirements. As a result, we gain a better understanding of what kinds of tools there are for automating behavior-driven tests, what types of approaches have been and can be taken to implement such frameworks, and what the benefits and suitable uses of each tool are.
  • Lahermaa, Petri (2022)
    Kimberlites are a primary source of diamonds. However, not all kimberlites contain diamonds, let alone in the abundance that enables economically profitable exploitation. To assess the diamond potential of kimberlites, the study of kimberlite/diamond indicator minerals (i.e., mantle-derived xenocrysts) can be utilized. In this study, the diamond potential of the Liqhobong kimberlite cluster in Lesotho, Southern Africa, was studied. Indicator mineral grains (chromian diopside, garnet, chromite, ilmenite) from the Liqhobong kimberlites were analysed for major and minor elements using an electron microprobe. The results were used to examine the formation conditions (P/T) and chemical characteristics of the indicator minerals, to define the local geotherm and diamond window, and to provide a general overview of the diamond potential of the Liqhobong kimberlite cluster. Based on single-clinopyroxene thermobarometry (using Cr-diopside grains as a thermobarometer), the local Liqhobong geotherm is estimated to be ~41 mW/m². It corresponds reasonably accurately to the 40 mW/m² reference geotherm, which is widely considered a decent "average" geotherm for the Kaapvaal-Kalahari Craton area. The analysed xenocrysts are principally from a peridotitic source rock. The geochemistry of the indicator minerals shows that there is significant diamond potential in the Liqhobong kimberlites. Specifically, Ni-in-garnet thermometry, using the formation depth of the G10 garnets combined with the relevant 40 mW/m² reference geotherm, indicates the existence of a significant diamond window at a depth interval of ~140-230 km below the Liqhobong area.