Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by study line "Algoritmit"

Sort by: Order: Results:

  • Laitala, Julius (2021)
    Arranging products in stores according to planograms, optimized product arrangement maps, is important for keeping up with the highly competitive modern retail market. The planograms are realized into product arrangements by humans, a process which is prone to mistakes. Therefore, for optimal merchandising performance, the planogram compliance of the arrangements needs to be evaluated from time to time. We investigate utilizing a computer vision problem setting – retail product detection – to automate planogram compliance evaluation. We introduce the relevant problems, the state-of- the-art approaches for solving them and background information necessary for understanding them. We then propose a computer vision based planogram compliance evaluation pipeline based on the current state of the art. We build our proposed models and algorithms using PyTorch, and run tests against public datasets and an internal dataset collected from a large Nordic retailer. We find that while the retail product detection performance of our proposed approach is quite good, the planogram compliance evaluation performance of our whole pipeline leaves a lot of room for improvement. Still, our approach seems promising, and we propose multiple ways for improving the performance enough to enable possible real world utility. The code used for our experiments and the weights for our models are available at
  • Fred, Hilla (2022)
    Improving the monitoring of health and well-being of dairy cows through the use of computer vision based systems is a topic of ongoing research. A reliable and low-cost method for identifying cow individuals would enable automatic detection of stress, sickness or injury, and the daily observation of the animals would be made easier. Neural networks have been used successfully in the identification of cow individuals, but methods are needed that do not require incessant annotation work to generate training datasets when there are changes within a group. Methods for person re-identification and tracking have been researched extensively, with the aim of generalizing beyond the training set. These methods have been found suitable also for re-identifying and tracking previously unseen dairy cows in video frames. In this thesis, a metric-learning based re-identification model pre-trained on an existing cow dataset is compared to a similar model that has been trained on new video data recorded at Luke Maaninka research farm in Spring 2021, which contains 24 individually labelled cow individuals. The models are evaluated in tracking context as appearance descriptors in Kalman filter based tracking algorithm. The test data is video footage from a separate enclosure in Maaninka and a group of 24 previously unseen cow individuals. In addition, a simple procedure is proposed for the automatic labeling of cow identities in images based on RFID data collected from cow ear tags and feeding stations, and the known feeding station locations.
  • Hertweck, Corinna (2020)
    In this work, we seek robust methods for designing affirmative action policies for university admissions. Specifically, we study university admissions under a real centralized system that uses grades and standardized test scores to match applicants to university programs. For the purposes of affirmative action, we consider policies that assign bonus points to applicants from underrepresented groups with the goal of preventing large gaps in admission rates across groups, while ensuring that the admitted students are for the most part those with the highest scores. Since such policies have to be announced before the start of the application period, there is uncertainty about which students will apply to which programs. This poses a difficult challenge for policy-makers. Hence, we introduce a strategy to design policies for the upcoming round of applications that can either address a single or multiple demographic groups. Our strategy is based on application data from previous years and a predictive model trained on this data. By comparing this predictive strategy to simpler strategies based only on application data from, e.g., the previous year, we show that the predictive strategy is generally more conservative in its policy suggestions. As a result, policies suggested by the predictive strategy lead to more robust effects and fewer cases where the gap in admission rates is inadvertently increased through the suggested policy intervention. Our findings imply that universities can employ predictive methods to increase the reliability of the effects expected from the implementation of an affirmative action policy.
  • Ikkala, Tapio (2020)
    This thesis presents a scalable method for identifying anomalous periods of non-activity in short periodic event sequences. The method is tested with real world point-of-sale (POS) data from grocery retail setting. However, the method can be applied also to other problem domains which produce similar sequential data. The proposed method models the underlying event sequence as a non-homogeneous Poisson process with a piecewise constant rate function. The rate function for the piecewise homogeneous Poisson process can be estimated with a change point detection algorithm that minimises a cost function consisting of the negative Poisson log-likelihood and a penalty term that is linear to the number of change points. The resulting model can be queried for anomalously long periods of time with no events, i.e., waiting times, by defining a threshold below which the waiting time observations are deemed anomalies. The first experimental part of the thesis focuses on model selection, i.e., in finding a penalty value that results in the change point detection algorithm detecting the true changes in the intensity of the arrivals of the events while not reacting to random fluctuations in the data. In the second experimental part the performance of the anomaly detection methodology is measured against stock-out data, which gives an approximate ground truth for the termination of a POS event sequence. The performance of the anomaly detector is found to be subpar in terms of precision and recall, i.e., the true positive rate and the positive predictive value. The number of false positives remains high even with small threshold values. This needs to be taken into account when considering applying the anomaly detection procedure in practice. Nevertheless, the methodology may have practical value in the retail setting, e.g., in guiding the store personnel where to focus their resources in ensuring the availability of the products.
  • Duong, Quoc Quan (2021)
    Discourse dynamics is one of the important fields in digital humanities research. Over time, the perspectives and concerns of society on particular topics or events might change. Based on the changing in popularity of a certain theme different patterns are formed, increasing or decreasing the prominence of the theme in news. Tracking these changes is a challenging task. In a large text collection discourse themes are intertwined and uncategorized, which makes it hard to analyse them manually. The thesis tackles a novel task of automatic extraction of discourse trends from large text corpora. The main motivation for this work lies in the need in digital humanities to track discourse dynamics in diachronic corpora. Machine learning is a potential method to automate this task by learning patterns from the data. However, in many real use-cases ground truth is not available and annotating discourses on a corpus-level is incredibly difficult and time-consuming. This study proposes a novel procedure to generate synthetic datasets for this task, a quantitative evaluation method and a set of benchmarking models. Large-scale experiments are run using these synthetic datasets. The thesis demonstrates that a neural network model trained on such datasets can obtain meaningful results when applied to a real dataset, without any adjustments of the model.
  • Ikonen, Eetu (2023)
    The maximum constraint satisfaction problem (MaxCSP) is a combinatorial optimization problem in which the set of feasible solutions is expressed using decision variables and constraints on how the variables can be assigned. It can be used to represent a wide range of other combinatorial optimization problems. The maximum satisfiability problem (MaxSAT) is a restricted variant of the maximum constraint satisfaction problem with the additional restrictions that all variables must be Boolean variables, and all constraints must be logical Boolean formulas. Because of this, expressing problems using MaxSAT can be unintuitive. The known solving methods for the MaxSAT problem are more efficient than the known solving methods for MaxCSP. Therefore, it is desirable to express problems using MaxSAT. However, every MaxCSP instance that only has finite-domain variables can be encoded into an equivalent MaxSAT instance. Encoding a MaxCSP instance to a MaxSAT instance allows users to combine the strengths of both approaches by expressing problems using the more intuitive MaxCSPs but solving them using the more efficient MaxSAT solving methods. In this thesis, we overview three common MaxCSP to MaxSAT encodings, the sparse, log, and order encodings, that differ in how they encode an integer variable into a set of Boolean variables. We use correlation clustering as a practical example for comparing the encodings. We first represent correlation clustering problems using MaxCSPs, and then encode them into MaxSATs instances. State-of-the-art MaxSAT solvers are then used to solve the MaxSAT instances. We compare the encodings by measuring the time it takes to encode a MaxCSP instance into a MaxSAT instance and the time it takes to solve the MaxSAT instance. The scope of our experiments is too small to draw general conclusions but in our experiments, the log encoding was the best overall choice.
  • Vidjeskog, Martin (2022)
    The traditional way of computing the Burrows-Wheeler transform (BWT) has been to first build a suffix array, and then use this newly formed array to obtain the BWT. While this approach runs in linear time, the space requirement is far from optimal. When the length of the input string increases, the required working space quickly becomes too large for normal computers to handle. To overcome this issue, researchers have proposed many different types of algorithms for building the BWT. In 2009, Daisuke Okanohara and Kunihiko Sadakane presented a new linear time algorithm for BWT construction. This algorithm is relatively fast and requires far less working space than the traditional way of computing the BWT. It is based on a technique called induced sorting and can be seen as a state-of-the-art approach for internal memory BWT construction. However, a proper exploration of how to implement the algorithm efficiently has not been undertaken. One 32-bit implementation of the algorithm is known to exist, but due to the limitations of 32-bit programs, it can only be used for input strings under the size of 4 GB. This thesis introduces the algorithm from Okanohara and Sadakane and implements a 64-bit version of it. The implemented algorithm can in theory support input strings that are thousands of gigabytes in size. In addition to the explanation of the algorithm, the time and space requirements of the 64-bit implementation are compared to some other fast BWT algorithms.
  • Luopajärvi, Kalle (2022)
    In independent component analysis the data is decomposed into its statistically independent components. In recent years, statistical models have been developed that solve a non-linear version of the independent component analysis. This thesis focuses on the estimation methods of a particular non-linear independent component analysis model called iVAE. It is shown on simulated data that the generative adversarial networks can significantly improve the iVAE model estimation compared with the previously used default iVAE estimation method. The improved model estimation might enable new applications for the iVAE model.
  • Luopajärvi, Kalle (2022)
    In independent component analysis the data is decomposed into its statistically independent components. In recent years, statistical models have been developed that solve a non-linear version of the independent component analysis. This thesis focuses on the estimation methods of a particular non-linear independent component analysis model called iVAE. It is shown on simulated data that the generative adversarial networks can significantly improve the iVAE model estimation compared with the previously used default iVAE estimation method. The improved model estimation might enable new applications for the iVAE model.
  • Aula, Kasimir (2019)
    Air pollution is considered to be one of the biggest environmental risks to health, causing symptoms from headache to lung diseases, cardiovascular diseases and cancer. To improve awareness of pollutants, air quality needs to be measured more densely. Low-cost air quality sensors offer one solution to increase the number of air quality monitors. However, they suffer from low accuracy of measurements compared to professional-grade monitoring stations. This thesis applies machine learning techniques to calibrate the values of a low-cost air quality sensor against a reference monitoring station. The calibrated values are then compared to a reference station’s values to compute error after calibration. In the past, the evaluation phase has been carried out very lightly. A novel method of selecting data is presented in this thesis to ensure diverse conditions in training and evaluation data, that would yield a more realistic impression about the capabilities of a calibration model. To better understand the level of performance, selected calibration models were trained with data corresponding to different levels of air pollution and meteorological conditions. Regarding pollution level, using homogeneous training and evaluation data, the error of a calibration model was found to be even 85% lower than when using diverse training and evaluation pollution environment. Also, using diverse meteorological training data instead of more homogeneous data was shown to reduce the size of the error and provide stability on the behavior of calibration models.
  • Kilpinen, Arttu (2022)
    The objective of the shortest common superstring problem is to find a string of minimum length that contains all keywords in the given input as substrings. Shortest common superstrings have many applications in the fields of data compression and bioinformatics. For example, a common superstring can be seen as a compressed form of the keywords it is generated from. Since the shortest common superstring problem is NP-hard, we focus on the approximation algorithms that implement a so-called greed heuristic. It turns out that the actual shortest common superstring is not always needed. Instead, it is often enough to find an approximate solution of sufficient quality. We provide an implementation of the Ukkonen's linear time algorithm for the greedy heuristic. The practical performance of this implementation is measured by comparing it to another implementation of the same heuristic. We also hypothesize that shortest common superstrings can be potentially used to improve the compression ratio of the Relative Lempel-Ziv data compression algorithm. This hypothesis is examined and shown to be valid.
  • Jylhä-Ollila, Pekka (2020)
    K-mer counting is the process of building a histogram of all substrings of length k for an input string S. The problem itself is quite simple, but counting k-mers efficiently for a very large input string is a difficult task that has been researched extensively. In recent years the performance of k-mer counting algorithms have improved significantly, and there have been efforts to use graphics processing units (GPUs) in k-mer counting. The goal for this thesis was to design, implement and benchmark a GPU accelerated k-mer counting algorithm SNCGPU. The results showed that SNCGPU compares reasonably well to the Gerbil k-mer counting algorithm on a mid-range desktop computer, but does not utilize the resources of a high-end computing platform as efficiently. The implementation of SNCGPU is available as open-source software.
  • Franssila, Fanni (2023)
    Magnetic reconnection is a phenomenon occurring in plasma and related magnetic fields when magnetic field lines break and rejoin, leading to the release of energy. Magnetic reconnections take place, for example, in the Earth’s magnetosphere, where they can affect the space weather and even damage systems and technology on and around the Earth. Another site of interest is in fusion reactors, where the energy released from reconnection events can cause instability in the fusion process. So far, 2D magnetic reconnection has been widely studied and is relatively well-understood, whereas the 3D case remains more challenging to characterize. However, in real-world situations, reconnection occurs in three dimensions, which makes it essential to be able to detect and analyse 3D magnetic reconnection, as well. In this thesis, we examine what potential signs of 3D magnetic reconnection can be identified from the topological elements of a magnetic vector field. To compute the topological elements, we use the Visualization Toolkit (VTK) Python package. The topology characterizes the behaviour of the vector field, and it may reveal potential reconnection sites, where the topological elements can change as a result of magnetic field lines reconnecting. The magnetic field data used in this thesis is from a simulation of the nightside magnetosphere produced using Vlasiator. The contributions of this thesis include analysis of the topological features of 3D magnetic reconnection and topological representations of nightside reconnection conditions to use in potential future machine learning approaches. In addition, a modified version of the VTK function for computing the critical points of the topology is created with the purpose of gearing it more towards magnetic vector fields instead of vector fields in general.
  • Kivivuori, Eve (2023)
    In this thesis, we discuss the Relative Lempel-Ziv (RLZ) lossless compression algorithm, our implementation of it, and the performance of RLZ in comparison to more traditional lossless compression programs such as gzip. Like the LZ77 compression algorithm, the RLZ algorithm compresses its input by parsing it into a series of phrases, which are then encoded as a position+length number pair describing the location of the phrase within the text. Unlike ordinary LZ77, where these pairs refer to earlier points in the same text and thus decompression must happen sequentially, in RLZ the pairs point to an external text called the dictionary. The benefit of this approach is faster random access to the original input given its compressed form: with RLZ, we can rapidly (in linear time with respect to the compressed length of the text) begin decompression from anywhere. With non-repetitive data, such as the text of a single book, website, or one version of a program's source code, RLZ tends to perform poorer than traditional compression methods, both in terms of compression ratio and in terms of runtime. However, with very similar or highly repetitive data, such as the entire version history of a Wikipedia article or many versions of a genome sequence assembly, RLZ can compress data better than gzip and approximately as well as xz. Dictionary selection requires care, though, as compression performance relies entirely on it.
  • Vuorenkoski, Lauri (2024)
    There are two primary types of quantum computers: quantum annealers and circuit model computers. Quantum annealers are specifically designed to tackle particular problems, as opposed to circuit model computers, which can be viewed as universal quantum computers. Substantial efforts are underway to develop quantum-based algorithms for various classical computational problems. The objective of this thesis is to implement algorithms for solving graph problems using quantum annealer computers and analyse these implementations. The aim is to contribute to the ongoing development of algorithms tailored for this type of machine. Three distinct types of graph problems were selected: all pairs shortest path, graph isomorphism, and community detection. These problems were chosen to represent varying levels of computational complexity. The algorithms were tested using the D-Wave quantum annealer Advantage system 4.1, equipped with 5760 qubits. D-Wave provides a cloud platform called Leap and a Python library, Ocean tools, through which quantum algorithms can be designed and run using local simulators or real quantum computers in the cloud. Formulating graph problems to be solved on quantum annealers was relatively straightforward, as significant literature already contains implementations of these problems. However, running these algorithms on existing quantum annealer machines proved to be challenging. Even though quantum annealers currently boast thousands of qubits, algorithms performed satisfactorily only on small graphs. The bottleneck was not the number of qubits but rather the limitations imposed by topology and noise. D-Wave also provides hybrid solvers that utilise both the Quantum Processing Unit (QPU) and CPU to solve algorithms, which proved to be much more reliable than using a pure quantum solver.
  • Kopio, Ville (2023)
    Coupled empowerment maximization (CEM) is an action selection policy for artificial agents. It utilizes the empowerment values of different agents in order to determine the best possible action to take. Empowerment quantifies the potential an agent has to impact the world around it. For example, an agent in an open field has a higher empowerment compared to an agent that is locked in a cage as in an open field the agent has a higher freedom of movement. This kind of action selection policy does not rely on the agent behavior to be explicitly programmed which makes it particularly promising as a non-player character policy for procedurally generated video games. To research the feasibility of CEM agents in practice, they should be studied in a large variety of different situations and games. Some studies have already been performed with a CEM agent that is implemented in the Python programming language. The agent ran in small game environments built on top of the Griddly game engine, but the computational performance of the agent has not been the focus. Scaling the experiments to larger environments using the old implementation is not feasible as conducting the experiments would take an enormous amount of time. Thus, the focus in this thesis is on lowering the time complexity of the agent so that there are more avenues for further research. This is reached with a new CEM agent implementation that (1) has a more modular architecture making future changes easier, (2) simulates future game states with an improved forward model which keeps track of already visited game states, and (3) uses an optimized Griddly version which has improved environment cloning performance. Our approach is around 200 times faster compared to the old implementation using environments and parametrization that potentially are used in future quantitative and qualitative experiments. The old implementation also has some bugs that are now resolved in the new implementation.
  • Kemppainen, Esa (2020)
    NP-hard optimization problems can be found in various real-world settings such as scheduling, planning and data analysis. Coming up with algorithms that can efficiently solve these problems can save various rescources. Instead of developing problem domain specific algorithms we can encode a problem instance as an instance of maximum satisfiability (MaxSAT), which is an optimization extension of Boolean satisfiability (SAT). We can then solve instances resulting from this encoding using MaxSAT specific algorithms. This way we can solve instances in various different problem domains by focusing on developing algorithms to solve MaxSAT instances. Computing an optimal solution and proving optimality of the found solution can be time-consuming in real-world settings. Finding an optimal solution for problems in these settings is often not feasible. Instead we are only interested in finding a good quality solution fast. Incomplete solvers trade guaranteed optimality for better scalability. In this thesis, we study an incomplete solution approach for solving MaxSAT based on linear programming relaxation and rounding. Linear programming (LP) relaxation and rounding has been used for obtaining approximation algorithms on various NP-hard optimization problems. As such we are interested in investigating the effectiveness of this approach on MaxSAT. We describe multiple rounding heuristics that are empirically evaluated on random, crafted and industrial MaxSAT instances from yearly MaxSAT Evaluations. We compare rounding approaches against each other and to state-of-the-art incomplete solvers SATLike and Loandra. The LP relaxation based rounding approaches are not competitive in general against either SATLike or Loandra However, for some problem domains our approach manages to be competitive against SATLike and Loandra.
  • Warro, Olli (2023)
    In many real-world problems, the task is to find an optimal solution within a finite set of solutions. Many of the problems, also known as combinatorial optimization problems, are NPhard. In other words, finding an optimal solution for the problems is computationally difficult. However, being important for many real-world applications, there is a demand for efficient ways to solve the problems. One approach is the declarative approach, where the problems are first encoded into a mathematical constraint language. Then, the encoded problem instance is solved by an algorithm developed for that constraint language. In this thesis, we focus on declarative pseudo-Boolean optimization (PBO). PBO is the set of integer programs (IP) where the variables can only be assigned to 0 or 1. For many real-world applications, finding an optimal solution is too time-consuming. Instead of finding an optimal solution, incomplete methods attempt to find good enough solutions in a given time limit. To the best of our knowledge, there are not many incomplete algorithms developed specifically for PBO. In this thesis, we adapt an incomplete method developed for the maximum satisfiability problem to PBO. In the adapted algorithm, which we call LS-ORACLE-PBO, a given PBO instance is solved using a form of local search that utilizes a pseudo-Boolean decision oracle when moving from one solution to another. We implement and empirically compare LS-ORACLE-PBO to another recent incomplete PBO algorithm called LS-PBO. The results show that, in general, our implementation is not competitive against LS-PBO. However, for some problem instances, our implementation provides better results than LS-PBO.
  • Conati, Ari (2023)
    Judgment aggregation (JA) offers a generic formal framework for modeling various settings involving information aggregation by social choice mechanisms. For many judgment aggregation rules, computing collective judgments is computationally notoriously hard. The central outcome determination problem, in particular, is often complete for higher levels of the polynomial hierarchy. This complexity barrier makes it challenging to develop practical, generic algorithmic approaches to outcome determination. In this work, we develop practical algorithms for outcome determination by harnessing Boolean satisfiability (SAT) based solvers as the underlying reasoning engines, under a range of the most central JA rules. For the Kemeny, Slater, MaxHamming, Young, and Dodgson rules, we detail a direct approach based on maximum satisfiability (MaxSAT) solving, using propositional logic as a declarative language. For the Reversal scoring, Condorcet, Ranked agenda, and LexiMax rules, we develop iterative SAT-based algorithms, including algorithms based on the counterexample-guided abstraction refinement (CEGAR) paradigm. The procedures we develop for these settings make use of recent advances in incremental MaxSAT solving and preferential SAT-based reasoning. We provide an open-source implementation of the algorithms, and empirically evaluate them using both real-world and synthetic preference data. We compare the performance of our implementation to a recent approach which makes use of declarative solver technology for answer set programming (ASP). The results demonstrate that our SAT-based approaches scale significantly beyond the reach of the ASP-based algorithms for all of the judgment aggregation rules considered.
  • Hauhio, Iikka (2022)
    Kielimallit ovat tilastollisia malleja, jotka ennustavat kuinka todennäköinen annettu jono sanoja on. Kielimallin tuottamasta jakaumasta on mahdollista arpoa sanoja, lauseita ja kokonaisia tekstejä. Viime aikoina kielimalleja on onnistuneesti käytetty taiteen, kuten tarinoiden, runojen ja musiikkikappaleiden generointiin. Tutkijat ovat kutsuneet kielimallien tuottamia tekstejä erinomaisiksi, mutta samalla niiden luovuutta on myös kyseenalaistettu. Tutkielmassa esittelen riittävän kriteeristön kielimallien luovuuden arviointiin. Kriteereiksi määrittelen teosten arvokkuus ja uutuus sekä kielimallin ennustamattomuus. Tutkin kriteerit täyttävän kielimallin luovia tekoja FACE-mallin avulla. Poissuljen sen mahdollisuuden, että kaikki luovat teot olisivat mallin kouluttajan tai lähdeaineston tai koulutusalgoritmin tekijöiden tekemiä. Lopputulokseni on, että kriteerit täyttävä kielimalli tekee useita luovia tekoja. Kriteerien toteutumisen mittaaminen voi olla käytännössä haastavaa, sillä kriteerien tarkka määrittely on tehtävä tapauskohtaisesti, eikä niiden mittaamiseen ole yhtä selkeää tapaa.