Browsing by study line "Algoritmer"

  • Wichmann, Nic (2022)
    Building a Neural Language Model from scratch involves a large number of design decisions. One needs to decide, among other things, the network structure, input and output encodings, regularization methods and optimization algorithm. In addition, the number of options for each decision is constantly increasing as new ones are invented. The trend for Neural Language Models has also been to simply increase the number of parameters in the model in order to improve inference performance. With more and more parameters to learn, a growing amount of expensive hardware is needed to train the models. A method called Knowledge Distillation can instead be used to start from an existing, already trained Neural Language Model. The existing model is used as a teacher so that the knowledge of the student model is aligned with it. We build a student Neural Language Model that uses multiple teacher models. In this scenario each teacher model is seen as an expert in a specific language, and the student model is taught all the languages of the teacher models. In our experiment we use four existing language models, each specialized in a different language. We train the student model using Knowledge Distillation with additional techniques that enable using multiple teachers. We use the XNLI benchmark to show that the student model successfully learns all of the languages.
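    Below is a minimal sketch of the knowledge-distillation loss described above, assuming a standard single-teacher setup with temperature-softened outputs; the tensor shapes and hyperparameters are illustrative, and the thesis's multi-teacher combination is not reproduced here.

    ```python
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL divergence between temperature-softened teacher and student distributions."""
        t = temperature
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        # Scaling by t**2 keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

    # Toy usage with random logits standing in for student and teacher outputs.
    student_logits = torch.randn(4, 10, requires_grad=True)
    teacher_logits = torch.randn(4, 10)
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()  # in a real training loop this drives the student's parameter updates
    ```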
  • Mäkinen, Pyry (2023)
    The two-dimensional path-planning problem involves determining the shortest path between two points on a plane while adhering to a set of polygonal constraints. In this thesis, we show how the path-planning problem in dynamically updated environments gives rise to a need for an edge-flipping based line-segment insertion procedure for constrained Delaunay triangulations, and present an efficient algorithm for the problem. The algorithm operates on convex vertices situated along the boundary of the channel of triangles that intersect with the edge that is to be inserted. We show that such vertices can be removed from the channel using a finite series of edge-flip operations, and that the repeated application of the vertex removal procedure results in the insertion of the requested edge in the triangulation. Furthermore, we present a working sample implementation of the resulting algorithm.
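    For context, here is a minimal sketch of the geometric predicates that underlie Delaunay edge flips; the channel-based insertion procedure developed in the thesis is not reproduced, and the coordinates in the example are illustrative.

    ```python
    # Floating-point versions of the orientation and in-circle predicates; robust
    # implementations would use exact arithmetic.

    def orient(a, b, c):
        """> 0 if a, b, c are in counter-clockwise (CCW) order, < 0 if clockwise, 0 if collinear."""
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    def in_circle(a, b, c, d):
        """> 0 if d lies strictly inside the circumcircle of the CCW triangle (a, b, c)."""
        rows = [(p[0] - d[0], p[1] - d[1], (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2)
                for p in (a, b, c)]
        (a0, a1, a2), (b0, b1, b2), (c0, c1, c2) = rows
        return (a0 * (b1 * c2 - b2 * c1)
                - a1 * (b0 * c2 - b2 * c0)
                + a2 * (b0 * c1 - b1 * c0))

    def should_flip(a, b, c, d):
        """Edge (a, b) shared by triangles (a, b, c) and (b, a, d) violates the Delaunay
        condition, and is a flip candidate, when d lies inside the circumcircle of (a, b, c).
        Constrained edges are never flipped."""
        return orient(a, b, c) > 0 and in_circle(a, b, c, d) > 0

    print(should_flip((0, 0), (2, 0), (1, 1), (1, -0.1)))  # True: d is inside the circumcircle
    ```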
  • Nguyen, Thang (2023)
    Machine learning (ML) systems are often deployed in complex, dynamic and uncertain environments, where their performance can degrade over time due to changes in data distribution or user behaviour. Monitoring and maintenance are essential processes for ensuring the reliability and validity of ML systems in production. However, these processes pose many challenges for ML practitioners, such as choosing appropriate metrics, detecting anomalies, and deciding when to retrain models. In this thesis, we conduct a multivocal literature review to investigate the current state of the art and best practices for ML model monitoring and maintenance. We aim to provide a comprehensive and systematic overview of the existing methods, tools, and challenges for ML model monitoring and maintenance, as well as to identify gaps and future directions for research in this area. Through an examination of 28 sources, we gained insights into common challenges of ML models post-deployment, including but not limited to concept drift, data pipeline and input data integrity, and feedback loops of complex models. The corresponding monitoring and maintenance methods to mitigate some of those challenges were also extracted. In addition, we explored the overall trends and demographics of the topic, which show an increasing interest in it, and demonstrate how grey literature can provide practical and diverse experiences and opinions from industry practitioners.
  • Hrin, Adam (2023)
    Understanding the behaviour of Machine Learning models is becoming increasingly important as models grow in complexity. This thesis proposes a framework for validating machine-learned signals using performance metrics and model explainability tools, applied in the context of Digital Humanities and Social Sciences. The framework allows investigating whether the real-world problem that the model tries to represent is well defined and whether the model accurately captures the phenomena at hand. Explainability techniques such as SHAP, LIME and gradient-based methods have been used. These produce importance scores for the features on which the model bases its decisions. The cases presented in this thesis are related to research in Computational History and Historical Discourse Analysis with High Performance Computing. The subject of analysis is the large language model BERT, fine-tuned on Eighteenth Century Collections Online (ECCO) documents to classify books into genres. Investigating the performance of the classifier with precision-recall curves suggests that the class signals might be overlapping and not clearly delineated. Further results do not suggest that the noise introduced into the data by the OCR digitisation process has significant importance for the decision making of the model. The explainability techniques helped uncover the model's inner workings by showing that the model gets its signal mostly from the beginnings of samples. In a proxy task, a simpler linear model was trained to perform a projection from keywords to genres and revealed inconsistencies in the explainability method. Different subsets of the data were investigated, as given by the cells of a confusion matrix, the confidence in the prediction probability, or additional metadata. Investigating individual samples allows for qualitative analysis as well as a more detailed understanding of the signal.
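    As a rough illustration of the gradient-based attribution methods mentioned above, here is a minimal gradient-times-input sketch in PyTorch; the toy model, input size and class choice are assumptions, and this is not the SHAP/LIME setup used in the thesis.

    ```python
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))  # toy classifier
    x = torch.randn(1, 20, requires_grad=True)                             # toy input features

    logits = model(x)
    target_class = int(logits.argmax())        # explain the predicted class
    logits[0, target_class].backward()

    attribution = (x.grad * x).detach().squeeze()  # gradient-times-input importance scores
    print(attribution)                             # one score per input feature
    ```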
  • Jin, Qianyue (2021)
    Mixture of Experts (MoE) is a machine learning approach that utilizes multiple expert models to solve machine learning tasks. By combining the perspectives of individual experts through a product of their outputs, the overall system can produce comprehensive decisions. The hope is that, by doing this, the individual experts can focus on modeling different aspects of the data. In this thesis, we study MoEs in the context of deep learning and image classification using empirical comparisons. The datasets, gating and expert networks, numbers of experts, and objective functions for the gate are controlled in the experiments, and the performance (classification accuracy) and the behavior (how the gate distributes the data over the experts) are reported as the results. Based on the finding that the tested mixtures of networks perform mostly on par with the single-network baseline, we conclude that either the mixture of experts is not well suited to image classification tasks or it requires different engineering in the architecture and/or the choice of optimization algorithm.
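    Here is a minimal sketch of a softmax-gated mixture of experts for classification, with arbitrary toy dimensions; the thesis additionally studies product-style combinations of expert outputs and convolutional experts on image data, which are not shown.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixtureOfExperts(nn.Module):
        def __init__(self, in_dim, num_classes, num_experts=4, hidden=64):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))
                for _ in range(num_experts)
            )
            self.gate = nn.Linear(in_dim, num_experts)  # gating network

        def forward(self, x):
            weights = F.softmax(self.gate(x), dim=-1)                 # (batch, experts)
            outputs = torch.stack([e(x) for e in self.experts], 1)    # (batch, experts, classes)
            return (weights.unsqueeze(-1) * outputs).sum(dim=1)       # gate-weighted combination

    model = MixtureOfExperts(in_dim=32, num_classes=10)
    logits = model(torch.randn(8, 32))
    print(logits.shape)  # torch.Size([8, 10])
    ```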
  • Vu, Anh-Duc (2020)
    Artificial intelligence (AI) is being increasingly applied in the field of intelligent tutoring systems (ITS). Knowledge space theory (KST) aims to model the main features of the process of learning new skills. Two basic components of an ITS are the domain model and the student model. The student model provides an estimate of the state of the student’s knowledge or proficiency, based on the student’s performance on exercises. The domain model provides a model of the relations between the concepts/skills in the domain. To learn the student model from data, some ITSs use the Bayesian Knowledge Tracing (BKT) algorithm, which is based on hidden Markov models (HMM). This thesis investigates the applicability of KST to constructing these models. The contribution of the thesis is twofold. Firstly, we learn the student model by a modified BKT algorithm, which models forgetting of skills (which the standard BKT model does not do). We build one BKT model for each concept. However, rather than treating a single question as a step in the HMM, we treat an entire practice session as one step, on which the student receives a score between 0 and 1, which we assume to be normally distributed. Secondly, we propose algorithms to learn the “surmise” graph—the prerequisite relation between concepts—from “mastery data” estimated by the student model. The mastery data tells us the knowledge state of a student on a given concept. The learned graph is a representation of the knowledge domain. We use the student model to track the advancement of students, and the domain model to propose an optimal study plan for students based on their current proficiency and targets of study.
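    A minimal sketch of the standard Bayesian Knowledge Tracing update with an added forgetting term, as discussed above; the parameter values are illustrative, and the thesis's session-level, normally distributed score model is not reproduced here.

    ```python
    def bkt_update(p_know, correct, guess=0.2, slip=0.1, learn=0.15, forget=0.05):
        """Return P(skill known) after observing one binary response."""
        if correct:
            posterior = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
        else:
            posterior = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
        # Transition: learning if not yet known, forgetting if known (standard BKT sets forget = 0).
        return posterior * (1 - forget) + (1 - posterior) * learn

    p = 0.3
    for observation in [1, 1, 0, 1]:
        p = bkt_update(p, observation)
        print(round(p, 3))
    ```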
  • Lång, John (2021)
    This thesis discusses the formal modelling and verification of certain non-real-time aspects of correctness of a mission-critical distributed software system known as the ALICE Data Point Service (ADAPOS). The domain of this distributed system is data acquisition from a particle detector control system in experimental high energy particle physics research. ADAPOS is part of the upgrade effort of A Large Ion Collider Experiment (ALICE) at the European Organisation for Nuclear Research (CERN), near Geneva in France/Switzerland, for the third run of the Large Hadron Collider (LHC). ADAPOS is based on the publicly available ALICE Data Point Processing (ADAPRO) C++14 framework and works within the free and open source GNU/Linux ecosystem. The model checker Spin was chosen for modelling and verifying ADAPOS. The model focuses on the general specification of ADAPOS. It includes ADAPOS processes, a load generator process, and rudimentary interpretations for the network protocols used between the processes. For experimenting with different interpretations of the underlying network protocols and also for coping with the state space explosion problem, eight variants of the model were developed and studied. Nine Linear Temporal Logic (LTL) properties were defined for all those variants. Large numbers of states were covered during model checking, even though the model turned out to have a reachable state space too large to exhaust fully. No counter-examples were found to the safety properties, and a significant amount of evidence was obtained hinting that ADAPOS is safe. Liveness properties and implementation-level verification, among other possible research directions, remain open.
  • Leinonen, Matti (2021)
    3D object detection and tracking are computer vision methods used in many applications. It is necessary for autonomous vehicles and robots to be able to reliably extract 3D localization information about objects in their environment in order to operate safely. Currently, most 3D object detection and tracking algorithms use high-quality LiDAR sensors, which are very expensive. This is why research into methods that use cheap monocular camera images as inputs is an active field in computer vision research. Most current research into monocular 3D object detection and tracking is focused on autonomous driving. This thesis investigates how well current monocular methods are suited for use in industrial settings, where the environment and especially the camera perspective can be very different from what they are in an automobile. This thesis introduces some of the most used 3D object detection and tracking methods and techniques and tests one detection method on a dataset where the environment and the point of view differ from what they would be in autonomous driving. This thesis also analyzes the technical requirements for a detection and tracking system that could be used by autonomous robots in an industrial setting and what future research would be necessary to develop such a system.
  • Kutvonen, Konsta (2020)
    With modern computer vision algorithms, it is possible to solve many different kinds of problems, such as object detection, image classification, and image segmentation. In some cases, like that of a camera-based self-driving car, the task can't yet be adequately solved as a direct mapping from image to action with a single model. In such situations, we need more complex systems that can solve multiple computer vision tasks to understand the environment and act based on it with acceptable results. Training a separate model for each task can be expensive in terms of the storage required for all the weights and especially in inference time, as the outputs of several large models are needed. Fortunately, many state-of-the-art solutions to these problems use Convolutional Neural Networks and often feature some ImageNet backbone in their architecture. With multi-task learning, we can combine some of the tasks into a single model, sharing the convolutional weights in the network. Sharing the weights allows for training smaller models that produce outputs faster and require less computational resources, which is essential especially when the models are run on embedded devices with constrained computation capability and no ability to rely on the cloud. In this thesis, we will present some state-of-the-art models for solving image classification and object detection problems. We will define multi-task learning, describe how multi-task models can be trained, and take a look at various multi-task models and how they exhibit the benefits of multi-task learning. Finally, to evaluate how training multi-task models changes the basic training paradigm and to find what issues arise, we will train multiple multi-task models. The models will mainly focus on image classification and object detection using various data sets. They will combine multiple tasks into a single model, and we will observe the impact of training the tasks in a multi-task setting.
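    A minimal sketch of the weight sharing described above: one convolutional backbone feeding a classification head and a simplified box-regression head standing in for detection. The layer sizes are arbitrary and this is not one of the architectures evaluated in the thesis.

    ```python
    import torch
    import torch.nn as nn

    class MultiTaskNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(              # convolutional weights shared by all tasks
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.cls_head = nn.Linear(32, num_classes)  # image classification head
            self.box_head = nn.Linear(32, 4)            # toy single-box regression head

        def forward(self, x):
            features = self.backbone(x)
            return self.cls_head(features), self.box_head(features)

    model = MultiTaskNet()
    cls_logits, boxes = model(torch.randn(2, 3, 64, 64))
    # A joint objective would sum per-task losses, e.g. cross-entropy plus an L1 box loss.
    print(cls_logits.shape, boxes.shape)  # torch.Size([2, 10]) torch.Size([2, 4])
    ```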
  • Yu, Hanlin (2023)
    Bayesian inference tells us how to incorporate information from the data into the parameters. In practice, this can be carried out using Markov Chain Monte Carlo (MCMC) methods, which draw approximate samples from the posterior distribution, but using them for complex models like neural networks remains challenging. The most commonly used methods in these cases are mini-batch-based Stochastic Gradient Markov Chain Monte Carlo (SGMCMC) methods. This thesis presents improvements for this family of algorithms. We focus on the specific algorithm of Stochastic Gradient Riemannian Langevin Dynamics (SGRLD). Its core idea is to perform sampling on a suitably defined Riemannian manifold characterized by a Riemannian metric, which allows utilizing information about the curvature of the target distribution for improved efficiency. While SGRLD has nice theoretical properties, for arbitrary Riemannian geometries the algorithm is slow due to the need to repeatedly calculate the inverse and inverse square root of the metric, which is high-dimensional for large neural networks. For the task of efficiently sampling from an arbitrary neural network with a large number of parameters, current algorithms overcome the issue by using diagonal metrics, which enable fast computations but unfortunately lose some advantages of the Riemannian formulation. This thesis proposes the first computationally efficient algorithms with non-diagonal metrics applicable to training arbitrary large neural networks for this task. We propose two alternative metrics and name the resulting algorithms MongeSGLD and ShampooSGLD, respectively. We demonstrate that with both of them we can achieve improvements over the existing algorithms in terms of convergence, especially when using neural networks with priors that result in good performance. We also propose to use the average curvature of the obtained samples as an evaluation measure.
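    For reference, a minimal sketch of one stochastic gradient Langevin step with an optional diagonal preconditioner, using an exact gradient of a toy Gaussian target; the non-diagonal MongeSGLD and ShampooSGLD metrics proposed in the thesis are not reproduced here.

    ```python
    import torch

    def sgld_step(theta, grad_log_post, step_size, precond=None):
        """One (preconditioned) SGLD update: half a gradient step plus scaled Gaussian noise."""
        if precond is None:
            precond = torch.ones_like(theta)      # identity (diagonal) metric
        noise = torch.randn_like(theta)
        return (theta
                + 0.5 * step_size * precond * grad_log_post
                + torch.sqrt(step_size * precond) * noise)

    # Toy target: standard Gaussian, so grad log p(theta) = -theta (a mini-batch estimate in practice).
    theta = torch.zeros(5)
    for _ in range(1000):
        theta = sgld_step(theta, grad_log_post=-theta, step_size=0.01)
    print(theta)  # an approximate sample from N(0, I)
    ```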
  • Laakso, Joosua (2023)
    Semantic segmentation is a computer vision problem of partitioning an image based on what type of object each part represents, with pixel-level precision. Producing labeled datasets to train deep learning models for semantic segmentation can be laborious due to the demand for pixel-level precision. On the other hand, a deep learning model trained on one dataset may perform considerably worse when applied to another dataset, depending on how different the two datasets are. Unsupervised domain adaptation attempts to narrow this performance gap by adapting the model to the other dataset, even if ground-truth labels for that dataset are not available. In this work, we review some of the pre-existing methods for unsupervised domain adaptation in semantic segmentation. We then present our own efforts to develop novel methods for the problem. These include a new type of loss function for unsupervised output shaping, unsupervised training of the model backbone based on feature statistics, and a method for unsupervised adaptation of the model backbone using an auxiliary network that attempts to mimic the gradients of supervised training. We present empirical results on the performance of these methods. We additionally present our findings on how changes in the statistics of the batch normalization layers affect domain adaptation performance.
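    Related to the batch-normalization findings mentioned above, here is a minimal sketch of one simple adaptation step: re-estimating BatchNorm statistics on unlabeled target-domain batches. The toy network and random data are placeholders, and this is not one of the methods proposed in the thesis.

    ```python
    import torch
    import torch.nn as nn

    def adapt_bn_statistics(model, target_loader, device="cpu"):
        """Refresh BatchNorm running mean/var using target-domain batches only."""
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.reset_running_stats()   # forget source-domain statistics
        model.train()                     # BN layers update running stats in train mode
        with torch.no_grad():             # no gradient updates; only BN buffers change
            for images in target_loader:
                model(images.to(device))
        return model

    # Toy usage with a tiny segmentation-style network and random "target-domain" batches.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                          nn.Conv2d(8, 2, 1))  # two classes, pixel-wise logits
    target_loader = [torch.randn(4, 3, 32, 32) for _ in range(5)]
    adapt_bn_statistics(model, target_loader)
    ```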
  • Pulsa, Veikka (2023)
    The scholarship allotment problem describes the goal of strategically offering scholarships to prospective students of a university in a way that optimises the expected return on that investment. The goal of this thesis is to formulate the scholarship allotment problem in multiple variations of increasing complexity, while also introducing algorithms that solve those variations optimally and as efficiently as possible. The thesis also offers some insight into how more complex variations and generalisations make it harder to find an optimal solution in a reasonable amount of time. The main focus, and the main tool used to tackle these problems, is the classic knapsack algorithm and different variations of it, like the multiple-choice knapsack and the multidimensional knapsack. In addition to the theoretical side, the thesis contains an empirical study of the performance and feasibility of the algorithms introduced. Concrete implementations of the algorithms discussed are available in a public GitHub repository: https://github.com/SirVeggie/scholarship-allotment.
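    As a reference point for the approach described above, here is a minimal sketch of the classic 0/1 knapsack dynamic program; the toy values, weights and capacity are illustrative, and the multiple-choice and multidimensional variants used in the thesis are not shown.

    ```python
    def knapsack(values, weights, capacity):
        """Return the maximum total value achievable within the weight capacity."""
        best = [0] * (capacity + 1)                    # best[c] = best value using capacity c
        for value, weight in zip(values, weights):
            for c in range(capacity, weight - 1, -1):  # iterate downwards so each item is used once
                best[c] = max(best[c], best[c - weight] + value)
        return best[capacity]

    # Toy instance: expected returns as values, scholarship costs as weights, budget as capacity.
    print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220
    ```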
  • Siipola, Ossi (2022)
    The inverted index is a core index structure for systems such as search engines and databases. It stores a mapping from terms, numbers, etc. to their locations in a document, set of documents, database, table, etc., and allows efficient full-text searches on the indexed structure. The list of locations for a term in an inverted index is usually called a postings list. In real-life applications, inverted indices can grow huge, so an efficient representation is needed while, at the same time, efficient queries must be supported. This thesis explores ways to represent postings lists efficiently while allowing efficient nextGEQ queries on the set; efficient nextGEQ queries are needed to implement inverted indices. First we convert the postings lists into one bitvector, which concatenates each postings list's characteristic bitvector. Representing an integer set efficiently then reduces to representing this bitvector efficiently, and the bitvector is expected to have long runs of 0s and 1s. Run-length encoding of bitvectors has recently led to promising results. Therefore, in this thesis we experiment with two encoding methods (Top-k Hybrid coder, RLZ) that encode postings lists via run-length encoding of the bitvector. We also investigate another new bitvector compression method (Zombit-vector), which encodes bitvectors by finding redundancies in the runs of 0s/1s. We compare all encodings to the current state-of-the-art Partitioned Elias-Fano (PEF) coding. Compression results for all encodings were more efficient than the current state-of-the-art PEF encoding. Zombit-vector nextGEQ query results were slightly more efficient than PEF's, which makes it more attractive for bitvectors that have long runs of 0s and 1s. More work is needed on the Top-k Hybrid coder and RLZ so that their nextGEQ queries can be compared to Zombit-vector and PEF.
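    To make the query concrete, here is a minimal sketch of nextGEQ on a plain (uncompressed) characteristic bitvector; the compressed encodings studied in the thesis answer the same query without materializing the bitvector, and the toy postings list is illustrative.

    ```python
    def to_bitvector(postings, universe):
        """Characteristic bitvector: bit i is 1 iff document id i occurs in the postings list."""
        bits = [0] * universe
        for doc_id in postings:
            bits[doc_id] = 1
        return bits

    def next_geq(bits, x):
        """Smallest document id >= x present in the postings list, or None if there is none."""
        for i in range(x, len(bits)):
            if bits[i]:
                return i
        return None

    bits = to_bitvector([2, 3, 4, 5, 11, 12, 13], universe=16)  # note the long runs of 0s and 1s
    print(next_geq(bits, 6))   # 11
    print(next_geq(bits, 14))  # None
    ```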
  • Utoslahti, Aki (2022)
    Lempel-Ziv factorization of a string is a fundamental tool that is used by myriad data compressors. Despite its optimality regarding the number of produced factors, it is rarely used without modification, for reasons of its computational cost. In recent years, Lempel-Ziv factorization has been a busy research subject, and we have witnessed the state-of-the-art being completely changed. In this thesis, I explore the properties of the latest suffix array-based Lempel-Ziv factorization algorithms, while I experiment with turning them into an efficient general-purpose data compressor. The setting of this thesis is purely exploratory, guided by reliable and repeatable benchmarking. I explore all aspects of the suffix array-based Lempel-Ziv data compressor. I describe how the chosen factorization method affects the development of encoding and other components of a functional data compressor. I show how the chosen factorization technique, together with capabilities of modern hardware, allows determining the length of the longest common prefix of two strings over 80% faster compared to the baseline approach. I also present a novel approach to optimizing the encoding cost of the Lempel-Ziv factorization of a string, i.e., bit-optimality, using a dynamic programming approach to the Single-Source Shortest Path problem. I observed that, in its current state, the process of suffix array construction is a major computational bottleneck in suffix array-based Lempel-Ziv factorization. Additionally, using a suffix array to produce a Lempel-Ziv factorization leads to optimality regarding the number of factors, which does not necessarily correspond to bit-optimality. Finally, a comparison with common third-party data compressors revealed that relying exclusively on Lempel-Ziv factorization prevents reaching the highest compression efficiency. For these reasons, I conclude that current suffix array-based Lempel-Ziv factorization is unsuitable for general-purpose data compression.
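    For illustration, here is a naive quadratic-time greedy Lempel-Ziv factorization sketch; the suffix-array-based algorithms explored in the thesis compute such factorizations far more efficiently, and the output format is a simplification.

    ```python
    def lz_factorize(text):
        """Return factors as (position, length) copies of earlier text, or (char, 0) literals."""
        factors, i = [], 0
        while i < len(text):
            best_len, best_pos = 0, -1
            for j in range(i):                          # candidate earlier starting positions
                length = 0
                while i + length < len(text) and text[j + length] == text[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_pos = length, j
            if best_len == 0:
                factors.append((text[i], 0))            # literal character
                i += 1
            else:
                factors.append((best_pos, best_len))    # copy of a previous (possibly overlapping) substring
                i += best_len
        return factors

    print(lz_factorize("abababb"))  # [('a', 0), ('b', 0), (0, 4), (1, 1)]
    ```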
  • Kailamäki, Kalle (2022)
    This thesis explores predicting current prices of individual agricultural fields in Finland based on historical data. The task is to predict field prices accurately with the data we have available while keeping model predictions interpretable and well explainable. The research question is which of the several models we try out is best suited for the task. The motivation behind this research is the growing agricultural land market and the lack of publicly available field valuation services that could assist market participants in determining and identifying reasonable asking prices. Previous studies on the topic have used standard statistics to establish relevant factors that affect field prices. Rather than creating a model whose predictions can be used on their own in every case, the primary purpose of previous works has indeed been to identify information that should be considered in manual field valuation. We, on the other hand, focus on the predictive ability of models that do not require any manual labor. Our modelling approaches focus mainly but not exclusively on algorithms based on Markov chain Monte Carlo. We create a nearest-neighbors model and four hierarchical linear models of varying complexity. Performance comparisons lead us to recommend a nearest-neighbor-type model for this task.
  • Puro, Touko (2023)
    GPUs have become an important part of large-scale and high-performance physics simulations due to their superior performance [11] and energy efficiency [23] compared to CPUs. This thesis examines how to accelerate an existing CPU stencil code, originally parallelized through message passing, with GPUs. Our first research question is how to utilize the CPU cores alongside GPUs when the bulk of the computation happens on GPUs. Secondly, we investigate how to address the performance bottleneck of data movement between CPU and GPU when there is a need to perform computational tasks originally intended to be executed on CPUs. Lastly, we investigate how the performance bottleneck of communication between processes can be alleviated to make better use of the available compute resources. In this thesis we approach these problems by building a preprocessor designed to make an existing CPU codebase suitable for GPU acceleration, while the communication bottleneck is alleviated by extending an existing GPU-oriented library, Astaroth. We improve its task scheduling system and extend its domain-specific language (DSL) for stencil computations. Our solutions are demonstrated by making an existing CPU-based astrophysics simulation code, Pencil Code [4], suitable for GPU acceleration with the use of our preprocessor and the Astaroth library. Our results show that we are able to utilize CPU cores to perform useful work alongside the GPUs. We also show that we are able to circumvent the CPU-GPU data movement bottleneck by making code suitable for offloading through OpenMP offloading and code translation to GPU code. Lastly, we show that in certain cases Astaroth's communication performance is increased by around 21% through smaller message sizes, with the added benefit of 14% lower memory usage, which corresponds to around an 18% improvement in overall performance. Furthermore, we show the benefits of the improved tasking and an identified memory-performance trade-off.
  • Smirnov, Pavel (2022)
    There are many computationally difficult problems where the task is to find a solution with the lowest cost possible that fulfills a given set of constraints. Such problems are often NP-hard and are encountered in a variety of real-world problem domains, including planning and scheduling. NP-hard problems are often solved using a declarative approach by encoding the problem into a declarative constraint language and solving the encoding using a generic algorithm for that language. In this thesis we focus on pseudo-Boolean optimization (PBO), a special class of integer programs (IP) that only contain variables that admit the values 0 and 1. We propose a novel approach to PBO that is based on the implicit hitting set (IHS) paradigm, which uses two separate components. An IP solver is used to find an optimal solution under an incomplete set of constraints. A pseudo-Boolean satisfiability solver is used to either validate the feasibility of the solution or to extract more constraints to the integer program. The IHS-based PBO algorithm iteratively invokes the two algorithms until an optimal solution to a given PBO instance is found. In this thesis we lay out the IHS-based PBO solving approach in detail. We implement the algorithm as the PBO-IHS solver by making use of recent advances in reasoning techniques for pseudo-Boolean constraints. Through extensive empirical evaluation we show that our PBO-IHS solver outperforms other available specialized PBO solvers and has complementary performance compared to classical integer programming techniques.
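    A schematic sketch of the implicit hitting set loop described above; `ip_solver` and `pb_oracle` are hypothetical stand-in interfaces, not the PBO-IHS implementation or any real solver API.

    ```python
    def ihs_pbo_loop(objective, constraints, ip_solver, pb_oracle):
        """Alternate between optimizing over the constraints gathered so far and
        checking the candidate against the full pseudo-Boolean instance."""
        gathered = []                                  # constraints extracted so far
        while True:
            # Optimal assignment under the (still incomplete) set of gathered constraints.
            candidate = ip_solver.optimize(objective, gathered)
            feasible, new_constraints = pb_oracle.check(candidate, constraints)
            if feasible:
                return candidate                       # optimal and feasible: done
            gathered.extend(new_constraints)           # rule out the candidate and re-optimize
    ```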
  • Lehtonen, Tuomo (2019)
    Formal argumentation is a vibrant research area within artificial intelligence, in particular in knowledge representation and reasoning. Computational models of argumentation are divided into abstract and structured formalisms. Since its introduction in 1995, abstract argumentation, where the structure of arguments is abstracted away, has been much studied and applied. Structured argumentation formalisms, on the other hand, contain the explicit derivation of arguments. This is motivated by the importance of the construction of arguments in the application of argumentation formalisms, but also makes structured formalisms conceptually and often computationally more complex than abstract argumentation. The focus of this work is on assumption-based argumentation (ABA), a major structured formalism. Specifically we address the relative lack of efficient computational tools for reasoning in ABA compared to abstract argumentation. The computational efficiency of ABA reasoning systems has been markedly lower than the systems for abstract argumentation. In this thesis we introduce a declarative approach to reasoning in ABA via answer set programming (ASP), drawing inspiration from existing tools for abstract argumentation. In addition, we consider ABA+, a generalization of ABA that incorporates preferences into the formalism. The complexity of reasoning in ABA+ is higher than in ABA for most problems. We are able to extend our declarative approach to some ABA+ reasoning problems. We show empirically that our approach vastly outperforms previous reasoning systems for ABA and ABA+.
  • Ihalainen, Hannes (2022)
    The so-called declarative approach has proven to be a viable paradigm for solving various real-world NP-hard optimization problems in practice. In the declarative approach, the problem at hand is encoded using a mathematical constraint language, and an algorithm for the specific language is employed to obtain optimal solutions to an instance of the problem. One of the most viable declarative optimization paradigms of the last years is maximum satisfiability (MaxSAT) with propositional logic as the constraint language. So-called core-guided MaxSAT algorithms are arguably one of the most effective MaxSAT-solving paradigms in practice today. Core-guided algorithms iteratively detect and rule out (relax) sources of inconsistencies (so-called unsatisfiable cores) in the instance being solved. Especially effective are recent algorithmic variants of the core-guided approach which employ so-called soft cardinality constraints for ruling out inconsistencies. In this thesis, we present a structure-sharing technique for the cardinality-based core relaxation steps performed by core-guided MaxSAT solvers. The technique aims at reducing the inherent growth in the size of the propositional formula resulting from the core relaxation steps. Additionally, it enables more efficient reasoning over the relationships between different cores. We empirically evaluate the proposed technique on two different core-guided algorithms and provide open-source implementations of our solvers employing the technique. Our results show that the proposed structure-sharing can improve the performance of the algorithms both in theory and in practice.
  • Karanko, Lauri (2022)
    Determining the optimal rental price of an apartment typically requires a real estate agent to gauge the external and internal features of the apartment, as well as similar apartments in the vicinity of the one being examined. Hedonic pricing models that rely on regression are commonplace, but those that employ state-of-the-art machine learning methods are still not widespread. The purpose of this thesis is to investigate an optimal machine learning method for predicting property rent prices for apartments in the Greater Helsinki area. The project was carried out at the behest of a client in the real estate investing business. We review which external and inherent apartment features are the most suitable for making predictions, and engineer additional features that result in predictions with the least error within the Greater Helsinki area. Combining public demographic data from Tilastokeskus (Statistics Finland) and data from the online broker Oikotie Oy gives rise to a model that is comparable to contemporary commercial solutions offered in Finland. Using inverse distance weighting to interpolate a price for the coordinates of a new apartment was also found to be crucial in developing a performant model. After reviewing the models, the gradient boosting algorithm XGBoost was found to fare the best for this regression task.
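    A minimal sketch of inverse distance weighting of the kind mentioned above, interpolating a rent value at a new apartment's coordinates from nearby observations; the coordinates, values and power parameter are illustrative, not the thesis's actual feature pipeline.

    ```python
    import math

    def idw_estimate(known, query, power=2):
        """known: list of ((x, y), value) pairs; query: (x, y). Returns the weighted average."""
        num, den = 0.0, 0.0
        for (x, y), value in known:
            d = math.hypot(x - query[0], y - query[1])
            if d == 0:
                return value                 # exact coordinate match
            w = 1.0 / d ** power             # closer observations get larger weights
            num += w * value
            den += w
        return num / den

    known_rents = [((60.17, 24.93), 25.0), ((60.20, 24.96), 22.0), ((60.16, 24.90), 27.0)]
    print(round(idw_estimate(known_rents, (60.18, 24.94)), 2))  # weighted toward the nearest observations
    ```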