
Faculty of Science


Recent Submissions

  • Ovaskainen, Osma (2024)
    Objective: The objective of this thesis is to create methods for transforming the most accessible digitalized version of an apartment, the floor plan, into a format that can be analyzed by statistical modeling, and to use the resulting data to determine whether there are any spatial or temporal effects in the geometry of apartment floor plans. Methods: The first part of the thesis was carried out using a mix of computer vision image manipulation methods combined with text recognition. The second part was performed using a one-way ANOVA model. Results: With the computer vision part, we were able to successfully classify a portion of the data; however, there is considerable room for improvement in the recognition step. From the created data, we were able to identify some key differences with respect to our parameters, location and year of construction. The analysis, however, suffers from a quite limited dataset in which a few housing corporations play a large role in the final results, so it would be wise to repeat this experiment with a more comprehensive dataset for more accurate results.
  • Savola, Mikko (2024)
    In this study we use mutual information to characterise statistical dependencies of seed and relativistic electron fluxes in the Earth’s radiation belts on ultra-low frequency (ULF) wave power measured on the ground and at geostationary orbit. The benefit of mutual information, in comparison to measures such as the Pearson correlation, lies in its capacity to distinguish non-linear dependencies from linear ones. We replicate the methodology in Simms et al. [2014] and we also calculate the conditional mutual information, CMI, between ULF spectral power and the electron fluxes, when conditioned on the solar wind speed V. The results of the replication are similar to those presented in Simms et al. [2014] and show low to moderate correlations between ULF Pc5 waves (2-7 mHz electromagnetic waves) and both relativistic and seed electron fluxes, with the correlations ranging from 0.18 to 0.65. The mutual information between ULF Pc5 spectral power and the relativistic electron flux is between 0.17 and 0.22 depending on whether it is evaluated for a storm’s main or recovery phase. The corresponding values of mutual information between ULF Pc5 spectral power and the seed electron flux are between 0.33 and 0.41. All the values of mutual information are statistically significant with a confidence interval of at least 8 standard deviations, except for the mutual information between ground-based ULF measurements and the relativistic electron flux. The highest values of CMI are obtained for roughly V > 600 km/s, which also give the largest significance ratios for CMI, implying that under conditions with high solar wind speeds the dependence between ULF Pc5 spectral power and the electron fluxes at the outer radiation belts is higher than for lower solar wind speeds. The mutual information and CMI between the ULF spectral power and the seed electron fluxes are larger, up to twice as high, than those between ULF spectral power and the relativistic electron flux. The mutual information between average ULF spectral power and the peak electron flux after a storm is also higher than the regular mutual information, which indicates a dependence whose timing might vary on a scale of days.
  • Diseth, Anastasia Chabounina (2024)
    Combinatorial optimization problems arise in many applications. Finding solutions that are as good as possible, ideally optimal, with respect to given criteria is important. Additionally, many real-world combinatorial optimization problems are NP-hard. The so-called declarative approach to solving combinatorial optimization problems has proven to be successful in practice. In this work we focus on the implicit hitting set-based (IHS) maximum satisfiability (MaxSAT) paradigm for solving combinatorial optimization problems declaratively. In the MaxSAT paradigm the problem at hand is formulated as a linear objective function to minimize subject to a set of constraints expressed in the language of propositional logic. In the IHS approach the problem is solved by alternating calls to two subroutines. An optimizer procedure computes optimal solutions over the variables in the objective function without the constraints available, and a feasibility oracle verifies the solutions in terms of the constraints. In this work we study alternative divisions of the constraints of a given problem formulation between the optimizer and the oracle. We allow the optimizer to compute solutions over any variables of the problem instance, thus extending the hitting set formulations of IHS-based MaxSAT. We focus on two specific combinatorial optimization problems and existing MaxSAT encodings of these problems. The problems we focus on are computing the treewidth of a graph and finding an optimal k-undercover Boolean matrix factorization. We have also extended a state-of-the-art IHS-based MaxSAT solver to support extended divisions of encodings and provide the implementation as open source.
  • Tulijoki, Juha-Pekka (2024)
    A tag is a freely chosen keyword that a user attaches to an item. Offering a simple, cheap, and natural way to describe content, tagging has become popular in contemporary web applications. The tag genome is a data structure that contains item-tag relevance scores, i.e., continuous-scale numbers from 0 to 1 indicating how relevant a tag is for an item. For example, the tag "romantic comedy" has a relevance score of 0.97 for the movie "Love Actually". With sufficient data, a tag genome dataset can be constructed for any domain. To the best of our knowledge, tag genome datasets currently exist for movies and books. The tag genome for movies is used in a movie recommender and for various purposes in recommender systems research, such as detecting filter bubbles and serendipity. Creating a diverse tag genome dataset requires an effective machine learning solution, as manual assessment of item-tag relevance scores is impractical. The current state-of-the-art solution, called TagDL, feeds features extracted from user-generated tags, reviews, and ratings into a multilayer perceptron architecture to predict the item-tag relevance scores. This study aims to enhance TagDL by extracting more features from the embeddings of textual content, namely tags, user reviews, and item titles, using Bidirectional Encoder Representations from Transformers (BERT). The results show that features based on BERT embeddings have a potential positive impact on item-tag relevance score prediction. However, the improvement does not generalize to both tag genome datasets: the results improve only for the movie dataset. This may indicate that the new features have a stronger impact if the amount of available training data is smaller, as with the movie dataset. Moreover, this thesis discusses future work ideas and implementation possibilities.
  • Pelvo, Nasti (2024)
    Object detection and multi-object tracking are crucial components of computer vision systems aiming for comprehensive scene understanding and reliable autonomous decision making. While methods developed for visual input data are widely studied, they are susceptible to environmental factors such as poor lighting and weather conditions. Thermal imaging, on the other hand, is robust against most adverse environmental conditions and thus presents an intriguing alternative to visual photography. Due to the characteristics of thermal images, current state-of-the-art object detection and tracking methods perform poorly when presented with thermal input. Open-source thermal data for training large neural network models is not widely available: existing datasets are small and homogeneous, and the resulting models lack the generalizability required for their application on real-world input data. The effect is especially relevant for transformer-based methods, which exhibit a lack of visual inductive bias and thus require large-scale training. This thesis presents the first in-depth literature review and experimental study into transformer-based object detection and tracking on challenging thermal and aerial data. By conducting an analysis of existing transformer-based multi-object tracking methods, we argue for the application of the joint detection and tracking paradigm, where multi-object tracking is treated as an end-to-end problem. Our experiments on two transformer-based multi-object tracking models confirm that fully exploiting multi-frame input can increase the stability of object detection and enforce robustness against the domain issues prevalent in thermal images. Due to the high training data requirement of transformers, the methods are, however, held back by the lack of open-source training data. We thus introduce two novel data augmentation techniques which aim to supplement and diversify existing training data, and thus improve the transferability of detection and tracking methods between the visual and thermal domains.
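
As an illustration of the point made in the Savola (2024) abstract above, that mutual information can register non-linear dependencies that the Pearson correlation misses, the following is a minimal Python sketch. It is not the method of the thesis: the histogram (binned) plug-in estimator, the bin count, the synthetic data y = x^2 + noise, and all function names are assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def _bin(value, lo, hi, bins):
    """Index of the equal-width bin on [lo, hi] that contains value."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutual_information(xs, ys, bins=8):
    """Plug-in histogram estimate of I(X; Y) in nats (illustrative assumption)."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    pairs = [
        (_bin(x, x_lo, x_hi, bins), _bin(y, y_lo, y_hi, bins))
        for x, y in zip(xs, ys)
    ]
    joint = Counter(pairs)
    count_x = Counter(i for i, _ in pairs)
    count_y = Counter(j for _, j in pairs)
    # Sum p(i, j) * log(p(i, j) / (p(i) * p(j))) over the occupied cells.
    return sum(
        (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in joint.items()
    )


def make_samples(n=5000, seed=0):
    """Synthetic data: ys depends on xs non-linearly, zs is independent of xs."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.05) for x in xs]
    zs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, ys, zs


if __name__ == "__main__":
    xs, ys, zs = make_samples()
    print("Pearson r(x, y):", pearson(xs, ys))
    print("MI(x, y) in nats:", mutual_information(xs, ys))
    print("MI(x, z) in nats:", mutual_information(xs, zs))
```

On this synthetic data the Pearson correlation between x and y should be close to zero even though y is almost a deterministic function of x, while the binned mutual information is clearly positive, and for the independent pair (x, z) it stays near zero. A binned estimator is biased for small samples, so in practice the bin count and sample size need care.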