
Browsing by study line "no specialization"


  • Joosten, Rick (2020)
    In the past two decades, an increasing number of discussions have been held on online platforms such as Facebook or Reddit. The most common source of disruption in these discussions is trolling. Traditional trolls try to derail the discussion into a nonconstructive argument. One strategy to achieve this is to give asymmetric responses, i.e., responses that do not follow conventional conversational patterns. In this thesis we propose using a modern machine learning NLP method called ULMFiT to automatically detect the discourse acts of online forum posts in order to detect these conversational patterns. ULMFiT fine-tunes the language model before training its classifier in order to create a more accurate representation of the domain language. The task of discourse act recognition is distinctive in that it classifies the pragmatic role of each post within a conversation, rather than the functional role addressed by tasks such as question-answer retrieval, sentiment analysis, or sarcasm detection. Furthermore, most discourse act recognition research has focused on synchronous conversations, where all parties can interact with each other directly, whereas this thesis looks at asynchronous online conversations. Trained on a dataset of Reddit discussions, the proposed model achieves a Matthews correlation coefficient of 0.605 and an F1-score of 0.69 for predicting discourse acts. Further experiments show that the model is also effective at question-answer classification, and that language model fine-tuning has a positive effect on both classification performance and the amount of training data required. These results could benefit current trolling detection systems.
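    A minimal sketch of the ULMFiT recipe with the fastai library, assuming a hypothetical DataFrame posts with text and discourse_act columns; the hyperparameters follow the usual ULMFiT defaults rather than the thesis' exact setup:

```python
# ULMFiT in two stages: fine-tune the language model on domain text,
# then reuse its encoder for the discourse act classifier.
from fastai.text.all import *

# Stage 1: fine-tune the AWD-LSTM language model on the forum posts.
dls_lm = TextDataLoaders.from_df(posts, text_col="text", is_lm=True, valid_pct=0.1)
lm = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3)
lm.fine_tune(3, 2e-3)
lm.save_encoder("ft_encoder")

# Stage 2: classifier on top of the fine-tuned encoder, with the gradual
# unfreezing and discriminative learning rates of the ULMFiT paper.
dls_clf = TextDataLoaders.from_df(posts, text_col="text", label_col="discourse_act",
                                  text_vocab=dls_lm.vocab)
clf = text_classifier_learner(dls_clf, AWD_LSTM, drop_mult=0.5,
                              metrics=[MatthewsCorrCoef(), F1Score(average="macro")])
clf.load_encoder("ft_encoder")
clf.fit_one_cycle(1, 2e-2)
clf.freeze_to(-2); clf.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))
clf.freeze_to(-3); clf.fit_one_cycle(1, slice(5e-3 / (2.6 ** 4), 5e-3))
clf.unfreeze();    clf.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))
```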
  • Lange, Moritz Johannes (2020)
    In the context of data science and machine learning, feature selection is a widely used technique that reduces the dimensionality of a dataset. It is commonly used to improve model accuracy by preventing data redundancy and over-fitting, but can also be beneficial in applications such as data compression. The majority of feature selection techniques rely on labelled data. In many real-world scenarios, however, data is only partially labelled, which calls for so-called semi-supervised techniques that can utilise both labelled and unlabelled data. While unlabelled data is often obtainable in abundance, labelled datasets are smaller and potentially biased. This thesis presents a method called distribution matching, which offers a way to do feature selection in a semi-supervised setup. Distribution matching is a wrapper method: it trains models in order to select the features that best support model accuracy. It addresses the problem of biased labelled data directly by incorporating unlabelled data into a cost function that approximates the expected loss on unseen data. In experiments, the method is shown to successfully and transparently minimise the expected loss on a synthetic dataset. Additionally, a comparison with related methods is performed on the more complex EMNIST dataset.
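    A hedged sketch of one way such a semi-supervised wrapper could look, assuming (this is an illustration, not the thesis' exact cost function) that the labelled loss is importance-weighted with density ratios estimated by a domain classifier separating labelled from unlabelled samples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def importance_weights(X_lab, X_unl):
    """Estimate p_unlabelled(x) / p_labelled(x) with a domain classifier."""
    X = np.vstack([X_lab, X_unl])
    d = np.r_[np.zeros(len(X_lab)), np.ones(len(X_unl))]
    proba = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X_lab)[:, 1]
    w = proba / (1.0 - proba)
    return w / w.mean()

def forward_select(X_lab, y, X_unl, n_features):
    """Greedy wrapper: add the feature that minimizes the weighted held-out loss."""
    w = importance_weights(X_lab, X_unl)
    idx_tr, idx_va = train_test_split(np.arange(len(y)), test_size=0.3, random_state=0)
    selected, remaining = [], set(range(X_lab.shape[1]))
    while len(selected) < n_features:
        def cost(j):
            cols = selected + [j]
            m = LogisticRegression(max_iter=1000).fit(
                X_lab[np.ix_(idx_tr, cols)], y[idx_tr], sample_weight=w[idx_tr])
            p = m.predict_proba(X_lab[np.ix_(idx_va, cols)])
            return log_loss(y[idx_va], p, sample_weight=w[idx_va], labels=m.classes_)
        best = min(remaining, key=cost)
        selected.append(best); remaining.remove(best)
    return selected
```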
  • Jokinen, Olli (2024)
    The rise of large language models (LLMs) has revolutionized natural language processing, particularly through transfer learning and fine-tuning paradigms that enhance the understanding of complex textual data. This thesis builds upon the concept of fine-tuning to improve the understanding of Finnish Wikipedia articles. Specifically, a BERT-based language model is fine-tuned to create high-quality document representations from Finnish texts. The learned representations are applied to downstream tasks, where the model's performance is evaluated against baseline models. This thesis draws on the SPECTER paper, published in 2020, which introduced a training framework for fine-tuning a general-purpose document embedder. SPECTER was trained using a document-level training objective that leveraged document link information. Originally, SPECTER was designed for scientific articles, utilizing citations between articles. The training instances consisted of triplets of query, positive, and negative papers, with the aim of capturing the semantic similarity of the documents. This work extends the SPECTER framework to Finnish Wikipedia data. While scientific articles have citations, Wikipedia's cross-references are used to build a document graph that captures the relatedness between articles. Additionally, Wikipedia data is publicly available as a full data dump, making it an attractive choice for the dataset in this thesis. One of the objectives is to demonstrate the flexibility of the SPECTER framework on a new dataset that has a networked structure similar to that of scientific articles. The fine-tuned model can be used as a general-purpose tool for various tasks and applications; however, in this thesis, its performance is measured in topic classification and cross-reference ranking. The Transformer-based language model produces fixed-length embeddings, which are used as features in the topic classification task and as vectors for measuring the L2 distance between article vectors in the cross-reference prediction task. This thesis shows that the proposed model, WikiSpecter, optimized with a document-level objective, outperformed baseline models in both tasks. The performance indicates that Finnish Wikipedia provides relevant cross-references that help the model capture relationships across a range of topics.
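    A hedged sketch of the SPECTER-style triplet objective with PyTorch and Hugging Face Transformers; the Finnish BERT checkpoint and the triplets iterable of (query, positive, negative) article texts are assumptions:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

name = "TurkuNLP/bert-base-finnish-cased-v1"  # assumed Finnish BERT checkpoint
tok = AutoTokenizer.from_pretrained(name)
enc = AutoModel.from_pretrained(name)
loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)  # L2 distance, as in SPECTER
opt = torch.optim.AdamW(enc.parameters(), lr=2e-5)

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, max_length=512,
                return_tensors="pt")
    return enc(**batch).last_hidden_state[:, 0]  # [CLS] vector as the embedding

# triplets: hypothetical iterable; positives share a cross-reference, negatives do not
for q, pos, neg in triplets:
    loss = loss_fn(embed([q]), embed([pos]), embed([neg]))
    opt.zero_grad(); loss.backward(); opt.step()
```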
  • Rautsola, Iiro (2019)
    Multimodality imaging is an efficient, non-invasive method for investigating molecular and cellular processes in vivo. However, the potential of multimodality imaging in plant studies is yet to be fully realized, largely due to the lack of research into suitable molecular tracers and instrumentation. Iodine has PET- and SPECT-compatible radioisotopes that have significant advantages over other radioisotopes applied in plant radioisotope imaging, and it can be incorporated into small molecules via a variety of reactions. In this master's thesis, a radioiodination method exploiting a novel, Dowex® H+-mediated addition of iodine to terminal alkynes was optimized and tested on two D-glucose analogues. The goal of the sugar analogue radioiodination was to develop a radioiodinated molecular tracer for plant carbohydrate metabolism studies. The parameters under optimization were activation of Dowex® by HCl, reaction temperature, carrier amount, solvent, and evaporation of excess water. The best results were achieved under the following conditions: HCl-activated Dowex®, reaction temperature 95 °C, carrier amount 3.0 µmol, cyclohexanol as solvent, and excess water evaporated. The Dowex® approach was compared to electrophilic reactions with Chloramine-T and Iodogen, and it was concluded that the Dowex® approach leads to superior radiochemical yields under the optimized conditions. The Dowex® method was successfully tested on the sugar analogues, resulting in a single main product at a satisfactory 50-56 % radiochemical yield. The main products were successfully characterized with NMR, and the method was furthermore indicated to be regioselective. It is plausible that the developed method may be improved further in terms of radiochemical yield and molar activity, and that it could prove to be a useful tool for developing novel radioiodinated molecular tracers for plant studies.
  • Kangas, Pinja (2022)
    Sulfuric acid has a central role in atmospheric chemistry, as it is considered a significant contributor to cloud formation and acid rain. In the gas phase, hydrolysis of SO3 catalysed by a single water molecule is thought to be the primary pathway for sulfuric acid formation in the atmosphere. However, previous studies have calculated that when the hydrolysis reaction is catalysed by a formic acid (FA) molecule, the potential energy barrier is significantly lower than for the water-catalysed reaction. In this work, the role of dynamic and steric effects in both reactions was studied through ab initio molecular dynamics (AIMD) collision simulations. The simulations were done by colliding either FA or a water molecule with the SO3-H2O complex, or a water dimer with an SO3 molecule. Altogether 230 trajectories were calculated at the PBE/6-311+G(2pd,2df) level of theory: 70 for the collision of a water dimer with SO3, and 80 each for the collisions of a water molecule and of FA with SO3-H2O. The collision of FA with SO3-H2O led to the formation of sulfuric acid in 5 % of the simulations, whereas for the collision of a water molecule with SO3-H2O the reaction did not occur within the simulation time. Additionally, the SO3-H2O-FA pre-reactive complex formed in the simulations is shown to be more stable, most likely due to a less constrained ring structure. The collision of a water dimer with SO3 most commonly leads to the formation of SO3-H2O, with the second water molecule of the dimer either sticking or escaping. Based on the simulation results, strictly in terms of dynamic and steric effects, the FA-catalysed mechanism seems to be favoured over the H2O-catalysed one.
  • Joensuu, Juhana (2022)
    Currency risk is an important yet neglected consideration for investors holding internationally diversified investment portfolios. The foreign exchange market is an extremely liquid and efficient market, with daily transaction volumes exceeding the equivalent of several trillion euros. International investors have to decide upon their level of exposure to various currency risks, typically by hedging some or all of the underlying currency exposure with currency derivative contracts. A currency overlay refers to an approach where the aggregate currency exposure of the investment portfolio is managed with a separate derivatives strategy, aimed at improving the overall portfolio's risk-adjusted returns. In this thesis, we develop a novel systematic, data-driven approach to manage the currency risk of investors holding diversified bond-equity portfolios, accounting for both risk minimization and expected return maximization objectives at the portfolio level. The model is based upon modern portfolio theory, leveraging findings from prior literature on covariance modelling and expected currency returns. The focus of this thesis is on ensuring efficient risk diversification through accurate covariance estimates fed by high-frequency data on exchange rates, bonds, and equity indexes. As for the expected returns estimate, we identify purchasing power parity (PPP) and carry signals as credible alternatives for improving the expected risk-adjusted returns of the strategy. A block bootstrap simulation methodology is used to conduct empirical tests on different specifications of the developed dynamic overlay model. We find that dynamic risk-minimizing strategies significantly decrease portfolio risk relative to either unhedged or fully hedged portfolios. Using covariance estimates based on high-frequency data is likely to improve portfolio diversification relative to a simple estimator based on daily data. The empirical results are much less clear in terms of risk-adjusted returns. We find tentative evidence that the tested dynamic strategies improve risk-adjusted returns. Due to the limited data sample used in this study, the findings regarding expected returns are highly uncertain. Nevertheless, considering evidence from prior research covering much longer time horizons, we expect that both the risk-minimizing and return-maximizing components of the developed model are likely to improve portfolio-level risk-adjusted returns. We recommend using the developed model as an input supporting the currency risk management decisions of investors with globally diversified investment portfolios, alongside other relevant considerations such as solvency or discretionary market views.
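    A minimal sketch of a moving-block bootstrap of the kind used in such empirical tests, assuming daily returns in a (T, n_assets) NumPy array; the block length is an illustrative choice:

```python
import numpy as np

def block_bootstrap(returns, block_len=20, n_paths=1000, seed=0):
    """Yield bootstrapped return paths that preserve short-range dependence."""
    rng = np.random.default_rng(seed)
    T = len(returns)
    n_blocks = int(np.ceil(T / block_len))
    for _ in range(n_paths):
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        path = np.concatenate([returns[s:s + block_len] for s in starts])[:T]
        yield path  # feed each path to the hedging strategy under test
```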
  • Pakkanen, Noora (2021)
    In Finland, the final disposal of spent nuclear fuel will begin in the 2020s, with spent nuclear fuel disposed of 400-450 metres deep in the crystalline bedrock. Disposal will follow the Swedish KBS-3 principle, in which spent nuclear fuel canisters are protected by multiple barriers planned to prevent the migration of radionuclides to the surrounding biosphere. With multiple barriers, failure of one barrier will not endanger the isolation of the spent nuclear fuel. Insoluble spent nuclear fuel will be stored in iron-copper canisters placed in vertical tunnels within the bedrock. The iron-copper canisters are surrounded by a bentonite buffer to protect them from groundwater and from movements of the bedrock. MX-80 bentonite has been proposed as the buffer material in the Finnish spent nuclear fuel repository. In the case of canister failure, the bentonite buffer is expected to absorb and retain radionuclides originating from the spent nuclear fuel. If the salinity of Olkiluoto island's groundwater were to decrease, chemical erosion of the bentonite buffer could result in the generation of small particles called colloids. Under suitable conditions, these colloids could act as carriers for otherwise immobile radionuclides and transport them outside the facility area to the surrounding biosphere. The objective of this thesis work was to study the effect of MX-80 bentonite colloids on radionuclide migration within two granitic drill core columns (VGN and KGG) using two different radionuclides, 134Cs and 85Sr. Batch-type sorption and desorption experiments were conducted to gain information on the sorption mechanisms of the two radionuclides as well as on sorption competition between MX-80 bentonite colloids and crushed VGN rock. Colloids were characterized with scanning electron microscopy (SEM), and particle concentrations were determined with dynamic light scattering (DLS). Allard water mixed with MX-80 bentonite powder was used to imitate low-salinity groundwater conditions and colloids. Strontium's breakthrough from the VGN drill core column was successful, whereas caesium did not break through from either the VGN or the KGG column. Caesium's sorption was more irreversible in nature than strontium's, and caesium was thus retained strongly within both columns. For both radionuclides, the presence of colloids did not seem to enhance migration notably. Breakthrough from the columns was affected both by radionuclide properties and by colloid filtration within tubes, stagnant pools, and fractures. The experiments could be further complemented by conducting batch-type sorption experiments with crushed KGG and by introducing new factors to the column experiments. The experimental work was carried out at the Department of Chemistry, Radiochemistry, at the University of Helsinki.
  • Mickwitz, Valter (2022)
    Developments in mass spectrometry have been one of the driving factors behind recent decades' progress in the understanding of atmospheric chemistry. The data collected with mass spectrometry is one of the greatest assets for the continued development of knowledge in this field. However, the analysis of this data is a slow and laborious process, and new methods are required to make accessible all the information this data has to offer. The goal of this thesis was to develop an algorithm for the automatic identification of chemical compositions from mass spectra of limited resolution, with the aim of considerably reducing the time required for mass spectrum analysis. The algorithm works by selecting compositions that maximize the likelihood of observing the measured data ($\chi^2$ fitting) and then choosing the most cost-effective model, that is, the model that satisfactorily explains the data with as few compositions as possible. To identify the most cost-effective model, a modified version of the Bayesian information criterion was used. The algorithm's operating principles were further developed based on results obtained from tests with synthetic data. The final algorithm was tested on data collected in earlier experiments, and its results were compared with the analysis performed in connection with those experiments. Based on the results, the algorithm works: the choices it makes are motivated by the data and in most cases correspond to the choices a researcher would make in the same situation. The algorithm in its current form can thus be applied to the analysis of mass spectra and is expected to considerably shorten the time required to identify chemical compositions from mass spectra. However, a number of areas for development were also identified that are expected to further improve the algorithm's performance.
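    A hedged sketch of the model selection idea: candidate sets of compositions are fitted to an observed peak by non-negative least squares and the set with the lowest information criterion is kept. The thesis uses a modified Bayesian information criterion; the standard Gaussian form below, and the basis profiles, are assumptions:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import nnls

def bic(y, y_hat, k):
    """Gaussian BIC for a least-squares fit with k fitted compositions."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def select_compositions(peak, basis):
    """peak: observed intensity vector; basis: dict name -> predicted profile."""
    names = list(basis)
    best_score, best_subset = np.inf, ()
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            A = np.column_stack([basis[s] for s in subset])
            coef, _ = nnls(A, peak)           # non-negative abundances
            score = bic(peak, A @ coef, k=r)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score
```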
  • Pajula, Ilari (2024)
    Combining data from visual and inertial sensors effectively reduces the inherent errors of each modality, enhancing the robustness of sensor fusion for accurate 6-DoF motion estimation over extended periods. While traditional SfM and SLAM frameworks are well established in the literature and in real-world applications, purely end-to-end learnable SfM and SLAM networks are still scarce. The adaptability of fully learned models to different system configurations and navigation setups holds great potential for future developments in this field. This thesis introduces and assesses two novel end-to-end trainable sensor-fusion models using a supervised learning approach, tested on established navigation benchmarks and custom datasets. The first model utilizes optical flow, revealing its limitations in handling the complex camera movements present in pedestrian motion. The second model addresses these shortcomings by using feature point matching and a completely original design.
  • Folestad, Magdalena (2022)
    This study seeks to determine whether, and how, the environment in eastern Finnish Lapland has changed from a long-term perspective. The variables examined relate to the current state of the environment: atmospheric composition and aerosols, meteorology, and biology. The study is based on measurements from the Värriö Subarctic Research Station for the years 1973 to 2021. Atmospheric composition covers the anthropogenic gas concentrations of CO, NOx, O3, and SO2; SO2 is also used in a proxy to estimate H2SO4 concentrations. Decreasing long-term trends are found for CO, NOx, SO2, and H2SO4. Decreasing emissions from the Kola Peninsula are the cause of the long-term decrease in SO2, which in turn results in decreasing H2SO4 concentrations. Particle size distribution results show an increasing concentration of small particles and a decreasing concentration of large particles. A decline in particles leads to less new particle formation (NPF), fewer cloud condensation nuclei (CCN), and consequently influences cloud properties. Between 1975 and 2021, air temperature increased by 2.38 °C and the number of snow cover days decreased by three weeks, while snow depth and precipitation show less significant changes. The heat sum increased by 247 degree days from 1981 to 2021, indicating more active and growing trees. Birch leaf development shows indications of leaf burst and fully developed leaves occurring at earlier dates over the years 1981-2021. Grouse, shorebirds, and cavity-nesters show large inter-annual variations. Some of the bird species appear to benefit from the environmental changes, while others appear to have difficulty adapting.
  • Mäki-Iso, Emma (2021)
    The magnitude of the market risk of investments is often examined with risk measures. A risk measure is a mapping from the set of random variables describing potential losses to the real numbers. Risk measures make it easy to compare the riskiness of different investments, and banking supervisors use them in monitoring banks' capital adequacy. The longest-established risk measure in general use is Value-at-Risk (VaR). VaR gives the largest loss incurred at a given confidence level α, i.e., it is the α-quantile of the loss distribution. In the latest Basel guidance (Minimum capital requirements for market risk), a risk measure called expected shortfall replaces VaR in the calculation of the capital requirement. Expected shortfall is the expected value of the loss given that the loss exceeds the level given by VaR. The risk measure is being changed because VaR's theoretical properties are not as good as those of expected shortfall: VaR is not subadditive, which means that the combined risk of positions can in some cases exceed the sum of the risks of the individual positions. As a consequence, the risk of an undiversified portfolio can appear smaller than that of a diversified one. Expected shortfall is not entirely unproblematic either, because it is not consistently scoring: no scoring function exists with which estimated and realized values could be compared consistently. In addition, because the size of expected shortfall depends on all the losses in the tail, it is sensitive to errors in tail losses. This is not a desirable property, because estimating the tails of loss distributions involves considerable uncertainty. Because risk estimation involves uncertainty, regulation obliges banks to backtest the risk estimates used in calculating the regulatory capital requirement. Backtesting is the process of comparing estimated risk figures with realized losses. Backtesting of VaR estimates is based on the number of days in the test period on which the loss exceeds the level given by the VaR estimate. For expected shortfall, backtesting methods are not yet as well established as for VaR. This thesis presents three different ways to backtest expected shortfall estimates, introduced by Kratz and colleagues, by Moldenhauer and Pitera, and by Costanzino and Curran. The methods examine simultaneous breaches of several VaR levels, the number of observations cumulating to a positive value of the secured position (the difference between the loss and the risk estimate), and the average size of VaR breaches. The computational part of the thesis investigated whether VaR and expected shortfall backtests give similar results and whether the length of the observation period used in risk estimation affects how the estimates perform in the backtests. The calculations showed that expected shortfall and VaR backtests gave similar results. Estimates computed from market data with estimation windows of different sizes obtained test statistics of different sizes and accepted a wrong model, or rejected a correct one, with different probabilities. When purely simulated data was used, there were no differences in the results between estimates computed with estimation windows of different sizes. It can thus be concluded that the differences in test results between estimates computed over observation periods of different lengths are due not only to the number of observations but also to their quality.
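    A minimal sketch of historical-simulation VaR and expected shortfall on a vector of losses; the sign convention (losses positive) and the quantile interpolation rule are assumptions, with α = 0.975 as under the Basel market risk framework:

```python
import numpy as np

def var_es(losses, alpha=0.975):
    """Historical-simulation VaR and expected shortfall (losses positive)."""
    losses = np.asarray(losses)
    var = np.quantile(losses, alpha)
    tail = losses[losses > var]
    es = tail.mean() if tail.size else var  # mean loss beyond the VaR level
    return var, es

rng = np.random.default_rng(1)
sample = rng.standard_t(df=4, size=1000)  # heavy-tailed loss sample
print(var_es(sample))
```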
  • Ranta, Topi (2024)
    For machine learning, quantum computing provides effective new computation methods. The number of states a quantum register may express is exponential compared to a classical register of the same size, and this expressivity may be exploited in machine learning. It has been shown that, in less than exponential time, a theoretical fault-tolerant quantum computer may perform computations that cannot be run on a classical computer in feasible time. In machine learning, however, it has been shown that a classical machine learning method may learn a target model defined by an arbitrary quantum circuit if given sufficiently many training examples. In other words, a machine learning method that utilizes quantum computing may gain a quantum prediction advantage over its classical counterpart when the amount of training data is low. However, this result does not address the noise of contemporary quantum computers. In this thesis, we use a simulation of a quantum circuit to test how gradually increased noise affects the ability of a hybrid quantum-classical machine learning system to retain the quantum prediction advantage. With a simulated quantum circuit, we embed classical data rows into the quantum Hilbert space, which acts as a feature space as known from classical kernel theory. We project the data back to classical space, yielding a projected dataset that differs from the original. Using kernel matrices of the original and projected datasets, we then create an adversarial binary labeling. With few training data, this adversarial labeling is easy for a classical neural network to learn using the projected features but impossible to learn using the original data. We show that this quantum prediction advantage diminishes as a function of the error rate introduced in the simulation of the data-embedding quantum circuit. Our results suggest that the noise threshold for a feasible system lies slightly above the error rates of contemporary hardware, indicating that our experiment should be tested on actual quantum hardware. We derive a parameter optimization scheme for an arbitrary hardware implementation, so that it can be concluded whether the quantum hardware in question can produce a quantum-advantage dataset beyond the simulation capability of classical computers.
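    A hedged toy illustration (single qubit, NumPy only; the thesis simulates a larger data-embedding circuit) of how depolarizing noise washes out a quantum kernel: each scalar x is embedded as Ry(x)|0⟩, the state is depolarized with probability p, and the kernel is Tr(ρ_i ρ_j):

```python
import numpy as np

def rho(x, p):
    """Density matrix of Ry(x)|0> after a depolarizing channel of strength p."""
    psi = np.array([np.cos(x / 2), np.sin(x / 2)])
    pure = np.outer(psi, psi)
    return (1 - p) * pure + p * np.eye(2) / 2

def kernel(xs, p):
    states = [rho(x, p) for x in xs]
    return np.array([[np.trace(a @ b).real for b in states] for a in states])

xs = np.linspace(0, np.pi, 5)
for p in (0.0, 0.5, 1.0):
    K = kernel(xs, p)
    print(p, K.min().round(3), K.max().round(3))  # contrast shrinks to 0.5 as p -> 1
```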
  • Valkama, Bearjadat (2022)
    Above-ground biomass (AGB) estimation is an important tool for predicting carbon flux and the effects of global warming. This study describes a novel application of remote-sensing-based AGB estimation in the hemi-boreal vegetation zone of Finland, using Sentinel-1, Sentinel-2, ALOS-2 PALSAR-2, and the Multi-Source National Forest Inventory by the Natural Resources Institute Finland as sources of data. A novel method of extracting data from the features of the surrounding observations is proposed and its effectiveness evaluated. The method showed promising results, with the model trained on the extracted features achieving the highest evaluation scores in the study. In addition, the viability of using freely and widely available satellite datasets for AGB estimation in hemi-boreal Finland was analyzed, with the results suggesting that the free Synthetic Aperture Radar (SAR) based products performed poorly. The features extracted from the optical data of Sentinel-2 produced well-performing models, although the accuracy may still be too low to be practical.
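    A hedged sketch of the idea of enriching each observation with features of its surroundings, assuming a raster stack and using local means over a square window; the window size and statistic are illustrative, not the thesis' exact scheme:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def add_neighbourhood_features(bands, size=5):
    """bands: (H, W, C) raster stack -> (H, W, 2C) original + local means."""
    local_mean = np.stack(
        [uniform_filter(bands[..., c], size=size) for c in range(bands.shape[-1])],
        axis=-1)
    return np.concatenate([bands, local_mean], axis=-1)
```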
  • Joswig, Niclas (2021)
    Simultaneous Localization and Mapping (SLAM) research is gaining traction as the available computational power and the demand for autonomous vehicles increase. A SLAM system solves the problem of localizing itself during movement (visual odometry) and, at the same time, creating a 3D map of its surroundings. Both tasks can be solved with expensive and bulky hardware such as LiDARs and IMUs, but the subarea of visual SLAM aims at replacing those costly sensors with, ultimately, inexpensive monocular cameras. In this work I applied the current state of the art in end-to-end deep-learning-based SLAM to a novel dataset comprising images recorded from cameras mounted on an indoor crane from the Konecranes CXT family. One aspect that is unique about our proposed dataset is the camera angle, which resembles a classical bird's-eye view towards the ground. This change of orientation, together with a novel scene structure, has a large impact on the subtask of mapping the environment, which in this work is done through monocular depth prediction. Furthermore, I assess which properties of the given industrial environments have the biggest impact on the system's performance, to identify possible future research opportunities for improvement. The main performance impairments I examined, characteristic of most types of industrial premises, are non-Lambertian surfaces, occlusion, and texture-sparse areas along the ground and walls.
  • Suomela, Samu (2021)
    Large graphs often have labels for only a subset of nodes. Node classification is a semi-supervised learning task in which unlabeled nodes are assigned labels using the known information of the graph. In this thesis, three node classification methods are evaluated on two metrics: computational speed and node classification accuracy. The three methods evaluated are label propagation, harmonic functions with Gaussian fields, and the Graph Convolutional Neural Network (GCNN). Each method is tested on five citation networks of different sizes extracted from a large scientific publication graph, MAG240M-LSC. For each graph, the task is to predict the subject areas of scientific publications, e.g., cs.LG (Machine Learning). The motivation for the experiments is to give insight into whether the methods would be suitable for the automatic labeling of scientific publications. The results show that label propagation and harmonic functions with Gaussian fields reach mediocre accuracy in the node classification task, while the GCNN had low accuracy. Label propagation was computationally slow compared to the other methods, whereas harmonic functions were exceptionally fast. Training the GCNN took a long time compared to harmonic functions, but its computational speed was acceptable. However, none of the methods reached a high enough classification accuracy to be used for the automatic labeling of scientific publications.
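    A minimal sketch of label propagation on a citation graph: known labels are clamped and repeatedly diffused over the row-normalized adjacency matrix; the tolerance and iteration cap are illustrative assumptions:

```python
import numpy as np

def label_propagation(A, y, labeled_mask, n_classes, max_iter=100, tol=1e-6):
    """A: (n, n) adjacency; y: integer labels (valid where labeled_mask is True)."""
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    F = np.zeros((len(y), n_classes))
    F[labeled_mask, y[labeled_mask]] = 1.0
    for _ in range(max_iter):
        F_new = P @ F
        F_new[labeled_mask] = 0.0
        F_new[labeled_mask, y[labeled_mask]] = 1.0  # clamp the known labels
        if np.abs(F_new - F).max() < tol:
            F = F_new
            break
        F = F_new
    return F.argmax(axis=1)  # predicted class per node
```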
  • Pirnes, Sakari (2023)
    The Smoluchowski coagulation equation is considered one of the most fundamental equations of the classical description of matter, alongside the Boltzmann, Navier-Stokes, and Euler equations. It has applications from physical chemistry to astronomy. In this thesis, a new existence result for measure-valued solutions to the coagulation equation is proven. The proven existence result is stronger and more general than a previously claimed result, and it holds for a generic class of coagulation kernels, including various kernels used in applications. The coagulation equation models binary coagulation of objects characterized by a strictly positive real number called size, which often represents mass or volume in applications. In binary coagulation, two objects can merge together at a rate characterized by the so-called coagulation kernel; the time evolution of the size distribution is given by the coagulation equation. Traditionally the coagulation equation has two forms, discrete and continuous, referring to whether the objects' sizes take discrete or continuous values. A similar existence result to the one proven in this thesis has been obtained for the continuous coagulation equation, while the discrete coagulation equation is often favored in applications. Being able to study both discrete and continuous systems, and their mixtures, at the same time has motivated the study of measure-valued solutions to the coagulation equation. After the existence result is motivated, its proof is organized into four Steps described at the end of the introduction. The needed mathematical tools and their connection to the four Steps are presented in chapter 2. The precise mathematical statement of the existence result is given in chapter 3 together with Step 1, where the coagulation equation is regularized, using a parameter ε ∈ (0, 1), into a more manageable regularized coagulation equation. Step 2 is carried out in chapter 4 and consists of proving the existence and uniqueness of a solution f_ε for each regularized coagulation equation. Steps 3 and 4 are carried out in chapter 5. In Step 3, it is proven that the regularized solutions {f_ε} have a converging subsequence in the topology of uniform convergence on compact sets. Step 4 finishes the existence proof by verifying that the subsequence's limit satisfies the original coagulation equation. Possible improvements and future work are outlined in chapter 6.
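    For reference, the strong form of the continuous coagulation equation for a size distribution f(x, t) with coagulation kernel K reads as follows (the measure-valued formulation treated in the thesis generalizes this; the notation is a standard convention, not the thesis' own):

```latex
\partial_t f(x,t) = \frac{1}{2}\int_0^{x} K(x-y,\,y)\, f(x-y,t)\, f(y,t)\, \mathrm{d}y
                  \;-\; f(x,t)\int_0^{\infty} K(x,y)\, f(y,t)\, \mathrm{d}y .
```

    The first (gain) term counts mergers of sizes y and x − y into size x; the second (loss) term removes particles of size x that merge with any other particle.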
  • Vuoksenmaa, Aleksis Ilari (2020)
    Coagulation equations are evolution equations that model the time evolution of the size distribution of particles in systems where colliding particles stick together, or coalesce, to form one larger particle. These equations arise in many areas of science, most prominently in aerosol physics and the study of polymers. In the former case, the colliding particles are small aerosol particles that form ever larger aerosol particles; in the latter, they are polymers of various sizes. As the system evolves, the density of particles of a given size changes at a rate determined by two competing factors: a positive contribution from smaller particles coalescing to form particles of that size, and a negative contribution from particles of that size coalescing with other particles to form larger ones. Furthermore, if no new particles are added to the system, the total mass of the particles should remain constant. From these considerations, it follows that the time evolution of the coagulation equation is specified, for every particle size, by a difference of two terms which together preserve the total mass of the system. The physical properties of the system affect the time evolution via a coagulation kernel, which determines the rate at which particles of different sizes coalesce. A variation of coagulation equations is obtained by adding an injection term to the evolution equation to account for new particles injected into the system. This results in a new evolution equation, a coagulation equation with injection, where the total mass of the system is no longer preserved, as new particles are added at each point in time. Coagulation equations with injection may have non-trivial solutions that are independent of time. The existence of non-trivial stationary solutions has ramifications in aerosol physics, since they may correspond to observations that the particle size distribution in the air stays approximately constant. In this thesis, it is demonstrated, following Ferreira et al. (2019), that for any good enough injection term and suitably chosen, compactly supported coagulation kernels, there exists a stationary solution to a regularized version of the coagulation equation. This theorem, which relies heavily on functional-analytic tools, is a central step in the proof that certain asymptotically well-behaved kernels admit stationary solutions for any prescribed compactly supported injection term.
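    A hedged sketch of the stationary problem described above, with η denoting the injection term (notation assumed; the thesis, following Ferreira et al. (2019), works with a regularized version and compactly supported kernels):

```latex
0 = \frac{1}{2}\int_0^{x} K(x-y,\,y)\, f(x-y)\, f(y)\, \mathrm{d}y
  \;-\; f(x)\int_0^{\infty} K(x,y)\, f(y)\, \mathrm{d}y \;+\; \eta(x) .
```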
  • Kilpeläinen, Wille Julius (2020)
    Inductively coupled plasma mass spectrometry (ICP-MS) is a state-of-the-art technique for elemental analysis. The technique allows fast and simultaneous analysis of multiple elements with a wide dynamic range and low detection limits. However, the multiple adjustable parameters and the complex nature of ICP-MS instruments can make the development of new analysis methods a tedious process. Design of experiments (DOE), or experimental design, is a statistical approach for conducting multivariate experiments in a way that yields the maximal amount of information from each experiment. By using DOE, the number of experiments needed for analytical method optimization can be minimized, and information about the interrelations of different experimental variables can be obtained. The aim of this thesis is to address the utilization of DOE for ICP-MS method development as a more efficient means of optimizing analytical methods. The first part of this two-part thesis gives an overview of the basics of ICP-MS and DOE. A literature review on applying experimental design to ICP-MS method optimization is then given, and the current state of the research is discussed. In the second part, two new ICP-MS methods for the simultaneous determination of 28 elements from six middle distillate fuels, diluted with xylene or kerosene, are presented. The method development involved optimization of the integration times and optimization of test sample dilution ratios and viscosities using univariate techniques. In addition, experimental designs were successfully utilized together with the desirability approach in multivariate optimizations of the plasma conditions and sample matrix compositions, to achieve the best possible analyte recoveries from various matrices.
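    A hedged sketch of the desirability approach used in such multivariate optimizations: each response is mapped to [0, 1] with a larger-is-better Derringer-Suich desirability, and the overall desirability to maximize is their geometric mean (the bounds, weights, and example recoveries are illustrative assumptions):

```python
import numpy as np

def desirability_larger_is_better(y, low, target, weight=1.0):
    """0 below `low`, 1 at or above `target`, power-scaled in between."""
    d = np.clip((y - low) / (target - low), 0.0, 1.0)
    return d ** weight

def overall_desirability(ds):
    ds = np.asarray(ds)
    return ds.prod() ** (1.0 / len(ds))  # geometric mean of the responses

# e.g. recoveries (%) of three analytes under one candidate plasma setting:
recoveries = [92.0, 85.0, 98.0]
ds = [desirability_larger_is_better(r, low=70.0, target=100.0) for r in recoveries]
print(overall_desirability(ds))  # maximize this over the experimental design
```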
  • Gunnlaugsdóttir, Eyrún Gyða (2022)
    Biological soil crust (biocrust) is a significant contributor to biogeochemical cycles through nitrogen and carbon cycling. Furthermore, it stabilizes soil, facilitates water infiltration, and mitigates soil erosion. The global biocrust cover is expected to decrease by about 25-40% in the next 60 years due to climate change and intensification of land use. Research on biocrust in arctic and subarctic regions is limited; much of the knowledge comes from lower latitudes in arid and semiarid ecosystems. Cold-adapted biocrust might respond differently to increasing temperatures than warm-adapted biocrust. It is therefore fundamental to study biocrust in arctic and subarctic regions, given how fast the climate is changing in the Northern Hemisphere. Temporal variations of soil respiration in subarctic biocrust have not been studied systematically before. This research project focuses on the effects of warming on soil respiration in biocrust, on diurnal and seasonal scales. It also examines changes in the species composition of vascular plants in a warming experiment where warming was induced with open-top chambers (OTCs). Soil respiration, temperature, soil water content, and changes in plant species composition were measured during three field trips, each lasting four days, during the growing season of 2021. The results show that soil respiration was lower in September than in the June and July measurements. The highest values of soil respiration were observed during mid-day and the lowest during evenings and nights. The temperatures of the OTC plots were, on average, 1.16 °C higher than those of the control plots, and the OTC plots had significantly lower soil water content than the control plots. During this research, soil respiration increased with higher temperature but did not differ between control and OTC plots at any time of day or in any month measured. Soil water content did not affect soil respiration significantly, while temperature did. These findings might be explained by the lower soil water content within the warmer plots: warmth and moisture have both been shown to increase soil respiration, so reduced soil water content might counteract the increase in soil respiration due to warming. Some vascular plant species were more likely to be found within or outside the warming plots. Dwarf willow, Salix herbacea, decreased in cover within the OTC plots. Previous research has shown that warming significantly reduces pollen shed and the time of pollen shedding for S. herbacea, which might decrease its abundance within the OTC plots. Alpine bistort, Bistorta vivipara, increased in cover within the OTC plots compared to the control plots. Warming experiments on B. vivipara have shown positive effects on reproductive parameters, which might increase its abundance within the warmed OTC plots. Sheep also prefer grazing on B. vivipara; it might therefore have less cover in the control plots, given that OTCs exclude grazing and that many sheep roam the studied site during the growing season. Vascular plant cover was greater within the control plots than within the warmed plots. Previous results from the same site after one year of warming, from summer 2019, showed more vascular plant cover within the OTC plots than within the control plots. The results of this research might indicate that vascular plants are gradually affected by the warming and are transitioning towards a new equilibrium.
    The results of this research provide grounds for further studies on subarctic ecosystems dominated by biocrust. Many biotic and abiotic factors affect carbon cycles. For future modelling of the predicted effects of climate change, better knowledge of how subarctic ecosystems respond to warming is essential for understanding the functions and feedbacks in a global context.