
Browsing by Title


  • Rensing, Fabian (2024)
    Accurately predicting a ship’s fuel consumption is essential for efficient shipping operations. A prediction model has to be retrained regularly to minimize drift between its predictions and the ship’s actual consumption, since a ship’s performance changes constantly owing to weather influences and progressive hull fouling. Continuous Learning (CL) promises repeated retraining of an ML model while also mitigating catastrophic forgetting. Catastrophic forgetting occurs when a model is trained on new data without proper measures to “remind” the model of its previous knowledge; in the context of ship performance prediction, this might be previously encountered weather or performance patterns in certain conditions. This thesis explores the adaptability of CL for setting up a production-ready training pipeline that regularly retrains a model predicting a ship’s fuel consumption.
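    A hedged sketch of the retraining loop such a pipeline needs (the abstract names no tooling; the scikit-learn regressor, the feature layout, and the experience-replay tactic below are assumptions, replay being only one of several CL strategies):

        import numpy as np
        from sklearn.linear_model import SGDRegressor

        class ReplayRetrainer:
            """Periodically retrain a fuel-consumption regressor, replaying old data."""

            def __init__(self, buffer_size=5000, seed=0):
                self.model = SGDRegressor(random_state=seed)
                self.buf_X, self.buf_y = None, None
                self.buffer_size = buffer_size
                self.rng = np.random.default_rng(seed)

            def retrain(self, X_new, y_new):
                # Train on fresh voyage data mixed with replayed history, so
                # earlier weather/performance regimes are not simply overwritten.
                if self.buf_y is not None:
                    X = np.vstack([X_new, self.buf_X])
                    y = np.concatenate([y_new, self.buf_y])
                else:
                    X, y = X_new, y_new
                self.model.partial_fit(X, y)
                # Keep a bounded random sample of everything seen so far.
                keep = self.rng.permutation(len(y))[: self.buffer_size]
                self.buf_X, self.buf_y = X[keep], y[keep]
                return self.model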
  • Prittinen, Taneli (2017)
    In this work, a SQUID-based apparatus for NMR measurements on helium-3 was developed, and measurements were performed using both so-called continuous-wave (CW) NMR and the pulsed NMR method. Because of the high price of helium-3 (approx. 5000 euros per litre), fluorine-containing Teflon and hydrogen-containing ice were also used as NMR test materials. The apparatus was designed and built at the O.V. Lounasmaa Laboratory of Aalto University, now called the Low Temperature Laboratory. NMR, or nuclear magnetic resonance, is a phenomenon in which atomic nuclei possessing nuclear spin are placed in a static magnetic field and excited with external electromagnetic radiation; when the excited state relaxes, an NMR signal is released. In this way many different properties of matter can be studied. A SQUID, or Superconducting Quantum Interference Device, is, as its name suggests, a device based on quantum interference that can detect extremely small magnetic fields. In NMR it serves as an effective preamplifier with which even very small signals can be detected. In this work its purpose is to improve the signal-to-noise ratio compared with conventional semiconductor preamplifiers and to provide a detector that can also measure at lower frequencies than the research group currently uses. Based on the measurements performed, the apparatus was able to detect an NMR signal from every material studied using the continuous-wave method. Pulsed measurements have not yet been carried out successfully, owing to helium’s rather long relaxation time of about 30 seconds, which made longer measurement series difficult to realize. Correspondingly, for the two solid materials, Teflon and ice, the resonance was so broad that depositing energy into the sample with pulses would have been difficult and would have produced signals too weak to detect easily, so these materials were studied only with the continuous-wave method.
  • Wang, Sai (2015)
    Robustness testing is an important aspect of web service testing. It focuses on a service's ability to deal with invalid input; the test cases of robustness testing therefore aim at good coverage of input conditions. The behaviours of participating services are described in a BPEL contract. Services communicate with each other by sending SOAP messages, and a BPEL process can be seen as a graph whose nodes and edges stand for activities and messages. Owing to the nature of business processes, we extend the traditional definition of robustness to web services in SOA ecosystems. Robustness test case generation focuses on generating test paths (message sequences) and on generating the test data carried in SOAP messages. The web service contract contains the information needed for test case generation. In this thesis, we divide contracts into three levels: document-level, model-level, and implementation-level contracts. The model-level contract provides the information for test case generation: the BPEL contract supports test path generation and the WSDL contract supports test data generation. By analysing the contents of the contract, test cases can be generated. Petri nets and a graph-based method are chosen for message sequence generation, and data perturbation is used to generate invalid test data.
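    A minimal sketch of the data perturbation step (the operators below are hypothetical; actual tooling would derive the valid value space from the WSDL type definitions before perturbing it):

        def perturb(value):
            """Generate invalid variants of a valid SOAP field value."""
            variants = [None, "", "\x00", "A" * 10_000]        # null / empty / oversized
            if isinstance(value, (int, float)):
                variants += [-value, value * 10**9, float("nan"), "not-a-number"]
            elif isinstance(value, str):
                variants += [value[::-1], "<![CDATA[" + value, value + "'--"]
            return variants

        # Invalid inputs derived from a (hypothetical) valid order quantity:
        for bad in perturb(42):
            print(repr(bad))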
  • Lipsanen, Mikko (2022)
    The thesis presents and evaluates a model for detecting changes in discourses in diachronic text corpora. Detecting and analyzing discourses that typically evolve over a period of time and differ in their manifestations in individual documents is a challenging task, and existing approaches like topic modeling are often not able to reach satisfactory results. One key problem is the difficulty of properly evaluating the results of discourse detection methods, due in large part to the lack of annotated text corpora. The thesis proposes a solution where synthetic datasets containing non-stable discourse patterns are generated from a corpus of news articles. Using the news categories as a proxy for discourses makes it possible both to control the complexity of the data and to evaluate the model results against the known discourse patterns. The complex task of extracting topics from texts is commonly performed using generative models, which are based on simplifying assumptions regarding the process of data generation. The model presented in the thesis instead explores the potential of deep neural networks, combined with contrastive learning, for discourse detection. The neural network model is first trained using a supervised contrastive loss function, which teaches the model to differentiate the input data based on the type of discourse pattern it belongs to. This pretrained model is then employed for both supervised and unsupervised downstream classification tasks, where the goal is to detect changes in the discourse patterns at the timepoint level. The main aim of the thesis is to find out whether contrastive pretraining can be used as part of a deep learning approach to discourse change detection, and whether the information encoded into the model during contrastive training can generalise to other, closely related domains. The results of the experiments show that contrastive pretraining can be used to encode information that directly relates to its learning goal into the end products of the model, although the learning process remains incomplete. However, the ability of the model to generalise this information in a way that could be useful in the timepoint-level classification tasks remains limited. More work is needed to improve the model performance, especially if it is to be used with complex real-world datasets.
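    The supervised contrastive loss the abstract refers to is not spelled out; a minimal NumPy sketch of the standard formulation (Khosla et al. 2020), assuming L2-normalised embeddings and integer labels:

        import numpy as np

        def sup_con_loss(z, labels, tau=0.1):
            """Supervised contrastive loss on (n, d) L2-normalised embeddings z
            with (n,) integer discourse-pattern labels: same-label pairs are
            pulled together, different-label pairs pushed apart."""
            n = len(labels)
            sim = z @ z.T / tau                            # scaled similarities
            np.fill_diagonal(sim, -np.inf)                 # drop self-pairs
            log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
            pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
            n_pos = pos.sum(axis=1)
            per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(n_pos, 1)
            return -per_anchor[n_pos > 0].mean()           # anchors with positives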
  • Kangasaho, Vilma Eveliina (2018)
    The goal of this study is to ascertain whether methane (CH4) emissions can be estimated source-wise by utilising stable isotope observations in the CarbonTracker Data Assimilation System (CTDAS). The global CH4 budget is poorly known, and there are uncertainties in the spatial and temporal distributions as well as in the magnitudes of different sources. In this study, the CTDAS-13CH4 atmospheric inverse model is developed. CTDAS-13CH4 is based on an ensemble Kalman filter (EnKF) and is used to estimate CH4 fluxes at regional and weekly resolution by assimilating CH4 and δ13C-CH4 observations. Anthropogenic biogenic emissions (rice cultivation; landfills and wastewater treatment; enteric fermentation and manure management) and anthropogenic non-biogenic emissions (coal; residential; oil and gas) are optimised. Different emission sources can be identified by using process-specific isotopic signature values, δ13C-CH4, because different processes produce CH4 with different isotopic ratios. Optimisation of anthropogenic biogenic emissions increased the total emissions from the prior in eastern North America by 34%, while the optimisation of anthropogenic non-biogenic emissions increased them by only 14%. In western North America the corresponding changes were −39% and 9%, respectively. In western parts of Europe, total emissions from the prior increased by 18% in the anthropogenic biogenic optimisation and decreased by 3% in the non-biogenic one. Optimising anthropogenic biogenic and non-biogenic emissions in the total CH4 budget did not give complete emission estimates, because the optimisation did not include all emission sources and the source-specific δ13C-CH4 values were assumed not to vary regionally. However, the modelled concentrations from the optimisation of anthropogenic non-biogenic emissions agreed better with the observations of CH4 concentration and δ13C-CH4 values; in this sense, the optimisation of anthropogenic non-biogenic emissions was more successful. This study provides reliable information on the magnitude of anthropogenic biogenic and non-biogenic emissions in regions with sufficient observational coverage. The next step in evaluating the spatial and temporal distributions and magnitudes of different CH4 sources will be to optimise all emission sources simultaneously in a multi-year simulation.
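    For reference, the isotopic signature used throughout is the standard delta notation relative to the VPDB standard (background knowledge, not a definition taken from the thesis):

        \[
          \delta^{13}\mathrm{C} \;=\;
          \left(
            \frac{\left({}^{13}\mathrm{C}/{}^{12}\mathrm{C}\right)_{\mathrm{sample}}}
                 {\left({}^{13}\mathrm{C}/{}^{12}\mathrm{C}\right)_{\mathrm{VPDB}}}
            - 1
          \right) \times 1000\ \text{‰}
        \]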
  • Nummelin, Aleksi (Helsingin yliopistoHelsingfors universitetUniversity of Helsinki, 2012)
    The meridional overturning circulation (MOC) is a crucial component of Earth's climate system, redistributing heat around the globe. The abyssal limb of the MOC is fed by deep water formation near the poles, and a basic requirement for any successful climate model simulation is the ability to reproduce this circulation correctly. Deep water formation itself, convection, occurs on scales smaller than the climate model grid size; the convection process therefore needs to be parameterized. It is, however, somewhat unclear how well parameterizations developed for turbulence can reproduce deep convection and the associated water mass transformations. Convection in the Greenland Sea was studied with the 1-D turbulence model GOTM and with data from three Argo floats. The model was run over the winter 2010-2011 with ERA-Interim and NCEP/NCAR atmospheric forcings and with three different mixing parameterizations: k-ε, k-kL (Mellor-Yamada) and KPP. Furthermore, the effects of mesoscale spatial variations in the atmospheric forcing data were tested by running the model with forcings taken along the floats' paths (Lagrangian approach) and from the floats' median locations (Eulerian approach). The convection was found to happen through gradual mixed layer deepening. It caused a salinity decrease in the Recirculating Atlantic Water (RAW) layer just below the surface, while in the deeper layers a salinity and density increase was clearly visible. A slight temperature decrease was observed in the whole water column above the convection depth. Atmospheric forcing had the strongest effect on the model results. ERA-Interim forcing produced model output closer to the observations, but the convection began too early with both forcings, and both eventually produced too low temperatures. The salinity increase at mid-depths was controlled mainly by the RAW layer, but the atmospheric freshwater flux was also found to affect the end result. Furthermore, the NCEP/NCAR freshwater flux was found to be large enough (negative) to become a clear secondary driving factor for the convection. The results show that the mixing parameterization mainly alters the timing of convection: the KPP parameterization produced clearly too fast convection, while the k-ε parameterization produced output closest to the observations. The results using the Lagrangian and Eulerian approaches were ambiguous in the sense that neither was systematically closer to the observations. This could be explained by errors in the reanalyses arising from their grid size, and more conclusive results could be produced with the aid of finer-scale atmospheric data. The results, however, clearly indicate that atmospheric variability on scales of 100 km produces quantifiable differences in the results.
  • Rimo, Eetu (2023)
    In this thesis I have examined wind gust cases in Finland that occurred during the summer seasons between 2010 and 2021. The main goal of the thesis was to find convective wind gust cases of non-tornadic origin, also known as damaging straight-line winds, and to find out whether the gust on the surface could, in theory, have been caused solely by the slow advection of strong upper-level winds to the surface, or whether another factor, such as a strong downdraft, must have played a role in the creation of the gust. Convective wind gusts occur in Finland every summer, but despite this, the amount of research on them and the damage they can cause has been relatively small compared to, for example, gusts caused by extratropical cyclones. To find suitable wind gust cases, weather data from the Finnish Meteorological Institute (FMI) was downloaded. After scanning through the data for cases suspected of being of convective origin, ERA5 reanalysis data produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) was downloaded for the locations and times of the gusts’ occurrence. For comparison purposes, wind gust cases suspected of being caused by extratropical cyclones were also chosen for further examination. The FMI wind gust speed and wind speed data were visualized in line charts, while the ERA5 values of wind speed, equivalent potential temperature and relative humidity were tabulated and visualized in vertical cross sections. The visualization was done with the help of Python’s matplotlib.pyplot library and the MetPy toolbox. The results indicated that the differences between gust cases caused by convection and those caused by extratropical cyclones can be seen clearly in the reanalysis data. As for the convective cases themselves, the data indicated that in several of them the gust could, at least in theory, have been caused by the slow advection of strong upper-level winds to the surface on its own. In the majority of the cases, however, the data indicated that the gust was likely the result of a strong downdraft, or possibly a combination of a downdraft and advection. Besides this, the values of the examined parameters and their visualization revealed that damaging straight-line winds can occur under various conditions in Finland.
  • Ruiz, Paloma (2020)
    High-quality thin films deposited by atomic layer deposition (ALD) are key components in numerous modern technological applications; the technique is used extensively in the semiconductor and photovoltaic industries, for example. ALD is an excellent technique for thin film deposition due to its characteristic self-limiting surface reactions, which allow a reproducible, conformal, and precisely controlled coating on a substrate. Numerous new ALD materials are developed each year to advance technological innovations to new levels. Occasionally, however, a desired material cannot be produced directly by ALD. To still obtain the impressive features of ALD, an ALD thin film can be transformed chemically or physically in a manner that preserves the film-like structure of the original layer. This thesis explores these types of ALD film transformations, or conversions, by attempting the conversions of atomic layer deposited Al2O3 from a two-dimensional film to a three-dimensional film structure, Ru to RuO2, Re to ReO3, ZnO to ZIF-8, and ZrOx to UiO-66 films. Current preparation methods, common applications, and general properties of these five materials are explored in the literature review. This provides insight into some of the key features of the fabrication of these materials and into the value that the thin film structure brings them, and it highlights the challenges encountered in these processes. Where a material has been obtained through conversion of an ALD thin film, the process is reviewed in detail. Additionally, the literature review explains the basics of conversion reactions as well as the fundamentals of ALD. The experimental section focuses on studying and optimizing the distinct challenges and explores new methods for fabricating the materials through the conversion of ALD thin films. Conversion of zirconium oxide thin films to UiO-66 under terephthalic acid vapor was attempted with modest results. Ru thin films were successfully converted into crystalline RuO2 under ambient and O2 atmospheres, and the processes seem promising for further research. Re and ReNx films were partially converted to ReO3 under O2, O3 and humid environments; the continuity of these films proved to be problematic. Factors affecting the formation of ZIF-8 and Al2O3 "grass" in the conversions of ZnO under 2-methylimidazole vapor and of Al2O3 under heated water, respectively, were assessed, and the optimization of these processes was studied.
  • Peltonen, Jussi (2019)
    FINIX is a nuclear fission reactor fuel behaviour module developed at VTT Technical Research Centre of Finland since 2012. It has been simplified in comparison to full-fledged fuel performance codes to improve its usability in coupled applications by reducing the amount of required input information. While it has been designed to be coupled at the source-code level with other reactor core physics solvers, it can provide accurate results as a stand-alone solver as well. The corrosion that occurs at the interface between the nuclear fuel rod cladding and the reactor coolant is a limiting factor for the lifespan of a fuel rod. Of the several corrosion phenomena, oxidation of the cladding has been studied widely; it is modelled in other fuel performance codes using semiempirical models based on several decades of experimental data. This work aims to implement cladding oxidation models in FINIX and to validate them against reference data from experiments and from the state-of-the-art fuel performance code FRAPCON-4.0. In addition, the models of cladding-coolant heat transfer and coolant conditions are updated to improve the accuracy of the oxidation predictions in stand-alone simulations. The theory of cladding oxidation, the water coolant models, and the general structure of FINIX and of reactor analysis are studied and discussed. The results of the initially implemented cladding oxidation models contained large errors, which indicated that FINIX did not account for the axial temperature difference between the bottom and the top of the rod in the coolant. This was corrected with updates to the coolant models, which calculate various properties of the water coolant based on the International Association for the Properties of Water and Steam (IAPWS) industrial water correlations to solve the axial temperature increase in the bulk coolant. After these updates the predictions of cladding oxidation improved, and the validity of the different oxidation models was further analyzed in the context of FINIX.
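    A hedged sketch of the axial bulk-coolant temperature rise that the updated coolant models resolve (a constant heat capacity and an assumed sinusoidal power shape; FINIX itself evaluates water properties from the IAPWS industrial correlations):

        import numpy as np

        def bulk_coolant_temperature(z, q_lin, T_in, m_dot, cp):
            """Steady-state energy balance along a single coolant channel:
            T(z) = T_in + (1 / (m_dot * cp)) * integral_0^z q'(s) ds,
            with q' the linear heat rate [W/m], m_dot the mass flow [kg/s]
            and cp the (here constant) heat capacity [J/(kg K)]."""
            heat = np.concatenate(
                [[0.0], np.cumsum(0.5 * (q_lin[1:] + q_lin[:-1]) * np.diff(z))])
            return T_in + heat / (m_dot * cp)

        z = np.linspace(0.0, 3.65, 50)                 # rod height [m], assumed
        q = 20e3 * np.sin(np.pi * z / z[-1])           # illustrative power shape [W/m]
        T = bulk_coolant_temperature(z, q, T_in=558.0, m_dot=0.3, cp=5500.0)
        print(f"coolant heats up by {T[-1] - T[0]:.1f} K from inlet to outlet")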
  • Knuutinen, Janne (2017)
    Copulas have become a common tool in the financial world. The aim of this thesis is to present the theory of copulas and its application to modelling financial risks. Copulas are defined and the central theory related to them is reviewed. The most important correlation concepts are introduced, among them the measures Kendall's tau and Spearman's rho. In addition, the copula families to which the most widely used copulas belong are defined. Copula parameters can be estimated with different methods: the three most important methods based on the log-likelihood function are covered, as is the Monte Carlo method, with which variables can be simulated from copulas. Tail dependence, a useful concept when modelling extreme phenomena, is introduced. Value at Risk (VaR) is one of the most important risk measures for investment risk. With a method based on the rearrangement algorithm, the worst and best values of VaR can be computed. The operation of the method is illustrated by rearranging a matrix with the algorithm so that the upper bound of the worst VaR becomes visible. The method is then applied to the rearrangement of a matrix composed of several years of daily losses of three different stocks: Nokia, Samsung, and Danske Bank. The upper bound of the worst VaR obtained in this way is larger than the historical VaR, which was computed with another method. The practical application of the theory is continued by computing correlations between the losses of the stocks. The correlation coefficient between the losses of Nokia and Deutsche Bank is found to be the largest, and it is concluded that the dependence structure between them is best described by a t-copula.
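    A minimal Python sketch of the rearrangement step at the heart of the worst-VaR computation (the rearrangement algorithm of Embrechts et al.; the tail-block extraction and convergence tolerance of the full method are omitted):

        import numpy as np

        def rearrange(X, n_iter=100):
            """Re-sort each column to be oppositely ordered to the sum of the
            other columns; applied to the tail block of a loss matrix this
            yields the upper bound for the worst VaR."""
            X = X.copy()
            for _ in range(n_iter):
                changed = False
                for j in range(X.shape[1]):
                    rest = X.sum(axis=1) - X[:, j]
                    # Largest entries of column j face the smallest other-sums.
                    new = np.sort(X[:, j])[np.argsort(np.argsort(-rest))]
                    if not np.array_equal(new, X[:, j]):
                        X[:, j], changed = new, True
                if not changed:
                    break
            return X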
  • Yi, Xinxin (2015)
    Problem: The Helsinki Psychotherapy Study (HPS) is a quasi-experimental clinical trial designed to compare the effects of different treatments (i.e. psychotherapy and psychoanalysis) on patients with mood and anxiety disorders. During its 5-year follow-up from 2000 to 2005, repeated measurements were carried out at 0, 12, 24, 36, 48, and 60 months. However, some individuals did not show up at certain data collection points or dropped out of the study entirely, leading to missing values. This prevents the application of further statistical methods and violates the intention-to-treat (ITT) principle in longitudinal clinical trials (LCT). Method: Multiple imputation (MI) has many claimed advantages in handling missing values. This research compares different MI methods, i.e. Markov chain Monte Carlo (MCMC), Bayesian linear regression (BLR), predictive mean matching (PMM), regression trees (RT), and random forests (RF), in their treatment of the HPS missing data. The statistical software is the SAS PROC MI procedure (version 9.3) and the R MICE package (version 2.9). Results: MI performs better than ad hoc methods such as listwise deletion in detecting potential relationships and reducing potential biases in parameter estimation when the missing-completely-at-random (MCAR) assumption is not satisfied. PMM, RT and RF perform better than BLR and MCMC at generating imputed values inside the range of the observed data. The machine learning methods, i.e. RT and RF, are preferable to the regression methods such as PMM and BLR, since the imputed data have distribution curves and other features (e.g. median, interquartile range, skewness of distribution) quite similar to those of the observed data. Implications: We suggest using MI methods to replace ad hoc methods in the treatment of missing data, if the additional effort and time are not a problem. The machine learning methods such as RT and RF are preferable to the relatively arbitrary user-specified regression methods such as PMM and BLR according to our data, but further research is required to confirm this indication. R is more flexible than SAS in that RT and RF can be applied.
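    A single-imputation sketch of PMM, one of the compared methods (a NumPy stand-in; the mice and PROC MI implementations additionally draw the regression parameters from their posterior, which is skipped here):

        import numpy as np

        def pmm_impute(x, X_cov, k=5, rng=None):
            """Impute np.nan entries of x by predictive mean matching: fit OLS
            on observed rows, then copy the observed value of one of the k
            nearest neighbours in predicted-mean space."""
            rng = rng if rng is not None else np.random.default_rng(0)
            obs = ~np.isnan(x)
            A = np.c_[np.ones(len(x)), X_cov]          # design matrix with intercept
            beta, *_ = np.linalg.lstsq(A[obs], x[obs], rcond=None)
            pred = A @ beta
            out = x.copy()
            for i in np.where(~obs)[0]:
                donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
                out[i] = x[obs][rng.choice(donors)]
            return out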
  • Kumar, Ajay Anand (2012)
    Owing to next-generation sequencing technologies, the amount of public sequence data is growing exponentially, while the rate of sequence annotation lags behind. There is a need for robust computational tools for the correct assignment of annotations to protein sequences. Sequence-homology-based inference of molecular function assignment and subsequent transfer of the annotation is the traditional way of annotating genome sequences. A TF-IDF based methodology for mining the informative descriptions of high-quality annotated sequences can be used to cluster functionally similar and dissimilar protein sequences. The aim of this thesis work is to perform a correlation analysis of the TF-IDF methodology against standard Gene Ontology (GO) semantic similarity measures. We have developed and implemented a high-throughput tool named GOParGenPy for effective and faster Gene Ontology analysis. It incorporates any Gene Ontology linked annotation file and generates the corresponding data matrices, providing a useful interface for any downstream Gene Ontology analysis across various mathematical platforms. Finally, the correlation evaluation between TF-IDF and standard Gene Ontology semantic similarity methods validates the effectiveness of the TF-IDF methodology for clustering functionally similar protein sequences.
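    A minimal sketch of the TF-IDF similarity idea on invented annotation descriptions (scikit-learn is used for brevity; this does not reproduce the thesis pipeline or GOParGenPy):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        descriptions = [                       # hypothetical GO-style descriptions
            "ATP binding serine/threonine protein kinase activity",
            "protein serine kinase, ATP-dependent phosphorylation",
            "DNA-binding transcription factor activity",
        ]
        tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
        print(cosine_similarity(tfidf).round(2))   # the kinase pair scores high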
  • Lipsanen, Veera (2024)
    The constant outflow of solar wind from the Sun, and the larger structures possibly embedded within it, influence the Earth's magnetosphere. These large structures include interplanetary coronal mass ejections (ICMEs) and high-speed streams (HSSs), and they can contain substructures: a fast enough ICME can drive a turbulent sheath region in front of it, while an HSS can interact with the slower ambient solar wind and form a stream interaction region (SIR). Pc5 ultra-low-frequency (ULF) waves have a frequency range of 2–7 mHz; they are important in transferring energy from the solar wind to the magnetosphere, and they affect energetic electrons in the radiation belts. ULF waves in the magnetosphere are generated by multiple mechanisms: fluctuations in the solar wind's dynamic pressure create waves on the dayside, the Kelvin-Helmholtz instability, often caused by HSSs, creates waves on the magnetopause flanks, and substorms create waves on the nightside. ULF waves are therefore MLT dependent. For this thesis, a new ground-based ULF index that is MLT dependent and has a 1-min resolution is constructed using wavelet analysis. The aim of this thesis is to study how this new index correlates with multiple solar wind parameters, geomagnetic indices and an existing ULF index during the substructures of four events: a weak HSS, a weak ICME, a strong HSS and a strong ICME. The ULF power is found to peak at the sheath-ejecta boundary during ICMEs and at the stream interface during HSSs, driven primarily by the dawn and night ULF powers. The AE index is found to correlate with the ULF power during all of the events, which indicates that even the non-geoeffective event produces some kind of substorm activity. Solar wind speed is found to correlate well with the ULF power during SIRs. It is important to take the MLT dependence of ULF waves into account, since their generation mechanisms differ in different parts of the magnetosphere. In addition, we found that the ULF power in the four MLT sectors can behave differently at the same moment.
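    A simplified band-power proxy for the Pc5 range (the thesis index is wavelet-based and MLT-sectored; this windowed-FFT sketch reproduces neither, and the window length is an assumption):

        import numpy as np

        def pc5_band_power(b, dt=60.0, f_lo=2e-3, f_hi=7e-3, win=60):
            """Pc5 (2-7 mHz) power of a 1-min magnetometer series [nT],
            computed in non-overlapping windows of `win` samples."""
            freqs = np.fft.rfftfreq(win, d=dt)
            band = (freqs >= f_lo) & (freqs <= f_hi)
            power = []
            for k in range(len(b) // win):
                seg = b[k * win:(k + 1) * win]
                spec = np.abs(np.fft.rfft(seg - seg.mean())) ** 2
                power.append(spec[band].sum() / win)
            return np.array(power)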
  • Tauriainen, Juha (2023)
    Software testing is an important part of ensuring software quality, and studies have shown that having more tests results in a lower defect count. Code coverage is a tool used in software testing to find the parts of the software that require further testing and to learn which parts have been tested; it is generated automatically by the test suites during test execution. Many types of code coverage metrics exist, the most common being line coverage, statement coverage, function coverage, and branch coverage. These four common metrics are usually enough, but there are many specialized coverage types for specific purposes, such as condition coverage, which tells how many boolean conditions have been evaluated both as true and as false. Each metric gives hints on how the codebase is tested. A common consensus amongst practitioners is that code coverage does not correlate much with software quality. The correlation of software quality with code coverage is a historically broadly researched topic, of importance both in academia and in professional practice. This thesis investigates whether code coverage correlates with software quality by performing a literature review. The literature review yields a surprising result: most of the studies included in this thesis point towards code coverage correlating with software quality. This positive correlation emerges from 22 studies conducted between 1995 and 2021, covering both academic and industrial work. The studies are grouped into categories by key finding (e.g. correlation or no correlation) and by study type (e.g. survey studies, case studies, open-source studies), and each category has most studies pointing towards a correlation. This finding contradicts the opinions of professional practitioners.
  • Tuna, Yasemin (2023)
    Nuclear power plant decommissioning is a difficult process that combines industrial decommissioning techniques, radiation safety standards, and legal requirements for the final disposal of nuclear waste. The goal of nuclear decommissioning is to purge the plant of radioactive material completely, so that it can be released from regulatory oversight. The range of corrosion products generated on a steel surface is known to have a significant impact on the corrosion process of the steel. Corrosion products have a complicated structure: they are created when metallic components, mostly iron, react with oxygen and water drawn from the surroundings, and their structure is then significantly influenced by environmental factors. Quantitative characterisation of the atomic-scale structure of corrosion products is critically needed for identifying them reliably. This thesis describes the characterization of corrosion products formed on steel surfaces, carried out with XRD (X-ray diffraction), SEM/EDS (scanning electron microscopy/energy-dispersive spectrometry), and Raman spectroscopy. Within the scope of this project, besides the characterization of steel samples, Loviisa groundwater and synthetic water samples that had been in long-term contact with activated steel samples were also examined. Separation processes were carried out to determine Fe-55 and Ni-63 in the waters, and Co-60 was removed from the samples before the activity determination of Fe-55 and Ni-63 by LSC (liquid scintillation counting). This master's thesis has been carried out in connection with the DEMONI project, a coordinated project of VTT and the University of Helsinki (KYT2022 Research Programme). The outcome of the thesis will benefit possible decommissioning and disposal strategies for nuclear power plant reactor pressure vessels.
  • Sandhu, Jaspreet (2013)
    This thesis aims to cover the central aspects of current research and advancements in cosmic topology from a topological and observational perspective. Beginning with an overview of the basic concepts of cosmology, it is observed that, though they determine local curvature, Einstein's equations of relativity do not constrain the global properties of space-time. The topological requirements of a universal space-time manifold are discussed, including the requirements of orientability and causality. The basic topological concepts used in the classification of spaces, i.e. the concepts of the fundamental domain and universal covering spaces, are discussed briefly. The manifold properties and symmetry groups for three-dimensional manifolds of constant negative, positive and zero curvature are laid out. Multi-connectedness is explored as a possible explanation for the detected anomalies in the quadrupole and octopole regions of the power spectrum, pointing at a possible compactness along one or more directions in space. The statistical significance of the evidence, however, is also scrutinized, and I discuss briefly the Bayesian and frequentist interpretations of the posterior probabilities of observing the anomalies in a ΛCDM universe. Some of the major topologies that have been proposed and investigated as possible candidates for a universal manifold, the Poincaré dodecahedron and the Bianchi universes, are studied in detail. Lastly, the methods that have been proposed for detecting a multi-connected signature are discussed. These include ingenious observational methods like the circles-in-the-sky method and cosmic crystallography, as well as theoretical methods, which have the additional advantage of being free from measurement errors and use the posterior likelihoods of models. As of the recent Planck mission, no compelling evidence of a multi-connected topology has been detected.
  • Zhao, Zhao (2023)
    This thesis offers a practical solution for making cost-effective decisions about when to deploy weather routing, with the aim of optimizing computational costs. The study develops three collaborative model components that together address the rerouting decision problem. Model 1 trains a neural-network-based ship performance model, which forms the foundation for the weather routing model. Model 2 constructs a time-dependent path-finding model that integrates real-time weather forecasts; it optimizes routing within a designated experimental area and generates simulation samples for training. Model 3 uses the outcomes of Model 2 to train a practical machine learning decision-making model that addresses the question: should the weather routing system be activated and the route be adjusted based on updated weather forecasts? The integration of these models supports informed maritime decision-making. While these methods represent a preliminary step towards optimizing weather routing deployment frequencies, they hold potential for enhancing operational efficiency and responsible resource usage in the maritime sector.
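    A toy sketch of Model 3's decision step (the features, threshold and classifier are invented for illustration; the real model is trained on Model 2's simulation outcomes):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(1)
        # Hypothetical features: how much the forecast changed along the route.
        X = rng.normal(size=(500, 3))              # e.g. d_wind, d_wave, d_eta
        gain = 2.0 * X[:, 1] + rng.normal(size=500)
        y = (gain > 1.0).astype(int)               # 1 = rerouting paid off

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        print("activate weather routing?", bool(clf.predict(X[:1])[0]))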
  • Berg, Jeremias (2014)
    Clustering is one of the core problems of unsupervised machine learning. In a clustering problem we are given a set of data points and asked to partition them into smaller subgroups, known as clusters, such that each point is assigned to exactly one cluster. The quality of the obtained partitioning (clustering) is then evaluated according to some objective measure dependent on the specific clustering paradigm. A traditional approach within the machine learning community to solving clustering problems has focused on approximative, local search algorithms that in general cannot provide optimality guarantees for the clusterings produced. However, recent advances in the field of constraint optimization have allowed for an alternative view on clustering, and on many other data analysis problems. The alternative view is based on stating the problem at hand in some declarative language and then using generic solvers for that language to solve the problem optimally. This thesis contributes to this approach to clustering by providing a first study on the applicability of state-of-the-art Boolean optimization procedures to cost-optimal correlation clustering under constraints in a general similarity-based setting. The correlation clustering paradigm is geared towards classifying data based on qualitative, as opposed to quantitative, similarity information for pairs of data points. Furthermore, correlation clustering does not require the number of clusters as input, which makes it especially well suited to problem domains in which the true number of clusters is unknown. In this thesis we formulate correlation clustering within the language of propositional logic. As is often done within computational logic, we focus only on formulas in conjunctive normal form (CNF), a restriction that incurs no loss of generality. When encoded as a CNF formula, the correlation clustering problem becomes an instance of partial maximum satisfiability (MaxSAT), the optimization version of the Boolean satisfiability (SAT) problem. We present three different encodings of correlation clustering into CNF formulas and provide proofs of the correctness of each encoding. We also evaluate them experimentally by applying a state-of-the-art MaxSAT solver to the resulting MaxSAT instances. The experiments demonstrate both the scalability of our method and the quality of the clusterings obtained. As a more theoretical result, we prove that the assumption of an undirected input graph can be made without loss of generality, which justifies the applicability of our encodings to all variants of correlation clustering known to us. This thesis also addresses another clustering paradigm, namely constrained correlation clustering, in which additional constraints are used to restrict the acceptable solutions to the correlation clustering problem, for example according to domain-specific knowledge provided by an expert. We demonstrate how our MaxSAT-based approach to correlation clustering naturally extends to constrained correlation clustering. Furthermore, we show experimentally that added user knowledge allows clustering larger datasets, decreases the running time of our approach, and quickly steers the obtained clusterings towards a predefined ground-truth clustering.
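    A sketch of one standard CNF encoding consistent with the approach described (the "transitive" encoding: a variable per point pair, hard transitivity clauses, soft similarity clauses, emitted as old-style DIMACS WCNF with integer weights; the thesis presents three encodings, not necessarily this exact formulation):

        from itertools import combinations

        def correlation_clustering_wcnf(n, sim):
            """Partial MaxSAT encoding: x_uv is true iff points u and v share a
            cluster; hard clauses force transitivity, soft unit clauses encode
            integer similarities (positive: prefer together, negative: apart)."""
            var = {p: i + 1 for i, p in enumerate(combinations(range(n), 2))}
            x = lambda u, v: var[(min(u, v), max(u, v))]
            soft = [(abs(w), [x(u, v)] if w > 0 else [-x(u, v)])
                    for (u, v), w in sim.items() if w]
            top = sum(w for w, _ in soft) + 1              # weight marking hard clauses
            hard = []
            for u, v, t in combinations(range(n), 3):
                for a, b, c in [(u, v, t), (u, t, v), (v, t, u)]:
                    # same(a,c) and same(b,c) together imply same(a,b)
                    hard.append([-x(a, c), -x(b, c), x(a, b)])
            lines = [f"p wcnf {len(var)} {len(soft) + len(hard)} {top}"]
            lines += [f"{top} " + " ".join(map(str, cl)) + " 0" for cl in hard]
            lines += [f"{w} " + " ".join(map(str, cl)) + " 0" for w, cl in soft]
            return "\n".join(lines)

        print(correlation_clustering_wcnf(3, {(0, 1): 2, (1, 2): 2, (0, 2): -3}))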
  • Kuivaniemi, Esa (2024)
    Machine learning (ML) has experienced significant growth, fuelled by the surge in big data. Organizations leverage ML techniques to take advantage of that data. So far, the focus has predominantly been on increasing value by developing ML algorithms; another option is to optimize resource consumption to reach cost optimality. This thesis contributes to cost optimality by identifying and testing frameworks that enable organizations to make informed decisions about cost-effective cloud infrastructure while designing and developing ML workflows. The two frameworks we introduce to model cost optimality are "Cost Optimal Query Processing in the Cloud" for data pipelines and "PALEO" for ML model training pipelines. The latter focuses on estimating the time needed to train a neural network, while the former is more generic in assessing a cost-optimal cloud setup for query processing. Through the literature review, we show that it is critical to consider both the data and the ML training aspects when designing a cost-optimal ML workflow. Our results indicate that the frameworks provide accurate estimates of the cost-optimal hardware configuration in the cloud for an ML workflow. Deviations appear in the details: our chosen version of the Cost Optimal Model does not consider the impact of larger memory, and the frameworks do not provide accurate execution time estimates; for example, PALEO estimates that our accelerated EC2 instance executes the training workload in half the time it actually took. However, the purpose of the study was not to provide accurate execution or cost estimates; rather, we aimed to see whether the frameworks identify the cost-optimal cloud infrastructure setup among the five EC2 instances that we chose for executing our three different workloads.
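    A toy comparison in the spirit of the cost-optimality question (instance names, prices and time estimates are invented placeholders; PALEO-style estimates would derive the training time from model FLOPs and device throughput):

        # Hypothetical (price USD/h, estimated training hours) per EC2 instance.
        candidates = {
            "m5.2xlarge": (0.384, 20.0),
            "p3.2xlarge": (3.06, 1.5),
        }
        cost = {name: price * hours for name, (price, hours) in candidates.items()}
        best = min(cost, key=cost.get)
        print(best, "is cost-optimal under these assumptions:", cost)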
  • Koskinen, Miikka (2018)
    This thesis examines the expressive power of majority quantifiers in the context of word models. Like the existential quantifier (∃) and the universal quantifier (∀), the majority quantifier is a logical quantifier. It expresses that a claim holds for more than half of the elements of the domain of the model under consideration. From the viewpoint of descriptive complexity theory, the uniform circuit complexity class TC⁰ corresponds to first-order logic equipped with addition, multiplication, and the majority quantifier. The thesis studies the internal structure of TC⁰ by restricting attention to the logical fragment in which only the majority quantifier and the order relation are available. It is shown that both the existential and the universal quantifier can be simulated with the majority quantifier and the order relation. Addition and the parity of the domain size are also expressible. Multiplication, by contrast, is not expressible with the unary majority quantifier. In addition, it is shown that multiplication can be expressed with the binary majority quantifier. It follows that the binary majority quantifier is strictly more powerful than the unary majority quantifier.
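    For reference, the standard semantics of the unary majority quantifier over a model \(\mathfrak{A}\) with domain \(A\) (background notation, not a result of the thesis):

        \[
          \mathfrak{A} \models M x\, \varphi(x)
          \iff
          \bigl|\{\, a \in A : \mathfrak{A} \models \varphi(a) \,\}\bigr| > \frac{|A|}{2}
        \]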