
Browsing by Title


  • Lempiäinen, Tuomo (2014)
    In this thesis we study the theoretical foundations of distributed computing. Distributed computing is concerned with graphs, where each node is a computing unit and runs the same algorithm. The graph serves both as a communication network and as an input for the algorithm. Each node communicates with adjacent nodes in a synchronous manner and eventually produces its own output. All the outputs together constitute a solution to a problem related to the structure of the graph. The main resource of interest is the amount of information that nodes need to exchange. Hence the running time of an algorithm is defined as the number of communication rounds; any amount of local computation is allowed. We introduce several models of distributed computing that are weaker versions of the well-established port-numbering model. In the port-numbering model, a node of degree d has d input ports and d output ports, both numbered with 1, 2, ..., d such that the port numbers are consistent. We denote by VVc the class of all graph problems that can be solved in this model. We define the following subclasses of VVc, corresponding to the weaker models: VV: Input and output port numbers are not necessarily consistent. MV: Input ports are not numbered; nodes receive a multiset of messages. SV: Input ports are not numbered; nodes receive a set of messages. VB: Output ports are not numbered; nodes broadcast the same message to all neighbours. MB: Combination of MV and VB. SB: Combination of SV and VB. This thesis presents a complete classification of the computational power of the models. We prove that the corresponding complexity classes form the following linear order: SB ⊊ MB = VB ⊊ SV = MV = VV ⊊ VVc. To prove SV = MV, we show that any algorithm receiving a multiset of messages can be simulated by an algorithm that receives only a set of messages. The simulation causes an additive overhead of 2∆ - 2 communication rounds, where ∆ is an upper bound for the maximum degree of the graph. As a new result, we prove that the simulation is optimal: it is not possible to achieve a simulation overhead smaller than 2∆ - 2. Furthermore, we construct a graph problem that can be solved in one round of communication by an algorithm receiving a multiset of messages, but requires at least ∆ rounds when solved by an algorithm receiving only a set of messages.
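    To make the differences between the message-reception models concrete, here is a minimal Python sketch (not taken from the thesis) of one synchronous communication round; the adjacency-list representation and the use of list order as port numbering are assumptions made for illustration only.

    ```python
    from collections import Counter

    # Hypothetical sketch of one synchronous round in the weaker port-numbering
    # models. Nodes are keys of `adj`, `adj[v]` lists the neighbours of v in
    # port order, and `msg(u, i)` is the message u sends on its i-th output port.
    def one_round(adj, msg, model):
        """Return what each node receives in one communication round."""
        received = {}
        for v, neighbours in adj.items():
            incoming = []
            for u in neighbours:
                port = adj[u].index(v)              # v sits on u's output port `port`
                if model in ("VB", "MB", "SB"):     # broadcast models
                    incoming.append(msg(u, 0))      # same message on every port
                else:                               # VV / MV / SV
                    incoming.append(msg(u, port))
            if model in ("SV", "SB"):
                received[v] = frozenset(incoming)   # a set: multiplicities are lost
            elif model in ("MV", "MB"):
                received[v] = Counter(incoming)     # a multiset of messages
            else:                                   # VV / VB
                received[v] = tuple(incoming)       # indexed by input port
        return received
    ```

    In this sketch the SV/SB cases drop duplicate messages, which is exactly the information an MV-style algorithm may rely on and which the 2∆ - 2-round simulation discussed above has to reconstruct.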
  • Seth, Arpita (2020)
    Traditional flat classification methods (e.g., binary, multiclass, and multi-label classification) seek to associate each example with a single class label or a set of labels without any structural dependence among them. However, in some problems the classes can be divided into subclasses or grouped into superclasses. Such a scenario demands methods prepared to deal with hierarchical classification. A hierarchical classification algorithm exploits the structural information present in the class hierarchy and thereby improves predictive performance. The freedom to perform a more generic classification, but with higher reliability, gives the process greater versatility. Several studies have shown that, in solving a hierarchical classification problem, flat models are mostly outperformed by hierarchical ones, regardless of the approach chosen – local (including its derivations) or global. This thesis aims to compare the most popular hierarchical classification methods (local and global) empirically, reporting their performance measured with hierarchical evaluation indexes. To do so, we had to adapt the global hierarchical models to conduct single-path predictions, starting from the root class and moving towards a leaf class within the hierarchical structure. Further, we applied hierarchical classification to data streams by detecting concept drift. We first study data streams, various types of concept drift, and state-of-the-art concept drift detection methods. Then we implement Global-Model Hierarchical Classification Naive Bayes (GNB) with three concept drift detectors: (i) the Kolmogorov-Smirnov test, (ii) the Wilcoxon test, and (iii) the Drift Detection Method (DDM). A fixed-size sliding window was used to estimate the performance of GNB online. Finally, we highlight that this thesis contributes to the task of automatic insect recognition.
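    The sliding-window drift-detection setup mentioned above can be sketched as follows; the window size, the significance level and the use of per-example performance scores are illustrative assumptions, not the exact configuration of the thesis (which also evaluates the Wilcoxon test and DDM).

    ```python
    from collections import deque
    from scipy.stats import ks_2samp

    # Illustrative drift detection over a fixed-size sliding window of
    # per-example performance scores (e.g. the predicted probability of the
    # true class). Window size and alpha are arbitrary choices in this sketch.
    class SlidingKSDriftDetector:
        def __init__(self, window_size=500, alpha=0.01):
            self.reference = deque(maxlen=window_size)  # scores before suspected drift
            self.current = deque(maxlen=window_size)    # most recent scores
            self.alpha = alpha

        def add(self, score):
            """Add one score; return True if drift is signalled."""
            if len(self.reference) < self.reference.maxlen:
                self.reference.append(score)
                return False
            self.current.append(score)
            if len(self.current) == self.current.maxlen:
                _, p_value = ks_2samp(list(self.reference), list(self.current))
                if p_value < self.alpha:  # the two windows differ: signal drift
                    self.reference = deque(self.current, maxlen=self.reference.maxlen)
                    self.current.clear()
                    return True
            return False
    ```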
  • Zhao, Chenhui (2023)
    In recent years, classical neural networks have been widely used in various applications and have achieved remarkable success. However, with the advent of quantum computing, there is a growing interest in quantum neural networks (QNNs) as a potential alternative to classical machine learning. In this thesis, we study the architectures of quantum and classical neural networks. We also investigate the performance of QNNs compared to classical neural networks from various aspects, such as vanishing gradients, trainability, and expressivity. Our experiments demonstrate that QNNs have the potential to outperform classical neural networks in specific scenarios. While more powerful QNNs exhibit improved performance compared to classical neural networks, our findings also indicate that less powerful QNNs may not always yield significant improvements. This suggests that the effectiveness of QNNs in surpassing classical approaches is contingent on factors such as network architecture, optimization techniques, and problem complexity.
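    As a minimal, self-contained illustration of the kind of parameterized circuit a QNN is built from (not one of the architectures compared in the thesis), the following NumPy sketch evaluates a one-qubit variational model and its exact gradient via the parameter-shift rule.

    ```python
    import numpy as np

    # Minimal one-qubit "quantum neuron": the state RY(theta)|0> has
    # <Z> = cos(theta); the parameter-shift rule gives its exact gradient.
    def expectation_z(theta):
        state = np.array([np.cos(theta / 2), np.sin(theta / 2)])  # RY(theta)|0>
        z = np.array([[1.0, 0.0], [0.0, -1.0]])
        return float(state @ z @ state)                           # <psi|Z|psi>

    def parameter_shift_grad(theta):
        # d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2
        return (expectation_z(theta + np.pi / 2) - expectation_z(theta - np.pi / 2)) / 2

    theta = 0.3
    print(expectation_z(theta), parameter_shift_grad(theta))  # ~cos(0.3), ~-sin(0.3)
    ```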
  • Huang, Biyun (2018)
    Text classification, also known as text categorization, is the task of classifying documents into predefined sets. With the rise of social networks, a large volume of unstructured text is generated at an exponential rate. Social media text, due to its limited length, extreme imbalance, high dimensionality, and multi-label characteristics, needs special processing before being fed to machine learning classifiers. There are statistical, machine learning, and natural language processing approaches to the problem, of which two families of machine learning algorithms represent the state of the art. One is large-scale linear classification, which deals with large sparse data, especially short social media text; the other is active deep learning techniques, which take advantage of word order. This thesis provides an end-to-end solution for large-scale, multi-label and extremely imbalanced text data, compares both trends, and discusses the effect of balanced learning. The results show that deep learning does not necessarily work well in this context. Well-designed large linear classifiers can achieve the best scores. Also, when the data is large enough, the simpler classifiers may perform better.
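    A minimal sketch of the large-scale linear baseline discussed above, using scikit-learn; the toy documents, the vectorizer settings and the choice of LinearSVC are assumptions for illustration, not the pipeline used in the thesis.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # Illustrative multi-label linear baseline for short, sparse social media text.
    docs = ["great phone battery", "terrible delivery service", "battery died fast"]
    labels = [{"electronics"}, {"logistics"}, {"electronics", "complaint"}]

    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(labels)                     # multi-label indicator matrix

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),   # sparse features
        OneVsRestClassifier(LinearSVC(class_weight="balanced")),  # one linear SVM per label
    )
    model.fit(docs, y)
    print(mlb.inverse_transform(model.predict(["battery problem"])))
    ```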
  • Alonso, Pedro (2015)
    The purpose of this thesis is to compare different classification methods on the basis of their accuracy, precision and recall. The methods used are Logistic Regression (LR), Support Vector Machines (SVM), Neural Networks (NN), Naive Bayes (NB) and a full Bayesian network (BN). Each section describes one of the methods, including its main idea, an explanation of how it is used, the intuition underpinning it, and its application to simple data sets. The data used in this thesis comprises 3 different sets used previously for learning the Logistic Regression and Support Vector Machines models; these sets are then also applied to the Bayesian counterparts and to the Neural Networks model. The results show that the Bayesian methods are well suited to the classification task: they are as good as their counterparts, sometimes better. While Support Vector Machines and Neural Networks are still the best all around, the Bayesian approach can have comparable performance and provides a good approximation of the traditional methods' power. Overall, Logistic Regression has the lowest classification performance of the methods, followed by Naive Bayes, then Bayesian networks, while Support Vector Machines and Neural Networks perform best.
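    The kind of comparison described above can be set up compactly as follows; the data set, the hyperparameters and the omission of the full Bayesian network (for which scikit-learn has no estimator) are assumptions of this sketch, not choices made in the thesis.

    ```python
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_validate
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.naive_bayes import GaussianNB

    # Illustrative comparison on a standard data set (not the three sets of the
    # thesis), reporting cross-validated accuracy, precision and recall.
    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM": SVC(),
        "NN": MLPClassifier(max_iter=1000),
        "NB": GaussianNB(),
    }
    for name, clf in models.items():
        pipe = make_pipeline(StandardScaler(), clf)
        scores = cross_validate(pipe, X, y, cv=5,
                                scoring=("accuracy", "precision", "recall"))
        print(name,
              round(scores["test_accuracy"].mean(), 3),
              round(scores["test_precision"].mean(), 3),
              round(scores["test_recall"].mean(), 3))
    ```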
  • Hyvönen, Elvira (2013)
    This study presents some of the available methods for haplotype reconstruction and evaluates the accuracy and efficiency of three different software programs that utilize these methods. The analysis is performed on the QTLMAS XII common dataset, which is publicly available. The program LinkPHASE 5+, a rule-based program, considers pedigree information (deduction and linkage) only. HiddenPHASE is a likelihood-based program, which takes into account molecular information (linkage disequilibrium). The DualPHASE software combines both of the above-mentioned methods. We will see how the use of the different available sources of information, as well as the shape of the data, affects haplotype inference.
  • Oksanen, Miika (2018)
    In software product line engineering (SPLE), parts of the developed software are made variable in order to be able to build a whole range of software products at the same time. This is widely known to have a number of potential benefits, such as saving costs when the product line is large enough. However, managing variability in software introduces challenges that are not well addressed by tools used in conventional software engineering, and specialized tools are needed. Research questions: 1) What are the most important requirements for SPLE tools for a small-to-medium sized organisation aiming to experiment with SPLE? 2) How well are those requirements met in two specific SPLE tools, Pure::Variants and Clafer tools? 3) How do the studied tools compare against each other when it comes to their suitability for the chosen context (a digital board game platform)? 4) How can common requirements for SPL tools be generalized to be applicable to both graphical and text-based tools? A list of requirements is first obtained from the literature and then used as a basis for an experiment where support for each requirement is tried out with both tools. Then a part of an example product line is developed with both tools and the experiences are reported. Both tools were found to support the list of requirements quite well, although there were some usability problems and not everything could be tested due to technical issues. Based on developing the example, both tools were found to have their own strengths and weaknesses, probably partly resulting from one being GUI-based and the other textual. ACM Computing Classification System (CCS): (1) CCS → Software and its engineering → Software creation and management → Software development techniques → Reusability → Software product lines (2) CCS → Software and its engineering → Software notations and tools → Software configuration management and version control systems
  • Zhang, Tinghan (2022)
    Air ions can play an important role in the new particle formation (NPF) process and consequently influence atmospheric aerosols, which affect climate and air quality as potential cloud condensation nuclei. However, air ions and their role in NPF have not yet been comprehensively investigated, especially in polluted areas. To explore air ions in a polluted environment, we compared the air ions at SORPES, a suburban site in polluted eastern China, with those at SMEAR II, a well-studied boreal forest site in Finland, based on the air ion number size distribution (0.8-42 nm) measured with a Neutral Cluster and Air Ion Spectrometer (NAIS) from 7 June 2019 to 31 August 2020. Air ions were classified into three size ranges: cluster (0.8-2 nm), intermediate (2-7 nm), and large (7-20 nm) ions. The median concentration of cluster ions at SORPES (217 cm−3) was about 6 times lower than that at SMEAR II (1268 cm−3), due to the high condensation sink and pre-existing particle loading in the polluted area, whereas the median large ion concentration at SORPES (197 cm−3) was about 3 times higher than that at SMEAR II (67 cm−3). Seasonal variations of ion concentration differed with ion size and polarity at the two sites. A high concentration of cluster ions was observed in the evening in spring and autumn at SMEAR II, while at SORPES the cluster ion concentration remained at a high level throughout the day in the same seasons. NPF events occurred more frequently at SORPES (SMEAR II: 16%; SORPES: 39%), and the NPF frequency at both sites was highest in spring (SMEAR II: 43%; SORPES: 56%). Around noon on NPF event days, the concentration of intermediate ions was 8-14 times higher than at the same hours on non-event days, indicating that intermediate ions can be used as an indicator of NPF at SMEAR II and SORPES. The median formation rate of 1.5 nm ions at SMEAR II was higher than that at SORPES, while a higher formation rate of 3 nm ions was observed at SORPES. At 3 nm, the formation rate of charged particles was only 11% and 1.6% of the total rate at SMEAR II and SORPES, respectively, which supports the current view that neutral pathways dominate the new particle formation process in the continental boundary layer. However, the higher ratio between the charged and total formation rates of 3 nm particles at SMEAR II indicates that ion-induced nucleation can contribute more to NPF in clean areas than in polluted areas. Higher median growth rates of 3-7 nm (SMEAR II: 3.1 nm h−1; SORPES: 3.7 nm h−1) and 7-20 nm (SMEAR II: 5.5 nm h−1; SORPES: 6.9 nm h−1) ions were found at SORPES in comparison to SMEAR II, suggesting a higher availability of condensing vapours at SORPES. This study presents a comprehensive comparison of air ions in two very different environments and highlights the need for long-term ion measurements to improve the understanding of air ions and their role in NPF in polluted areas such as eastern China.
  • Lehtola, Jussi (University of Helsinki, 2008)
    The molecular level structure of mixtures of water and alcohols is very complicated and has been under intense research in the recent past. Both experimental and computational methods have been used in these studies. One method for studying the intra- and intermolecular bindings in the mixtures is the use of so-called difference Compton profiles, which are a way to obtain information about changes in the electron wave functions. In the process of Compton scattering a photon scatters inelastically from an electron. The Compton profile that is obtained from the electron wave functions is directly proportional to the probability of photon scattering at a given energy into a given solid angle. In this work we develop a method to compute Compton profiles numerically for mixtures of liquids. In order to obtain the electronic wave functions necessary to calculate the Compton profiles we need statistical information about the atomic coordinates. Acquiring this using ab initio molecular dynamics is beyond our computational capabilities, and therefore we use classical molecular dynamics to model the movement of atoms in the mixture. We discuss the validity of the chosen method in view of the results obtained from the simulations. There are some difficulties in using classical molecular dynamics for the quantum mechanical calculations, but these can possibly be overcome by parameter tuning. According to the calculations, clear differences can be seen in the Compton profiles of different mixtures. This prediction needs to be tested in experiments in order to find out whether the approximations made are valid.
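    For reference, the standard impulse-approximation relation between the electron momentum density and the Compton profile, together with one common convention for a difference profile, can be written as follows; the exact weighting used in the thesis may differ.

    ```latex
    % Impulse approximation: the Compton profile is a projection of the
    % electron momentum density n(\mathbf{p}) obtained from the wave functions.
    \[
      J(p_z) \;=\; \iint n(\mathbf{p})\,\mathrm{d}p_x\,\mathrm{d}p_y .
    \]
    % One common convention for a difference profile compares the mixture to
    % its weighted pure components (x = molar fraction of water); the exact
    % weighting convention may vary.
    \[
      \Delta J(p_z) \;=\; J_{\text{mixture}}(p_z)
        \;-\; \bigl[\, x\, J_{\text{water}}(p_z) + (1 - x)\, J_{\text{alcohol}}(p_z) \bigr].
    \]
    ```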
  • Lindblom, Otto (2020)
    Due to its exceptional thermal properties and irradiation resistance, tungsten is the material of choice for critical plasma-facing components in many leading thermonuclear fusion projects. Owing to the natural retention of hydrogen isotopes in materials such as tungsten, the safety of a fusion device depends heavily on the inventory of radioactive tritium in its plasma-facing components. The proposed methods of tritium removal typically include thermal treatment of massive metal structures for prolonged timescales. A novel way to either shorten the treatment times or lower the required temperatures is based on performing the removal under an H2 atmosphere, effectively exchanging the trapped tritium for non-radioactive protium. In this thesis, we employ molecular dynamics simulations to study the mechanism of hydrogen isotope exchange in vacancy, dislocation and grain boundary type defects in tungsten. By comparing the results to simulations of purely diffusion-based tritium removal methods, we establish that hydrogen isotope exchange indeed facilitates faster removal of tritium for all studied defect types at temperatures of 500 K and above. The fastest removal, when normalised by the initial occupation of the defect, is shown to occur in vacancies and the slowest in grain boundaries. Through an atom-level study of the mechanism, we are able to verify that tritium removal using isotope exchange depends on keeping the defect saturated with hydrogen. This study also shows that molecular dynamics is indeed a valid tool for studying tritium removal and isotope exchange in general. Using small system sizes and spatially parallelised simulation tools, we have managed to model isotope exchange for timescales extending from hundreds of nanoseconds up to several microseconds.
  • Flinck, Oliver (2022)
    In this thesis, sputtering of several low- and high-index tungsten surface crystal directions is investigated. The molecular dynamics study is conducted using the primary knock-on atom method, which allows for an equal energy deposition for all surface orientations. The energy is introduced into the system at two different depths: on the surface and at a depth of 1 nm. In addition to the sputtering yield of each surface orientation, the underlying sputtering process is investigated. Amorphous target materials are often used to compare sputtering yields of polycrystalline materials with simulations. Therefore, an amorphous surface is also investigated to compare its sputtering yield and sputtering process with those of the crystalline surface orientations. When the primary knock-on atom was placed on the surface, all surface orientations had a cosine-shaped angular distribution, with little variation in the sputtering yield for most of the surface orientations. Linear collision sequences were observed to have a large impact on the sputtering yield when the energy was introduced deeper inside the material. In these linear collision sequences the recoils travel along the most close-packed atom rows in the material. The distance from the origin of the collision cascade to the surface in the direction of the most close-packed row is therefore crucial for the sputtering yield of the surface. Surface directions with large angles between this direction and the surface normal hence show a reduction in the sputtering yield. The amorphous material had a slightly lower sputtering yield than the crystalline materials when the primary knock-on atom was placed on the surface, whereas the difference grew to several orders of magnitude when the energy was introduced at 1 nm depth. Because the amorphous material has no long-range order, linear collision sequences, which are characteristic of the crystalline materials, cannot propagate long distances in it; the angular distribution is therefore cosine-shaped in both cases, and the sputtering yield differs by up to several orders of magnitude when the energy is introduced at 1 nm depth.
  • Puranen, Ilari (2018)
    We introduce a new model for contingent convertibles. The write-down, or equity conversion, and the default of the contingent convertible are modeled as states of a conditional Markov process. Valuation formulae for different financial contracts, such as CDSs and different types of contingent convertibles, are derived. The model can be thought of as an extension of reduced-form models with an additional state. For practical applications, this model could be used for new types of contingent convertible derivatives in a similar fashion as reduced-form models are used for credit derivatives.
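    As a point of reference for the extension described above, a generic reduced-form (intensity-based) valuation formula for a zero-recovery claim is sketched below; the thesis's state-dependent formulae for write-down, conversion and default are refinements of this and are not reproduced here.

    ```latex
    % Generic reduced-form pricing under a default intensity \lambda_t and
    % short rate r_t (zero recovery, payoff X at T if no credit event occurs):
    \[
      V_t \;=\; \mathbb{E}\!\left[\,
          \exp\!\Bigl(-\int_t^{T} (r_s + \lambda_s)\,\mathrm{d}s\Bigr) X
          \;\Bigm|\; \mathcal{F}_t \right]
      \quad \text{on } \{\tau > t\},
    \]
    % where \tau is the credit-event time; contingent convertibles add further
    % states (write-down / equity conversion) with their own intensities.
    ```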
  • Chiariello, Alessandro (University of Helsinki, 2006)
    The aim of this work was to assess the structure and use of the conceptual model of occlusion in operational weather forecasting. First, a survey was made of the conceptual model of occlusion as introduced to operational forecasters in the Finnish Meteorological Institute (FMI). In the same context an overview was made of the use of the conceptual model in modern operational weather forecasting, especially in connection with the widespread use of numerical forecasts. In order to evaluate the features of occlusions in operational weather forecasting, all the occlusion processes occurring during the year 2003 over Europe and the Northern Atlantic area were investigated using the conceptual model of occlusion and the methods suggested at the FMI. The investigation yielded a classification of the occluded cyclones on the basis of the extent to which the conceptual model fitted the description of the observed thermal structure. The seasonal and geographical distribution of the classes was inspected. Some relevant cases belonging to different classes were collected and analysed in detail: in this deeper investigation, tools and techniques which are not routinely used in operational weather forecasting were adopted. Both the statistical investigation of the occluded cyclones during 2003 and the case studies revealed that the traditional classification of occlusion types on the basis of the thermal structure does not take into account the greater variety of occlusion structures that can be observed. Moreover, the conceptual model of occlusion turned out to be often inadequate in describing well-developed cyclones. A deep and constructive revision of the conceptual model of occlusion is therefore suggested in light of the results obtained in this work. The revision should take into account both the progress being made in building a theoretical footing for the occlusion process and the recent tools and meteorological quantities which are nowadays available.
  • Pulkka, Robert (2022)
    In recent years, the concept of the Metaverse has become a popular buzzword in the media and in different communities. In 2021, the company behind Facebook rebranded itself as Meta Platforms, Inc. in order to match its new vision of developing the Metaverse. The Metaverse is becoming reality as intersecting technologies, including head-mounted virtual reality displays (HMDs) and non-fungible tokens (NFTs), have been developed. Different communities, such as the media, researchers, consumers and companies, have different perspectives on the Metaverse and its opportunities and problems. Metaverse technology has been researched thoroughly, while little to no research has been done on gray literature, i.e. non-scientific sources, to gain insight into the ongoing hype. The conducted research analyzed 44 sources in total, ranging from news articles to videos and forum discussions. The results show that people see opportunities in Metaverse entrepreneurship in the changing career landscape. However, the visions of Meta Platforms, Inc. also receive a fair amount of critique in the analyzed articles and threads. The results suggest that most consumers are only interested in a smaller subset of features than what is being marketed. The conducted research gives insight into how different sources see the Metaverse and can therefore be used as a starting point for more comprehensive gray literature studies on the Metaverse. While making innovations to the underlying technology is important, studying people's viewpoints is a requirement for academia to understand the phenomenon and for the industry to produce a compelling product.
  • Halonen, Roope (2016)
    The first-order phase transition of a thermodynamic system, the nucleation process, is one of the basic physical phenomena and has significant relevance in several scientific fields. Despite the importance of the nucleation process, the theoretical understanding of it is still imperfect. The emergence of a new phase, a liquid or solid cluster, in the metastable gas phase is mainly treated with classical nucleation theory (CNT), using known macroscopic thermodynamic properties of the studied substance, but the theory often fails to predict the nucleation process adequately. The failure of CNT to describe the nucleation event has shifted the theoretical focus to molecular-level nucleation studies in order to improve the predictions and to understand the origin of the failure. This thesis examines one of the key assumptions behind CNT, the constrained equilibrium hypothesis, by approaching it from a statistical mechanics and thermodynamics point of view. The main tools in this work are computational: both Monte Carlo (MC) and molecular dynamics (MD) simulations have been used to simulate the homogeneous nucleation of Lennard-Jones argon. Two separate studies are presented: first we compare the nucleation rates obtained by MC (based on thermodynamic equilibrium) and molecular dynamics simulations using nonisothermal nucleation theory, and then the constrained equilibrium hypothesis is invalidated by studying the kinetics of Lennard-Jones argon clusters from 4 up to 31 molecules at 50 K. In addition to the actual study, the thesis includes a systematic overview of the theoretical treatment of homogeneous nucleation, from the thermodynamic liquid drop model to applicable molecular-level simulation techniques.
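    For orientation, the classical-nucleation-theory quantities referred to above take the following standard textbook form; the notation is generic and not taken from the thesis.

    ```latex
    % Classical nucleation theory: free energy of forming a spherical cluster of
    % radius r in a supersaturated vapour (S = saturation ratio, \sigma = surface
    % tension, v_l = molecular volume in the liquid):
    \[
      \Delta G(r) \;=\; 4\pi r^{2}\sigma \;-\; \frac{4\pi r^{3}}{3 v_l}\, k_B T \ln S ,
      \qquad
      r^{*} \;=\; \frac{2\sigma v_l}{k_B T \ln S},
    \]
    % and the steady-state nucleation rate follows an Arrhenius-type form
    \[
      J \;=\; J_0 \exp\!\left(-\frac{\Delta G(r^{*})}{k_B T}\right).
    \]
    ```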
  • Genjang, Nevil Nuvala (2012)
    The thesis is written on the catalytic activation of carbon dioxide. It includes a literature part and an experimental part. In the literature part, a review of metal (salen) complexes in relation to their electronic and geometric properties is presented. Salicylidene-aminates are included considering their similarity to the salens. Also included from the literature is a selective review focusing on the mechanistic aspects of the carboxylation of epoxides by metal (salen) complexes. Some applications of iron (salen) complexes as catalysts are mentioned. In the experimental part, bis(phenoxyiminato) chlorido iron(III) complexes are synthesized, characterized and applied to carbon dioxide/epoxide coupling reactions. Characterization is done by UV-vis, infrared, nuclear magnetic resonance and electron impact mass spectroscopy, and by elemental analysis for C, H, and N. Thermogravimetric analysis of the complexes, a DFT calculation for the most active species (L11)2Fe(III)Cl and X-ray crystallography for (L6)2Fe(III)Cl are also presented. X-ray crystallography reveals the space group of (L6)2Fe(III)Cl to be orthorhombic, Pbcn; a = 29.0038(14) Å, b = 8.6123(8) Å, c = 10.7843(9) Å; α = β = γ = 90°. The ML2Cl complexes are observed to have M-O and M-N bonds involving the phenolic oxygen and azomethine nitrogen. A correlation study between the spin state and the Fe-N bond length indicates a high-spin state for the Fe(III) nucleus. The geometry around the metal nucleus is distorted square pyramidal. Reaction conditions for catalytic activity were fine-tuned envisaging the exclusive production of cyclic carbonates. Propylene and styrene oxides show high reactivity. The ketiminato complexes show better activity than the aldiminato complexes. The optimal result is obtained in dimethyl formamide at a temperature of 145 °C and a carbon dioxide pressure of 10 bar in the presence of tetrabutylphosphonium bromide as co-catalyst. A TOF of 572/h is observed for propylene oxide. Three reaction mechanisms are proposed. Comparatively, the Co(III) analogues are more active, and iodide as a halogen ligand produces a more active complex than chloride. Improving the nucleophilicity of Fe(III), eliminating the intramolecular H-bond and improving solubility could yield a more active complex. Iron is a cheap and environmentally benign metal. The use of iron complexes is an attractive alternative to other transition metals, which are expensive and/or toxic. The complexes are robust and show high thermal stability. Surprisingly, oligomers of styrene carbonate were noticed at the reaction temperature and pressure known to favour the exclusive production of cyclic carbonates. These observations suggest that the complexes are promising for study and application in future research on copolymerization. Such copolymers may have useful characteristics for diverse applications.
  • Weber, Sean (2020)
    We present Active Faceted Search, a technique which allows a user to iteratively refine search results by selecting from a set of facets that is dynamically refined with each iteration. The facets presented are selected using a contextual multi-armed bandit model. We first describe the computational model of a system which implements Active Faceted Search. We also create a web application to demonstrate an example of a system that can use an active faceted search component along with more traditional search elements such as a typed query and a sidebar component. We perform simulations to compare the performance of the system under different parameters. Finally, we present a user experiment in which users are instructed to perform tasks in order to compare Active Faceted Search to traditional search techniques.
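    One standard contextual bandit that could play the facet-selection role described above is LinUCB; the following sketch illustrates that general technique and is not the exact model or feature representation used in the system.

    ```python
    import numpy as np

    # Illustrative LinUCB-style scoring of candidate facets. Each facet is an
    # "arm" described by a context feature vector; after the user selects or
    # ignores a facet, the observed reward updates that arm's linear model.
    class LinUCBFacets:
        def __init__(self, dim, alpha=1.0):
            self.dim = dim
            self.alpha = alpha
            self.A = {}   # per-facet d x d matrix  (I + sum x x^T)
            self.b = {}   # per-facet d-vector      (sum reward * x)

        def _ensure(self, facet):
            if facet not in self.A:
                self.A[facet] = np.eye(self.dim)
                self.b[facet] = np.zeros(self.dim)

        def score(self, facet, x):
            self._ensure(facet)
            A_inv = np.linalg.inv(self.A[facet])
            theta = A_inv @ self.b[facet]
            # exploitation term + upper-confidence exploration bonus
            return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

        def update(self, facet, x, reward):
            self._ensure(facet)
            self.A[facet] += np.outer(x, x)
            self.b[facet] += reward * x

        def select(self, candidates):
            """candidates: dict facet -> feature vector; return top-scoring facet."""
            return max(candidates, key=lambda f: self.score(f, candidates[f]))
    ```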
  • Mubarok, Mohamad Syahrul (2017)
    A Bayesian network (BN) is a graphical model that applies probability and Bayes' rule for its inference. A BN consists of a structure, which is a directed acyclic graph (DAG), and parameters. The structure can be obtained by learning from data. Finding an optimal BN structure is an NP-hard problem. If an ordering is given, the problem becomes simpler. An ordering here means the order of the variables (nodes) used for building the structure. One structure learning algorithm that uses a variable ordering as its input is the K2 algorithm. The ordering determines the quality of the resulting network. In this work, we apply the Cuckoo Search (CS) algorithm to find a good node ordering. Each node ordering is evaluated by the K2 algorithm. Cuckoo Search is a nature-inspired metaheuristic algorithm that mimics the aggressive breeding behaviour of cuckoo birds with several simplifications. It has outperformed Genetic Algorithms and Particle Swarm Optimization in finding optimal solutions for continuous problems, e.g., the functions of Michalewicz, Rosenbrock, Schwefel, Ackley, Rastrigin, and Griewank. We conducted experiments on 35 datasets to compare the performance of Cuckoo Search to GOBNILP, a Bayesian network learning algorithm based on integer linear programming that is well known as a benchmark. We compared the quality of the obtained structures and the running times. In general, CS can find good networks, although the obtained networks are not always the best. It sometimes finds only low-scoring networks, and the running times of CS are not always very fast. The results mostly show that GOBNILP is consistently faster and can find networks of better quality than CS. Based on the experimental results, we conclude that the approach cannot guarantee obtaining an optimal Bayesian network structure. Other heuristic search algorithms that we have not compared against, for example the ordering-search algorithm by Teyssier and Koller [41], which combines greedy local hill-climbing with random restarts, a tabu list, caching of computations, and a heuristic pruning procedure, are potentially better suited for learning Bayesian network structures.
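    A heavily simplified version of the search loop described above might look as follows; the scoring function is a dummy stand-in for running the K2 algorithm on an ordering, and the Lévy-flight step is reduced to random swaps for brevity.

    ```python
    import random

    # Simplified Cuckoo-Search-style search over variable orderings. In the
    # thesis each ordering is scored by running K2 on it; here `score_ordering`
    # is a dummy stand-in so that the sketch runs on its own.
    def score_ordering(ordering):
        # Placeholder for "run K2 with this ordering and score the network".
        # This dummy merely prefers orderings close to (0, 1, ..., n-1).
        return -sum(abs(i - v) for i, v in enumerate(ordering))

    def perturb(ordering, swaps=2):
        """Crude stand-in for a Levy-flight step: swap a few random positions."""
        new = list(ordering)
        for _ in range(swaps):
            i, j = random.sample(range(len(new)), 2)
            new[i], new[j] = new[j], new[i]
        return new

    def cuckoo_search(n_vars, n_nests=15, p_abandon=0.25, iterations=200):
        nests = [random.sample(range(n_vars), n_vars) for _ in range(n_nests)]
        scores = [score_ordering(n) for n in nests]
        for _ in range(iterations):
            cuckoo = perturb(random.choice(nests))     # generate a new solution
            j = random.randrange(n_nests)              # compare with a random nest
            if score_ordering(cuckoo) > scores[j]:
                nests[j], scores[j] = cuckoo, score_ordering(cuckoo)
            # abandon the worst fraction of nests and rebuild them randomly
            for k in sorted(range(n_nests), key=scores.__getitem__)[:int(p_abandon * n_nests)]:
                nests[k] = random.sample(range(n_vars), n_vars)
                scores[k] = score_ordering(nests[k])
        best = max(range(n_nests), key=scores.__getitem__)
        return nests[best], scores[best]

    print(cuckoo_search(8))
    ```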
  • Kalaja, Eero (2020)
    Nowadays the amount of data collected on individuals is massive. Making this data more available to data scientists could be tremendously beneficial in a wide range of fields. Sharing data is not a trivial matter, as it may expose individuals to malicious attacks. The concept of differential privacy was first introduced in the seminal work by Cynthia Dwork (2006b). It offers solutions for tackling this problem. Applying random noise to the shared statistics protects the individuals while allowing data analysts to use the data to improve predictions. The input perturbation technique is a simple way of privatizing data, adding noise to the whole data set. This thesis studies an output perturbation technique, where the calculations are done with real data but only the sufficient statistics are released. With this method a smaller amount of noise is required, making the analysis more accurate. Yu-Xiang Wang (2018) improves the model by introducing the adaptive AdaSSP algorithm to fix the instability issues of the previously used Sufficient Statistics Perturbation (SSP) algorithm. In this thesis we verify the results shown by Yu-Xiang Wang (2018) and look into the pre-processing steps more carefully. Yu-Xiang Wang has used some unusual normalization methods, especially regarding the sensitivity bounds. We are able to show that those had little effect on the results, and the AdaSSP algorithm shows its superiority over the SSP algorithm also when combined with more common data standardization methods. A small adjustment of the noise levels is suggested for the algorithm to guarantee the privacy conditions set by the classical Gaussian mechanism. We combine different pre-processing mechanisms with the AdaSSP algorithm and show a comparative analysis between them. The results show that the robust private linear regression of Honkela et al. (2018) makes significant improvements in predictions with half of the data sets used for testing. The combination of the AdaSSP algorithm with robust private linear regression often brings us closer to non-private solutions.
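    The sufficient-statistics-perturbation idea can be sketched as follows; this is a plain SSP-style Gaussian-mechanism sketch under assumed bounds ||x||₂ ≤ 1 and |y| ≤ 1 with a fixed ridge term, not a faithful reimplementation of AdaSSP or of the pre-processing pipelines compared in the thesis (the split of the privacy budget between the two released statistics is also glossed over).

    ```python
    import numpy as np

    # Sketch of Sufficient Statistics Perturbation (SSP) for linear regression:
    # compute X^T X and X^T y on the real data, release only noisy versions,
    # and fit the regression from the released statistics. Assumes rows are
    # scaled so that ||x||_2 <= 1 and |y| <= 1 (these bounds set the sensitivities).
    def ssp_linear_regression(X, y, epsilon, delta, ridge=1e-3):
        d = X.shape[1]
        sigma = np.sqrt(2 * np.log(1.25 / delta)) / epsilon  # Gaussian mechanism scale

        # Sensitivity of X^T X is 1 under ||x|| <= 1; add symmetric Gaussian noise.
        noise = np.random.normal(0, sigma, (d, d))
        xtx_noisy = X.T @ X + (noise + noise.T) / np.sqrt(2)
        # Sensitivity of X^T y is 1 under ||x|| <= 1 and |y| <= 1.
        xty_noisy = X.T @ y + np.random.normal(0, sigma, d)

        # A small ridge term keeps the noisy system well posed (AdaSSP chooses
        # this term adaptively and privately; here it is a fixed assumption).
        return np.linalg.solve(xtx_noisy + ridge * np.eye(d), xty_noisy)
    ```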
  • Rantala, Frans (2023)
    Cancer consists of heterogeneous cell populations that repeatedly undergo natural selection. These cell populations compete with each other for space and nutrients and try to generate phenotypes that maximize their ecological fitness. To achieve this, they evolve evolutionarily stable strategies. When an oncologist starts to treat the cancer, another game emerges. While affected by the cellular evolution processes, the modeling of this game draws on results of classical game theory. This thesis investigates the theoretical foundations of adaptive cancer treatment. It draws from two game-theoretical approaches: evolutionary game theory and the Stackelberg leader-follower game. The underlying hypothesis of an adaptive regimen is that the patient's cancer burden can be managed by leveraging the resource competition between treatment-sensitive and treatment-resistant cells. The intercellular competition is mathematically modelled as an evolutionary game using the G-function approach. The properties of evolutionary stability that are relevant to tumorigenesis and intra-tumoral dynamics, such as the ESS, the ESS maximum principle, and convergence stability, are elaborated. To mitigate the patient's cancer burden, it is necessary to find an optimal modulation and frequency of treatment doses. The Stackelberg leader-follower game, adopted from economic studies of duopoly, provides a promising framework for modeling the interplay between a rationally playing oncologist as the leader and the evolutionarily evolving tumor as the follower. The two game types, applied simultaneously to cancer therapy strategizing, can complement each other and improve the planning of adaptive regimens. Hence, the characteristics of the Stackelberg game are mathematically studied and a preliminary dose-optimization function is presented. The applicability of the combination of the two games in the planning of cancer therapy strategies is tested with a theoretical case. The results are critically discussed from three perspectives: the biological veracity of the eco-evolutionary model, the applicability of the Stackelberg game, and the clinical relevance of the combination. The current limitations of the model are considered, inviting further research on the subject.
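    For concreteness, the G-function framework referred to above is usually written as follows; the notation is the generic one of evolutionary game theory (Vincent and Brown) rather than a formula taken from the thesis.

    ```latex
    % G-function dynamics for populations x_i with strategies u_i:
    % ecological (population) dynamics and evolutionary (strategy) dynamics.
    \[
      \frac{\mathrm{d}x_i}{\mathrm{d}t} = x_i\, G(v,\mathbf{u},\mathbf{x})\big|_{v=u_i},
      \qquad
      \frac{\mathrm{d}u_i}{\mathrm{d}t} = k\,
        \frac{\partial G(v,\mathbf{u},\mathbf{x})}{\partial v}\Big|_{v=u_i}.
    \]
    % ESS maximum principle: at an evolutionarily stable coalition
    % (\mathbf{u}^*, \mathbf{x}^*), the G-function takes its maximum value,
    % zero, at each resident strategy:
    \[
      \max_{v}\; G(v,\mathbf{u}^*,\mathbf{x}^*) \;=\;
      G(v,\mathbf{u}^*,\mathbf{x}^*)\big|_{v=u_i^*} \;=\; 0 .
    \]
    ```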