
Browsing by Title


  • Ahlskog, Niki (2019)
    The purpose of a Progressive Web Application (PWA) is to blur, or even remove, the boundary between an application downloaded from an app store and a normal website. A PWA is like any normal website, but it additionally meets the following criteria: the application scales to any device; the application is served over an encrypted connection; and the application can be installed as a shortcut on a phone's home screen, in which case it opens without the navigation tools familiar from the browser and can also be opened without a network connection. This thesis reviews the techniques for building a PWA and defines when an application qualifies as a PWA. The speed of a PWA is measured with the Service Worker's caching features enabled and disabled. The creation and deployment of a PWA are examined in an existing private client project, with attention paid to the benefits and pain points that the PWA brings. To evaluate the outcome, the progressiveness and speed of the application are measured using Google Chrome's Lighthouse tool. In addition, a test that measures load time is run against the application several times using the Puppeteer library, and the usefulness of the PWA's Service Worker cache is examined in terms of performance and load time. To draw conclusions about the use of the Service Worker cache, the change in speed is examined with the progressive features enabled and disabled. The effects of the Service Worker on application speed are also examined through a Google case study. The test results show that using the Service Worker cache is faster in all cases. The Service Worker cache is faster than the browser's own cache. Even though the Service Worker may be stopped and waiting in the user's browser, activating it and using its cache is still faster than loading from the browser cache or directly from the network.
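    As an illustration of the kind of repeated load-time measurement the abstract describes, the sketch below uses pyppeteer, a Python port of the Puppeteer library named above, to load a page several times and read the browser's Navigation Timing data. The URL and run count are placeholder assumptions, not the project's actual setup.

    ```python
    import asyncio
    from pyppeteer import launch  # pip install pyppeteer

    URL = "https://example-pwa.test/"  # placeholder application URL
    RUNS = 10

    async def measure(url: str, runs: int) -> list:
        browser = await launch()
        times = []
        for _ in range(runs):
            page = await browser.newPage()
            await page.goto(url, waitUntil="load")
            # Page load duration in milliseconds, from the Navigation Timing API.
            ms = await page.evaluate(
                "() => performance.timing.loadEventEnd - performance.timing.navigationStart"
            )
            times.append(ms)
            await page.close()
        await browser.close()
        return times

    if __name__ == "__main__":
        print(asyncio.run(measure(URL, RUNS)))
    ```

    Comparing the resulting timings with the Service Worker registered versus unregistered is one simple way to isolate the effect of its cache.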
  • Hou, Jue (2019)
    Named entity recognition is a challenging task in the field of NLP. Like other machine learning problems, it requires a large amount of data to train a workable model. This remains a problem for languages such as Finnish due to the lack of data in linguistic resources. In this thesis, I propose an approach to automatic annotation in Finnish that uses limited linguistic rules and data from a resource-rich language, English, as a reference. Training a BiLSTM-CRF model, the preliminary results show that automatic annotation can produce annotated instances with high accuracy and that the model can achieve good performance for Finnish. In addition to automatic annotation and NER model training, two related experiments are conducted and discussed at the end of the thesis to show the practical application of the Finnish NER model.
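    Since the abstract names the BiLSTM-CRF architecture, here is a minimal sketch of such a tagger in PyTorch using the third-party pytorch-crf package; the dimensions and layer sizes are assumptions for illustration, not the thesis configuration.

    ```python
    import torch.nn as nn
    from torchcrf import CRF  # pip install pytorch-crf

    class BiLSTMCRF(nn.Module):
        def __init__(self, vocab_size: int, num_tags: int, emb_dim: int = 100, hidden: int = 128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True, batch_first=True)
            self.proj = nn.Linear(hidden, num_tags)
            self.crf = CRF(num_tags, batch_first=True)

        def emissions(self, tokens):
            out, _ = self.lstm(self.embed(tokens))
            return self.proj(out)

        def loss(self, tokens, tags, mask):
            # The CRF returns a log-likelihood; negate it to get a loss to minimise.
            return -self.crf(self.emissions(tokens), tags, mask=mask)

        def decode(self, tokens, mask):
            # Viterbi decoding of the most likely tag sequence per sentence.
            return self.crf.decode(self.emissions(tokens), mask=mask)
    ```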
  • Kämäräinen, Matti (2013)
    The 2-meter temperature output (daily mean, minimum and maximum values) of a six-member regional climate model ensemble and the corresponding observations for three stations in Finland (Helsinki, Jyväskylä and Sodankylä) are used to produce future temperature projections. Both the observed ('delta change' approach) and the model scenario ('bias correction' approach) data series are statistically corrected with several different methods. These methods make use of the statistics of temperature between the 30-year periods of the observations, the model control and the model scenario simulations, and vary from simple (adjusting the mean) to complex (quantile mapping). Each month is processed separately. The main projection experiments are (I) from 1951-1980 to 1981-2010 and (II) from 1981-2010 to 2011-2040, 2041-2070 and 2069-2098. The method-dependent and, to a lesser extent, the model-dependent results are evaluated by means of root mean square error, mean error (mean bias), the location of quantile points, the number of daily frequency indices, analysis of variance and sensitivity tests. In near-term projections (e.g. from 1981-2010 to 2011-2040) the more conservative delta change methods slightly outperform the bias correction methods. In mid-term (projections to 2041-2070) and especially in far-term (projections to 2069-2098) predictions the bias correction approach performs better in cross-validation. The complicated shape of winter-time temperature distributions emphasizes the importance of handling the biases correctly, compared to southern, less snowy areas. For that reason the detailed quantile-mapping type of bias correction produces the best results for predictions extending to the end of the century.
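    To make the quantile-mapping end of the method spectrum concrete, here is a schematic empirical quantile-mapping correction in Python; it is a generic sketch under assumed array names, not the thesis code.

    ```python
    import numpy as np

    def quantile_map(obs: np.ndarray, ctrl: np.ndarray, scen: np.ndarray) -> np.ndarray:
        """Map scenario temperatures through the control CDF onto the observed CDF."""
        # Empirical non-exceedance probability of each scenario value in the control run.
        p = np.searchsorted(np.sort(ctrl), scen, side="right") / len(ctrl)
        # Read the observed temperature at the same quantile. As in the abstract,
        # each month's data would be passed through separately.
        return np.quantile(obs, np.clip(p, 0.0, 1.0))
    ```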
  • Euren, Juhani (2019)
    This Master's thesis was commissioned by Kesko Oyj. The subject of the study was a multi-year, three-phase cMDM project whose goal was to clarify how customer data is collected, used and stored. The first phase of the cMDM project took place as early as 2013; phase 2 was carried out in 2015-2016 and phase 3 from February 2018 to April 2019. The research was conducted through narrative interviews with key experts from phases 2 and 3 of the project. Analysis of the narrative interviews identified challenges related to testing, to the transition from the project phase to the maintenance phase, and to staff retention in maintenance. Based on the identified challenges, the thesis presents proposed solutions for avoiding the challenges encountered in phase 2.
  • Popova, Vera (2023)
    The dynamically evolving nature of software development (SWD) necessitates an emphasis on lifelong learning (LLL) for professionals in the field. In today's rapidly changing technology domain, LLL has transitioned from an exceptional practice to an essential norm, recognized as a fundamental skill within the industry. This thesis investigates the current state, trends, challenges, and best practices of lifelong learning in SWD workplaces. Our research draws on a review of literature spanning the past two decades, complemented by semi-structured interviews conducted with software professionals of diverse experience levels, all employed in the capital region of Finland. Additionally, this thesis contributes to an existing classification of learning methods utilized by software professionals and to a lifelong learning framework grounded in established research. Our findings underscore the importance of fostering a culture of continuous learning in software development workplaces, and we offer recommendations for employers to consider. Nurturing a learning mindset and a personalized, employee-centered approach to learning increases learner autonomy and commitment, seamlessly integrating learning as a natural part of work and career development. Such a workplace culture enhances employee well-being as well as the employer's innovativeness and competitiveness, establishing lifelong learning as a mutually beneficial solution for both parties involved.
  • Moilanen, Simo (2014)
    The purpose of this work is to suggest an approach for the holistic improvement of software development endeavor performance. The contributions are a theory for holistically mapping the key performance drivers of a software development system, including the human-centric factors involved in knowledge work, and a method for applying the theory in practice via a monitoring Instrument, demonstrated in case studies. The findings support that the theory, complemented with the Instrument, provides holistic insights into a software development endeavor, reveals performance impediments, and allows performance improvement efforts to be concentrated on the key performance drivers, i.e. where waste causes the most performance decline. However, further research is required to fine-tune the Instrument for optimal performance indications. Suggestions for future work include an analysis of the Instrument's result coverage and research on a framework for quicker and easier analysis in more practical usage. Nevertheless, organizations can utilize the theory and the Instrument on an as-is basis to improve software development performance without further research. The originality and value of this thesis lie in challenging the application of traditional management and its methods in software development endeavors and in suggesting a new method for achieving higher production performance.
  • Kallava, Tomi (2019)
    This thesis was done for Wärtsilä, a large global actor in the marine and energy markets. It aims to test the feasibility of machine learning models for estimating the total power and fuel consumption of vessels' main engines, and thus to help in recognizing the effect of different factors on the energy consumption of vessels. This, in turn, helps to optimize routes and machinery concepts, among other things. Another goal is to compress the engine sensor data using wavelet transformation. After the introduction to the topic, in the second chapter we introduce the data used in this study: vessel location data, engine sensor data and the technical specifications of the vessels. In the third chapter we go through the mathematical formulations of the methods used. Finally, we perform the calculations with real data and analyze the results. We test the performance of compression methods applied to the time series data coming from the sensors, and then test different regression methods for the consumption estimates to see which gives the most accurate results.
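    To make the wavelet-compression step concrete, the sketch below thresholds a discrete wavelet decomposition using the PyWavelets library; the signal, wavelet choice and threshold value are illustrative assumptions rather than the values used in the thesis.

    ```python
    import numpy as np
    import pywt  # pip install PyWavelets

    signal = np.random.randn(1024)                 # stand-in for an engine sensor series
    coeffs = pywt.wavedec(signal, "db4", level=4)  # multilevel discrete wavelet transform
    # Zero out small coefficients: most of the signal's energy concentrates in a
    # few large coefficients, which is what makes the representation compressible.
    coeffs = [pywt.threshold(c, value=0.5, mode="hard") for c in coeffs]
    reconstructed = pywt.waverec(coeffs, "db4")    # approximate reconstruction
    ```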
  • Halonen, Pyry (2022)
    Prostate cancer is the second most common cancer among men, and risk evaluation of the cancer prior to treatment can be critical. Risk evaluation of prostate cancer is based on multiple factors, such as clinical assessment. Biomarkers are studied because they could also be beneficial in the risk evaluation. In this thesis we assess the ability of biomarkers to predict prostate cancer relapse. The statistical method we utilize is the logistic regression model, which models the probability of a dichotomous outcome variable. In this case the outcome variable indicates whether the cancer of the observed patient has relapsed. The four biomarkers AR, ERG, PTEN and Ki67 form the explanatory variables; they are the most studied biomarkers in prostate cancer tissue. The biomarkers are usually detected by visual assessment of the expression status or abundance of staining. Artificial intelligence image analysis is not yet in common clinical use, but it is studied as a potential diagnostic aid. For each biomarker the data contain a visually obtained variable and a variable obtained by artificial intelligence, and in the analysis we compare the predictive power of these two differently obtained sets of variables. Due to the large number of explanatory variables, we seek the best-fitting model using the glmulti algorithm to select the explanatory variables. The predictive power of the models is measured by the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The data contain two classifications of prostate cancer according to whether the cancer was visible in magnetic resonance imaging (MRI). The classification is not exclusive, since a patient could have had both an MRI-visible and an MRI-invisible cancer. The data were split into three datasets: MRI-visible cancers, MRI-invisible cancers, and the two combined. By splitting the data we could further analyze whether MRI-visible cancers differ from MRI-invisible cancers in relapse prediction. In the analysis we find that none of the variables from MRI-invisible cancers are significant in predicting prostate cancer relapse. In addition, all the variables for the biomarker AR have no predictive power. The best biomarker for predicting prostate cancer relapse is Ki67, where a high staining percentage indicates a greater probability of relapse. The variables of the biomarker Ki67 were significant in multiple models, whereas the biomarkers ERG and PTEN had significant variables in only a few models. The artificial intelligence variables give more accurate predictions than the visually obtained variables, but we could not conclude that they are purely better; instead we learn that the visual and the artificial intelligence variables complement each other in predicting cancer relapse.
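    A minimal sketch of the modelling-and-evaluation pipeline described above follows; the thesis uses the glmulti algorithm for variable selection, so scikit-learn's logistic regression stands in purely for illustration, and the file and column names are invented.

    ```python
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("biomarkers.csv")            # hypothetical dataset file
    X = df[["ERG_ai", "PTEN_ai", "Ki67_ai"]]      # e.g. AI-derived biomarker variables
    y = df["relapse"]                             # 1 = the cancer relapsed

    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)[:, 1]          # predicted relapse probabilities
    print("AUC:", roc_auc_score(y, probs))        # area under the ROC curve
    ```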
  • Puustinen, Henna (2016)
    Biodiversity conservation has several impacts on human livelihoods. Especially in the Global South, rural people depend closely on biodiversity and on diverse ecosystems, and changes in the environment can have significant impacts on local livelihoods. Protected Areas have become one of the main global strategies for conserving the world's biodiversity and securing human livelihoods. Besides nature conservation, Protected Areas are expected to create benefits for the surrounding communities. However, the impacts of Protected Areas have proved to be the opposite on many occasions. The establishment of Protected Areas has often restricted local people's access to natural resources and hence caused changes in livelihoods. Other costs of conservation include, for instance, damage caused by the increased amount of wildlife. The aim of this case study has been to research the impact of biodiversity conservation on local communities. The study focuses on examining the impacts of a Protected Area on local livelihoods in Welioya in southern Sri Lanka. The research data were collected in areas located near the Protected Area border. The study was conducted using qualitative research practices, and the methods included semi-structured interviews, open conversation and observation. The target group consisted of local people; in addition, local actors were interviewed to obtain information on the local forests and Protected Areas. Employing the Sustainable Livelihoods Framework and the concepts of boundaries and social equity, this study tries to understand the relation between conservation and local livelihoods. Biodiversity conservation and the existence of Protected Areas have both positive and negative impacts on local people and their livelihoods in Welioya. The main benefits from the Protected Areas are gained through the preservation of ecosystem services. Local livelihoods rely heavily on the cultivation of rice, and cultivation depends on the preservation of the forest ecosystem and the area's water resources. Some local people also benefit by collecting forest products such as firewood, fruits and medicinal plants from the forests. The study reveals that some human activities are practiced illegally inside the Protected Areas. The greatest costs of the Protected Area are related to restricted access to cultivation land and forest resources. In addition, an obvious human-elephant conflict characterizes the study area. Even though Protected Areas create significant benefits for local livelihoods, the results of this case study indicate that the sustainability of local livelihoods appears to be uncertain. Also, the presence of people in the Protected Areas in Welioya is evident, although almost all human activities inside the area have been prohibited. Consequently, local people are concerned about the preservation of the forests. When considering the future of local livelihoods, deforestation and planned projects can have a remarkable influence on the forests and hence on local livelihoods. In order to reach the conservation goals in Welioya, the management of the Protected Areas should be clarified, the roles of the different conservation actors specified, and local people increasingly included in conservation management.
  • Myyrä, Antti (2019)
    Distributed systems have been a topic of discussion since the 1980s, but the adoption of microservices has raised the number of system components considerably. With more decentralised distributed systems, new ways to handle authentication, authorisation and accounting (AAA) are needed, as well as ways to allow components to communicate with each other securely. New standards and technologies have been created to meet these requirements, and many of them have already found their way into the most used systems and services globally. After covering AAA and separate access control models, we continue with ways to secure communications between two connecting parties, using Transport Layer Security (TLS) and other more specialised methods such as the Google-originated Secure Production Identity Framework for Everyone (SPIFFE). We also discuss X.509 certificates for ensuring identities. Next, both older time-tested and newer distributed AAA technologies are presented. After this, we look into communication between distributed components with both synchronous and asynchronous communication mechanisms, as well as into the publish/subscribe communication model that has grown popular with the rise of streaming platforms. This thesis also explores possibilities for securing communications between distributed endpoints and ways to handle AAA in a distributed context. This is showcased in a new software component that handles authentication through a separate identity endpoint using the OpenID Connect authentication protocol and stores the identity in a JSON-formatted, cryptographically signed JSON Web Token, allowing stateless session handling as the token can be validated by checking its signature. This enables fast and scalable session management and identity handling for any distributed system.
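    The stateless session handling described above can be sketched as follows with the PyJWT library: the identity travels in a signed JWT and each service validates the signature locally, with no session store. The issuer, audience and key file are placeholder assumptions, not the component's actual configuration.

    ```python
    import jwt  # pip install PyJWT

    PUBLIC_KEY = open("idp_public.pem").read()   # hypothetical identity-provider key

    def validate_session(token: str) -> dict:
        # Raises a jwt.InvalidTokenError subclass if the signature, audience,
        # issuer or expiry does not verify, so no server-side state is needed.
        return jwt.decode(
            token,
            PUBLIC_KEY,
            algorithms=["RS256"],
            audience="my-service",               # assumed audience claim
            issuer="https://idp.example.com",    # assumed OpenID Connect issuer
        )
    ```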
  • Zosa, Elaine (2017)
    Protein function prediction aims to identify the function of a given protein using, for example, sequence data, protein-protein interactions or evolutionary relationships. The use of biomedical literature to predict protein function, however, is a relatively under-studied topic given the vast amount of readily available data. This thesis explores the use of abstracts from the biomedical literature to predict protein functions using the terms specified in the Gene Ontology (GO). The GO is a standardised method of cataloguing protein functions in which the functions are organised in a directed acyclic graph (DAG). The GO is composed of three separate ontologies: cellular component (CC), molecular function (MF) and biological process (BP). Hierarchical classification is a classification method that assigns an instance to one or more classes that are hierarchically related to each other, as in the GO. We build a hierarchical classifier that assigns GO terms to abstracts by training an individual binary Naïve Bayes classifier to recognise each GO term. We present three different methods of mining abstracts from PubMed, and using these methods we assembled four datasets to train our classifiers. Each classifier is tested in three different ways: (a) in the paper-centric approach, we assign GO terms to a single abstract; (b) in the protein-centric approach, we assign GO terms to a concatenation of abstracts relating to a single protein; and (c) the term-centric approach is the complement of the protein-centric approach, where the goal is to assign proteins to a GO term. We evaluate the performance of our method using two evaluation metrics: maximum F-measure (F-max) and minimum semantic distance (S-min). Our results show that the best dataset for training our classifiers depends on the evaluation metric, the ontology and the proteins being annotated. We also find a negative correlation between the F-max score of a GO term and its information content (IC), and a positive correlation between the F-max and the term's centrality in the DAG. Lastly, we compare our method with GOstruct, the state-of-the-art literature-based protein annotation program. Our method outperforms GOstruct on human proteins, showing a significant improvement for the MF ontology.
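    The per-term classifier design can be illustrated with the following scikit-learn sketch: one binary Naïve Bayes classifier per GO term over bag-of-words features of the abstracts. The helper function and its inputs are hypothetical stand-ins for the thesis's actual pipeline.

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def train_per_term(abstracts, labels_per_term):
        """abstracts: list of texts; labels_per_term: {go_term: 0/1 label per abstract}."""
        vec = CountVectorizer()
        X = vec.fit_transform(abstracts)
        # One independent binary classifier per GO term; hierarchical consistency
        # (a term implying its ancestors in the DAG) is handled outside this sketch.
        models = {term: MultinomialNB().fit(X, y) for term, y in labels_per_term.items()}
        return vec, models
    ```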
  • Hakoniemi, Tuomas (2016)
    In this master's thesis we study a lower part of the so-called Leibniz hierarchy in abstract algebraic logic. Abstract algebraic logic concerns itself with the taxonomic study of logics. The main classification of logics is the one into the Leibniz hierarchy, according to the properties that the Leibniz operator has on the lattice of theories of a given logic. The Leibniz operator is a function that maps a theory of a logic to an indiscernibility relation modulo the theory. We study here two of the main classes in the Leibniz hierarchy, protoalgebraic and equivalential logics, and some of their subclasses. We state and prove the most important characterizations for these classes. We also provide new characterizations for the class of finitely protoalgebraic logics, a class that has previously enjoyed only limited attention. We first recall some basic facts from universal algebra and lattice theory that are used in the remainder of the thesis. Then we present the abstract definition of logic we work with and give abstract semantics for any logic via logical matrices. We define logics determined by a class of matrices and show how to uniformly associate with a given logic a class of matrices so that the original logic and the logic determined by the class coincide, thus providing an abstract completeness theorem for any logic. The remainder of the thesis is dedicated to the study of protoalgebraic and equivalential logics. We provide three main families of characterizations for the various classes of logics. The first is completely syntactic, via the existence of sets of formulae satisfying certain properties. The second is via the properties that the Leibniz operator has on the lattice of theories of a given logic. The third and final family of characterizations is via the closure properties of a canonical class of matrices that we associate to any logic, the class of reduced models.
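    For orientation, the standard characterization of the Leibniz congruence of a theory T over the formula algebra (a textbook formulation, not quoted from the thesis) can be written as:

    ```latex
    % Two formulas are related by the Leibniz congruence of T exactly when
    % no formula context can separate them modulo T.
    \langle \varphi, \psi \rangle \in \Omega(T)
    \iff
    \big( \delta(\varphi, \bar{\gamma}) \in T \Leftrightarrow \delta(\psi, \bar{\gamma}) \in T \big)
    \quad \text{for every formula } \delta(x, \bar{z}) \text{ and all tuples } \bar{\gamma}.
    ```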
  • Bharthuar, Shudhashil (2019)
    The High-Luminosity phase of the Large Hadron Collider (HL-LHC), expected to become operational in 2026, aims to increase the luminosity of the LHC up to ten times its current nominal value. This in turn calls for improving the radiation hardness of the CMS tracker detectors, which will be subjected to significantly greater levels of radiation. This thesis examines the electrical properties of metal–oxide semiconductor (MOS) capacitors and silicon sensors of different structures with a design developed for the CMS Beam Luminosity Telescope (BLT). These were fabricated on three different wafers, and the atomic layer deposition (ALD) of alumina on the p-type silicon substrate was done using either ozone (O3), water (H2O), or water and ozone (H2O+O3) pulsed directly one after the other, as the oxygen precursor. The same study is made on Radiation Monitoring (RADMON)-type sensors with an n-type silicon substrate and titanium nitride (TiN) based bias resistors produced on the substrate by depositing a thin TiN layer by radio-frequency sputtering with different sputtering parameters. The electrical properties of these sensors are derived by measuring their capacitance–voltage and current–voltage characteristics. The results demonstrate that BLT diodes from the three different wafers, having the same thickness, give the same value for the full depletion voltage. However, structures with a larger metallization area have larger surface currents. RADMON and standard BLT diodes from the same wafer do not show any significant difference in full depletion voltage, but the leakage current of the p-type sensor is higher than that of the n-type sensor. The results also show that MOS capacitor samples from the H2O+O3 wafer are more sensitive to radiation than those from the H2O and O3 wafers. Further, TiN-based bias resistor samples produced with a shorter sputtering time, a lower argon gas flow rate and a higher nitrogen gas flow rate have a higher resistance.
  • Stark, Piritta (2022)
    In this study, single-grain rutile techniques (rutile U-Pb geochronology, Zr-in-rutile thermometry, rutile Nb-Cr systematics and trace element composition) are applied to Red Clay samples from Lingtai, on the central-southern Chinese Loess Plateau (CLP), to reveal more detailed information on the provenance of the Red Clay and the wind regimes responsible for transporting the particles from source to sink between 7 Ma and 2.6 Ma. The new rutile data are combined with previous zircon U-Pb data from Lingtai and nearby Chaona to strengthen the interpretations with a multi-proxy approach. The results suggest that from 7.05 Ma to 6.23 Ma the westerlies and the East Asian Winter Monsoon (EAWM) were roughly equally responsible for sediment transport to the CLP. At 5.55 Ma, the Red Clay was mostly derived from westerly sources. At 3.88 Ma, the contribution from the northeastern Tibetan Plateau was most dominant, suggesting an enhanced East Asian Summer Monsoon (EASM) and surficial drainage from the source regions. At 3.20 Ma, the Red Clay was mainly sourced from proximal areas, and fluctuation between the EAWM and the EASM had begun. This study demonstrates that single-grain rutile techniques have strong potential to aid a more precise distinction between individual primary and secondary sources of aeolian dust in the CLP region, especially when combined with zircon geochronology or other single-grain techniques. However, at present the applicability of rutile in provenance studies is hindered by the scarcity of rutile data from the potential primary and secondary source regions, and by the lack of truly homogeneous rutile standards for the analysis.
  • Kivisaari, Tero (2015)
    Changing consumer behavior drives the demand for convenient and easy-to-use mobile applications across industries. This also impacts the financial sector, and banks are eager to offer their services as mobile applications to match modern consumer needs. The mobile applications are not independently able to provide the required functionality; they interact with the existing core business functions by consuming secure Web Services over the Internet. The thesis analyses the problem of how a bank can enable a new secure distribution and communication channel via mobile applications. This new channel must be able to interact with existing core systems. The problem is investigated along different axes: Web Services protocols suitable for mobile use, security solutions for the communication protocols, and the support available in the selected mobile operating systems. The result of the analysis is an architectural description that fulfils the presented requirements. In addition to constructing the architecture, the thesis also describes some of the more advanced threats targeted at mobile apps and Web Services and provides mitigation schemes for the threats. The selected architecture contains a modular security solution that can also be utilized outside the financial context. ACM Computing Classification System (CCS 2012): Information systems → Web Services; Security and privacy → Software and application security; Software and its engineering → Software architectures.
  • Boström, Jani (University of Helsinki, 2000)
  • Van den Broek, Daan (2024)
    Supercells are thunderstorms characterized by a persistently rotating updraft, which is separated from, and in a quasi-steady state with, the downdraft. The structure of a supercell allows for long-lived thunderstorms capable of producing severe weather such as significant hail (hail with a diameter >5 cm) and tornadoes. Despite their relative rarity, supercells are responsible for a disproportionate share of thunderstorm-related hazards and damage. Although uncommon at high latitudes, supercells do occur in Finland, where documented cases have led to severe weather events and substantial damage. The goal of this study is to improve our understanding of the meteorological environments in which supercells in Finland occur. Specifically, we aim to discriminate between the meteorological environments of supercell thunderstorms and ordinary thunderstorms in Finland. This is done by examining how kinematic and thermodynamic parameters from proximity soundings differ between the two groups. Additionally, we inspect the difference between the meteorological environments of significant hail-producing supercells (HAIL) and tornado-producing supercells (TOR). The results indicate that bulk wind shear at various levels, as well as effective bulk wind shear (the bulk wind shear over the unstable layer), are strong discriminators between supercell and ordinary thunderstorm environments in Finland. Composite parameters such as the Energy Helicity Index (EHI) and the Supercell Composite Parameter (SCP) also show some utility in distinguishing supercell from ordinary thunderstorm environments. The Equilibrium Level (EL) and low-level Convective Available Potential Energy (CAPE) stand out as significant discriminators between significant hail-producing and tornado-producing supercell environments, while the Lifting Condensation Level (LCL) and low-level humidity appear to show critical threshold values that may help distinguish the two. Interestingly, the ratio of low-level CAPE to CAPE discriminates very strongly between significant hail-producing and tornado-producing supercell environments. Composite parameters and Storm Relative Helicity (SRH) exhibit very limited utility in differentiating between significant hail- and tornado-producing supercell environments in Finland.
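    For reference, the Energy Helicity Index named above is conventionally defined by combining CAPE with storm-relative helicity (a standard formulation, not taken from the thesis):

    ```latex
    % Conventional definition of the Energy Helicity Index; the constant in the
    % denominator normalises the product of CAPE and SRH to a dimensionless index.
    \mathrm{EHI} = \frac{\mathrm{CAPE} \cdot \mathrm{SRH}}{1.6 \times 10^{5}\,\mathrm{m^{4}\,s^{-4}}}
    ```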
  • Tyree, Juniper (2023)
    Response surface models (RSMs) are cheap, reduced-complexity, and usually statistical models that are fitted to the response of more complex models to approximate their outputs with higher computational efficiency. In atmospheric science, there has been a continuous push to reduce the amount of training data required to fit an RSM. With this reduction in costly data gathering, RSMs can be used more ad hoc and quickly adapted to new applications. However, with the decrease in diverse training data, the risk increases that the RSM is eventually used on inputs for which it cannot make a prediction. If there is no indication from the model that its outputs can no longer be trusted, trust in the entire RSM decreases. We present a framework for building prudent RSMs that always output predictions with confidence and uncertainty estimates. We show how confidence and uncertainty can be propagated through downstream analysis such that even predictions on inputs outside the training domain, or in areas of high variance, can be integrated. Specifically, we introduce the Icarus RSM architecture, which combines an out-of-distribution detector, a prediction model, and an uncertainty quantifier. Icarus-produced predictions and their uncertainties are conditioned on the confidence that the inputs come from the same distribution that the RSM was trained on. We put particular focus on exploring out-of-distribution detection, for which we conduct a broad literature review, design an intuitive evaluation procedure with three easily visualisable toy examples, and suggest two methodological improvements. We also explore and evaluate popular prediction models and uncertainty quantifiers. We use the one-dimensional atmospheric chemistry transport model SOSAA as an example of a complex model for this thesis. We produce a dataset of model inputs and outputs from simulations of the atmospheric conditions along air parcel trajectories that arrived at the SMEAR II measurement station in Hyytiälä, Finland, in May 2018. We evaluate several prediction models and uncertainty quantification methods on this dataset and construct a proof-of-concept SOSAA RSM using the Icarus RSM architecture. The SOSAA RSM is built on pairwise-difference regression using random forests and an auto-associative out-of-distribution detector with a confidence scorer, which is trained with both the original training inputs and new synthetic out-of-distribution samples. We also design a graphical user interface to configure the SOSAA model and trial the SOSAA RSM. We provide recommendations for out-of-distribution detection, prediction models, and uncertainty quantification based on our exploration of these three systems. We also stress-test the proof-of-concept SOSAA RSM implementation to reveal its limitations for predicting model perturbation outputs and show directions for valuable future research. Finally, our experiments affirm the importance of reporting predictions alongside well-calibrated confidence scores and uncertainty levels so that the predictions can be used with confidence and certainty in scientific research applications.
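    The three-component composition described above (detector, predictor, uncertainty quantifier) can be sketched conceptually as follows; all component functions here are stand-ins, not the thesis implementation.

    ```python
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class PrudentRSM:
        # x -> confidence in [0, 1] that x resembles the training distribution
        ood_confidence: Callable
        predict: Callable       # x -> point prediction
        uncertainty: Callable   # x -> uncertainty estimate

        def __call__(self, x):
            # Every prediction carries a confidence score, so downstream analysis
            # can down-weight or reject predictions made outside the training domain.
            return {
                "prediction": self.predict(x),
                "uncertainty": self.uncertainty(x),
                "confidence": self.ood_confidence(x),
            }
    ```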
  • Smirnov, Pavel (2022)
    There are many computationally difficult problems where the task is to find a solution with the lowest cost possible that fulfills a given set of constraints. Such problems are often NP-hard and are encountered in a variety of real-world problem domains, including planning and scheduling. NP-hard problems are often solved using a declarative approach: the problem is encoded in a declarative constraint language and the encoding is solved with a generic algorithm for that language. In this thesis we focus on pseudo-Boolean optimization (PBO), a special class of integer programs (IPs) whose variables may only take the values 0 and 1. We propose a novel approach to PBO based on the implicit hitting set (IHS) paradigm, which uses two separate components. An IP solver is used to find an optimal solution under an incomplete set of constraints. A pseudo-Boolean satisfiability solver is used either to validate the feasibility of the solution or to extract more constraints for the integer program. The IHS-based PBO algorithm iteratively invokes the two components until an optimal solution to a given PBO instance is found. In this thesis we lay out the IHS-based PBO solving approach in detail. We implement the algorithm as the PBO-IHS solver, making use of recent advances in reasoning techniques for pseudo-Boolean constraints. Through extensive empirical evaluation we show that our PBO-IHS solver outperforms other available specialized PBO solvers and has complementary performance compared to classical integer programming techniques.
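    The alternation between the two solvers described above can be summarised in the following sketch; ip_solve and pb_check are hypothetical stand-ins for the IP solver and the pseudo-Boolean satisfiability solver, not PBO-IHS's actual interfaces.

    ```python
    def ihs_pbo(objective, constraints, ip_solve, pb_check):
        """Iterate until an IP-optimal assignment is also pseudo-Boolean feasible."""
        learned = []                                  # constraints extracted so far
        while True:
            candidate = ip_solve(objective, learned)  # optimal under partial constraints
            feasible, extracted = pb_check(candidate, constraints)
            if feasible:
                return candidate                      # feasible and optimal: done
            learned.extend(extracted)                 # tighten the IP and iterate
    ```

    The loop terminates because each iteration either proves the candidate feasible or adds at least one constraint the candidate violates, so no assignment is revisited.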