
Browsing by Title


  • Hippeläinen, Sampo (2022)
    One of the problems with the modern widespread use of cloud services pertains to geographical location. Modern services often employ location-dependent content, in some cases even data that should not end up outside a certain geographical region. A cloud service provider may, however, have reasons to move services to other locations. An application running in a cloud environment should have a way to verify the location of both itself and its data. This thesis describes a new solution to this problem: a permanently deployed hardware device that provides geolocation data to other computers in the same local network. A protocol suite with which applications can check their geolocation is developed using the methodology of design science research. The protocol suite thus created uses many tried-and-true cryptographic protocols. A secure connection is established between an application server and the geolocation device, during which the authenticity of the device is verified. The location of data is ensured by checking that a storage server indeed has access to the data. Geographical proximity is checked by measuring round-trip times and setting limits on them. The new solution, with its protocol suite and hardware, is shown to solve the problem and fulfill strict requirements, improving on the results presented in earlier work. A prototype implementation shows that the protocol suite is feasible both in theory and in practice. Some details, however, will require further research.
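The round-trip-time proximity check described in the abstract can be sketched in a few lines: accept the geolocation device as local only if the median round-trip time of an exchange stays under a bound. The function names and the 2 ms bound below are illustrative assumptions for this sketch, not the protocol defined in the thesis.

```python
import time
import statistics

RTT_BOUND_S = 0.002  # assumed ~2 ms bound: plausible for one LAN hop, not a remote site

def median_rtt(ping, samples=5):
    """Median round-trip time of a request/response exchange, in seconds."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()  # one exchange with the (hypothetical) geolocation device
        rtts.append(time.perf_counter() - start)
    return statistics.median(rtts)

def is_geographically_proximate(ping, bound=RTT_BOUND_S):
    """Accept the device as local only if the median RTT stays under the bound."""
    return median_rtt(ping) <= bound
```

In the actual protocol suite the exchange runs over the authenticated secure connection, so a distant party cannot pass the check by merely answering quickly on an unauthenticated channel.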
  • Lehto, Susanna (2015)
    The Dagum distribution is a continuous probability distribution named after Camilo Dagum, who introduced it in the 1970s. Its development began when Camilo Dagum, dissatisfied with the probability distributions available at the time, set out to develop a model that met his requirements. This work produced three distributions, known as Dagum distribution types I–III. Type I is a three-parameter distribution, whereas types II and III are closely related four-parameter distributions. Regardless of type, the Dagum distribution was developed to describe personal income, and it is therefore usually associated with the economics of income distribution. In addition, the three types of the Dagum distribution can be classified as statistical size distributions, which are widely used especially in economics and actuarial science. Chapter 1 is an introduction that outlines the structure of this master's thesis and explains why the Dagum distribution was chosen as its topic. Chapter 2 briefly presents the general theory of continuous probability distributions to the extent that knowing it is at the very least necessary; important notation, needed especially for understanding Chapter 3, is also introduced. Chapter 3 begins with the personal history of Camilo Dagum, the distribution's developer, which leads naturally to the reasons that motivated Dagum to search for a better model and eventually resulted in an entirely new distribution, or family of distributions. The Dagum distribution was not conjured out of thin air, however: it rests on Dagum's broad expertise and on his study and testing of several different distributions and models. Although the Dagum distribution with its types is a distribution in its own right, it also has close connections to other distributions, and because of these connections it is often called the Burr III distribution.
Chapter 3 also sheds light on the basic properties of the Dagum distribution, after which attention turns to its usefulness in applications: the distribution proves valuable in measuring the equality of income distribution, where estimation and inference also play an important role. The chapter closes with a brief account of working with the Dagum distribution in statistical software. Although Chapter 3 refers to applications of the Dagum distribution in many places, only in Chapter 4 is its practical application examined more closely. The final chapter gathers concluding thoughts on the Dagum distribution and on the challenges of getting to know it: a single master's thesis can only scratch the surface, so plenty of work remains for others interested in the distribution.
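For orientation, the three-parameter Type I distribution mentioned above has a simple closed-form cumulative distribution function; the standard textbook form is quoted here (with shape parameters a, p and scale b) and is not taken from the thesis itself:

```latex
F(x;\, a, b, p) \;=\; \Bigl( 1 + \bigl( \tfrac{x}{b} \bigr)^{-a} \Bigr)^{-p},
\qquad x > 0,\quad a, b, p > 0.
```

Types II and III modify this form with a fourth parameter (Type II, for instance, adds a point mass at zero), which is what makes them suitable for income data containing zero or negative observations.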
  • Fred, Hilla (2022)
    Improving the monitoring of the health and well-being of dairy cows through computer vision based systems is a topic of ongoing research. A reliable and low-cost method for identifying individual cows would enable automatic detection of stress, sickness or injury, and would make daily observation of the animals easier. Neural networks have been used successfully to identify individual cows, but methods are needed that do not require incessant annotation work to generate training datasets whenever the composition of a group changes. Methods for person re-identification and tracking have been researched extensively, with the aim of generalizing beyond the training set, and these methods have been found suitable also for re-identifying and tracking previously unseen dairy cows in video frames. In this thesis, a metric-learning based re-identification model pre-trained on an existing cow dataset is compared to a similar model trained on new video data recorded at the Luke Maaninka research farm in spring 2021, containing 24 individually labelled cows. The models are evaluated in a tracking context as appearance descriptors in a Kalman filter based tracking algorithm. The test data is video footage from a separate enclosure in Maaninka with a group of 24 previously unseen cows. In addition, a simple procedure is proposed for automatically labeling cow identities in images, based on RFID data collected from cow ear tags and feeding stations together with the known feeding station locations.
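The role the re-identification models play as appearance descriptors can be illustrated with a minimal nearest-gallery assignment: each track keeps a gallery of recent embeddings, and a new detection is assigned to the closest track. The cosine distance, the 0.3 threshold and the data layout are assumptions made for this sketch, not the thesis implementation.

```python
import numpy as np

def cosine_distance(a, b):
    """1 minus the cosine similarity of two embedding vectors."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def match_detection(detection_emb, track_galleries, max_dist=0.3):
    """Assign a detection embedding to the track with the closest gallery
    embedding, or return None if every track is farther than max_dist."""
    best_id, best_dist = None, max_dist
    for track_id, gallery in track_galleries.items():
        d = min(cosine_distance(detection_emb, g) for g in gallery)
        if d < best_dist:
            best_id, best_dist = track_id, d
    return best_id
```

In a Kalman filter based tracker of the DeepSORT family, an appearance cost of this kind is combined with a motion-based gate before the final assignment, so that visually similar animals in different parts of the frame are not confused.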
  • Sassi, Sebastian (2019)
    When the standard model gauge group SU(3) × SU(2) × U(1) is extended with an extra U(1) symmetry, the resulting Abelian U(1) × U(1) symmetry introduces a new kinetic mixing term into the Lagrangian. Such double U(1) symmetries appear in various extensions of the standard model and have therefore long been of interest in theoretical physics. Recently this kinetic mixing has received attention as a model for dark matter. In this thesis, a systematic review of kinetic mixing and its physical implications is given, some of the dark matter candidates relying on kinetic mixing are considered, and experimental bounds for kinetic mixing dark matter are discussed. In particular, the process of diagonalizing the kinetic and mass terms of the Lagrangian with a suitable basis choice is discussed. A rotational ambiguity arises in the basis choice when both U(1) fields are massless, and it is shown how this can be addressed. BBN bounds for a model with a fermion in the dark sector are also given based on the most recent value of the effective number of neutrino species, and it is found that a significant portion of the FIMP regime is excluded by this constraint.
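The kinetic mixing discussed above enters through the standard Abelian cross term; the form below is the textbook gauge-kinetic Lagrangian with mixing parameter ε, quoted for orientation rather than taken from the thesis:

```latex
\mathcal{L}_{\text{gauge}} \;=\;
-\tfrac{1}{4}\, F_{\mu\nu} F^{\mu\nu}
\;-\; \tfrac{1}{4}\, F'_{\mu\nu} F'^{\mu\nu}
\;-\; \tfrac{\epsilon}{2}\, F_{\mu\nu} F'^{\mu\nu},
```

where F and F′ are the field strengths of the two U(1) gauge fields. A field redefinition such as A′ → A′ − εA removes the cross term to first order in ε; it is the residual freedom in choosing this basis, when both fields are massless, that gives rise to the rotational ambiguity the thesis addresses.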
  • Hakkarainen, Janne (University of Helsinki, 2009)
    Data assimilation is a technique in which observations are combined with dynamical numerical models in order to produce an optimal representation of, for example, the evolving state of the atmosphere. Data assimilation is used, among other things, in operational weather forecasting. This thesis presents various data assimilation methods, which divide broadly into Kalman filters and variational methods. In addition, various auxiliary tools needed in data assimilation, such as optimization methods, are presented. The behaviour of the different data assimilation methods is illustrated with examples. In this work, data assimilation is applied, among other things, to the Lorenz95 model. As a practical data assimilation problem, ozone retrievals from the GOMOS instrument are assimilated using the ROSE chemistry transport model.
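The common building block of the Kalman filter family surveyed here is the analysis step, in which a model forecast x with error covariance P is corrected by observations y through the observation operator H and the observation-error covariance R. The sketch below is the generic textbook form, not code from the thesis.

```python
import numpy as np

def kalman_update(x, P, y, H, R):
    """One Kalman analysis step: correct forecast x (covariance P) with observations y."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_a = x + K @ (y - H @ x)           # analysis (corrected) state
    P_a = (np.eye(len(x)) - K @ H) @ P  # analysis error covariance
    return x_a, P_a
```

Under linear-Gaussian assumptions, the variational methods discussed alongside the filters minimize a cost function whose minimizer coincides with this analysis state, which is why the two families can be presented within one framework.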
  • Iipponen, Juho (2017)
    No model can perfectly describe the behaviour of the complex and chaotic atmosphere. Model predictions must therefore be corrected towards the true state of the atmosphere with the help of observations. In this work, the description of total thermospheric mass density given by a semi-empirical upper-atmosphere model is refined by means of data assimilation. The model state is corrected using an ensemble Kalman filter, which has proven a useful tool in lower-atmosphere data assimilation systems. Unlike in the troposphere, however, the uncertainty in predicting the state of the thermosphere is largely tied to uncertainty in the forcings that drive it. Sudden and hard-to-predict changes in the ionosphere and in solar UV radiation can quickly alter the state of the upper atmosphere in a way whose time evolution is largely independent of the initial state of the thermospheric system. It is therefore by no means obvious that data assimilation would improve model analyses or forecasts. The aim of this work is to study whether a model corrected with observations yields a more accurate analysis of the mass density of the middle and upper thermosphere than a model whose state is changed only by the forcings applied to the upper-atmosphere system. It is also examined whether the analysis has predictive value with respect to mass-density measurements made over the following three days. The study period is the year 2003, when the forcings were strong and their changes rapid. The observational data were produced with an algorithm that computes upper-atmosphere density from observed changes in the orbits of low Earth orbit satellites. Although the temporal resolution of the data is rather poor relative to the speed of the forcing-driven changes, it turns out that the data can be used to improve the analysis of the upper-atmosphere model. On the other hand, it turns out that the corrected model state cannot be used to predict the evolution of the system, even if the time evolution of the forcings driving the thermosphere were known exactly.
This is thought to stem from the fact that the correction produced by the analysis depends strongly on the changes in the forcing during the period over which observations are gathered for the analysis. The correction is therefore no longer optimal during the following days, by which time the state of the atmosphere has changed as a result of the forcings. The improvement the ensemble Kalman filter brings to the analysis, although statistically significant, is not very large. Besides the uncertainties related to the forcings and the observational data, it is possible that the filter's performance is degraded by spurious correlations in the model background field, or by the very simple covariance inflation method used in this work.
  • Laine, Maisa (2019)
    Data assimilation is an estimation method for combining information from several different sources. Data assimilation methods are particularly useful when indirect observations are combined with a model state. This thesis focuses on sequential data assimilation methods based on the Kalman filter. The Kalman filter is derived from Bayes' formula, and on this basis ensemble methods, which are often computationally cheaper approximations of the Kalman filter, are presented. In the thesis, the Ensemble Adjustment Kalman filter is applied to the Yasso model, which describes the decomposition of soil organic carbon. Yasso is used to model long-term soil carbon on six different fields. The forecasts are improved through data assimilation by combining the information obtained from measurements with the forecast.
  • Bui, Minh (2021)
    Background. In API requests to a confidential data system, there are always sets of rules that users must follow to retrieve the desired data within their granted permissions. These rules are made to assure the security of the system and limit all possible violations. Objective. This thesis is about detecting violations of these rules in such systems. A request for which a violation is found is considered to contain an inconsistency, which must be fixed before any data is retrieved. The thesis also looks for all diagnoses of inconsistent requests; these diagnoses support reconstructing the requests so as to remove the inconsistency. Method. We choose the design science research methodology to work on solutions. Within this methodology, a current problem in distributing data from a smart building serves as the main motivation. System design and development are then carried out to demonstrate the practicality of the solutions found, while a testing system is built to confirm their validity. Results. Inconsistency detection is treated as a diagnosis problem, and many algorithms for the diagnosis problem have been developed over the decades. These algorithms build on DAG algorithms and have been adapted to different purposes. This thesis builds on these algorithms and on constraint programming techniques to resolve the issues facing the given confidential data system. Conclusions. A combination of constraint programming techniques and DAG-based diagnosis algorithms can be used to detect inconsistencies in API requests. Although the application of these algorithms still needs performance improvements, the combination works effectively and resolves the research problem.
  • Hinkka, Atte (2018)
    In this thesis we use statistical n-gram language models and the perplexity measure for language typology tasks. We interpret the perplexity of a language model as a distance measure when the model is applied on a phonetic transcript of a language the model wasn't originally trained on. We use these distance measures for detecting language families, detecting closely related languages, and for language family tree reproduction. We also study the sample sizes required to train the language models and make estimations on how large corpora are needed for the successful use of these methods. We find that trigram language models trained from automatically transcribed phonetic transcripts and the perplexity measure can be used for both detecting language families and for detecting closely related languages.
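The perplexity-as-distance idea can be sketched with a character-bigram model: train counts on one transcript and evaluate perplexity on another. The add-one smoothing and the bigram order are simplifications assumed for this sketch; the thesis itself uses trigram models over phonetic transcripts.

```python
import math
from collections import Counter

def perplexity(model_text, eval_text):
    """Perplexity of an add-one-smoothed character-bigram model on eval_text."""
    bigrams = Counter(zip(model_text, model_text[1:]))   # training bigram counts
    unigrams = Counter(model_text)                       # training unigram counts
    vocab = len(set(model_text) | set(eval_text))        # smoothing vocabulary size
    log_prob, n = 0.0, 0
    for a, b in zip(eval_text, eval_text[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)  # add-one smoothing
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)
```

A model trained on material close to the evaluation transcript yields a lower perplexity, so the value behaves like an (asymmetric) distance between the model's language and the evaluated one.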
  • Ray, Debarshi (2012)
    Pervasive longitudinal studies in people's intimate surroundings involve gathering data about how people behave in their various places of presence. It is hard to be fully pervasive, as this has traditionally required sophisticated instrumentation that may be difficult to acquire and prohibitively expensive. Moreover, setting up such an experiment is laborious. We present a system, in the form of its requirements, design and implementation, that is primarily aimed at collecting data from people's homes. It aims to be as pervasive as possible, and can collect data about a family in the form of audio and video feeds from microphones and cameras, network logs, and home appliance (e.g., TV) usage patterns. The data is then transported over the Internet to a server placed in close proximity to the researcher, while being protected from unauthorised access. Instead of instrumenting the test subjects' existing devices, we build our own integrated appliance, which is to be placed inside their houses and has all the necessary features for data collection and transportation. We build the system using cheap off-the-shelf commodity hardware and free and open source software, and evaluate different hardware and software configurations to see how well they can be integrated and how performant and reliable they are in real-life scenarios. Finally, we demonstrate a few simple techniques that can be used to analyze the data to gain some insights into the behaviour of the participants.
  • Tulilaulu, Aurora (2017)
    In this master's thesis I present data-driven automatic composition, i.e. data musicalization. Data musicalization is about making variables found in data audible in automatically composed music. The idea is that the music works like a visualization intended for the ears, illustrating selected attributes of the data. I survey the various ways in which sonification and automatic or computer-assisted composition have been done before, and what kinds of applications they have. I go through the most commonly used ways of generating music, such as the most typical stochastic methods, grammars, and methods based on machine learning. I also briefly describe sonification, i.e. representing data directly as an audio signal without a musical element, and comment on the strengths and weaknesses of the different methods. I also briefly discuss how far automated composition has come at its best, in terms of its credibility in the eyes of human evaluators, using a few acclaimed composing programs as examples. I discuss two different musicalization programs I have made. The first generates pieces that condense data collected from one night of the user's sleep into a piece lasting four to eight minutes. The second makes music in real time based on adjustable parameters, so it can be connected to another program that analyses data and changes the parameters. In the example discussed, the music is produced from a conversation log, with, for example, the tone and pace of the conversation affecting the music. I go through the principles by which my programs generate music, and explain the reasons for the design decisions using the fundamentals of music theory and composition.
I explain the principles by which the data used is, or can be made, audible in the music, i.e. how musicalization differs from ordinary machine composition and from sonification, and how it sits at the boundary of these two existing fields of research. Finally, I present the results of user experiments in which users were asked to assess how well the musicalization made from conversation logs works, and, based on these results and the current state of the field, I consider possible applications of musicalization and possible future research on the topic.
  • Jurinec, Fran (2023)
    This thesis explores the applicability of open-source tools to addressing the challenges of data-driven fusion research. The issue is explored through a survey of the fusion data ecosystem and an exploration of possible data architectures, which were used to derive the goals and requirements of a proof-of-concept data platform. This platform, developed using open-source software, namely InvenioRDM and Apache Airflow, enabled transforming existing machine learning (ML) workloads into reusable data-generating workflows, and the cataloging of the resulting clean ML datasets. Through the survey of the fusion data ecosystem, a set of challenges and goals was established for the development of a fusion data platform. It was identified that many of the challenges for data-driven research stem from a heterogeneous and geographically scattered source data layer combined with a monolithic approach to ML research. These challenges could be alleviated through improved ML infrastructure, for which two approaches were identified: a query-based approach, which offers more data retrieval flexibility but requires improvements in querying functionality and source data access speeds, and a persisted dataset approach, which uses a centralized workflow to collect and clean data but requires additional storage resources. Additionally, by cataloging metadata in a central location it would be possible to unify data discovery across heterogeneous sources, combining the benefits of the various infrastructure developments. Building on these identified goals and the metadata-driven platform architecture, a proof-of-concept data platform was implemented and examined through a case study. This implementation used InvenioRDM as a metadata catalog to index ML-ready datasets and provide a dashboard for discovering them, and Apache Airflow as a workflow orchestration platform to manage the data collection workflows.
The case study, grounded in real-world fusion ML research, showcased the platform's ability to convert existing ML workloads into reusable data-generating workflows and to publish clean ML datasets without introducing significant complexity into the research workflows.
  • Ahonen, Heikki (2020)
    The research group dLearn.Helsinki has created software for assessing the work life competence skills of a person working as part of a group. The software is a research tool for developing the mentioned skills of its users, who can be of any age, from school children to employees in a company. As the users can be of different age groups, the data privacy of the different groups has to be considered from different aspects. Children are more vulnerable than adults and may not understand all the risks imposed towards them. Thus, in the European Union, the General Data Protection Regulation (GDPR) determines that the privacy and data of children are more strongly protected, and this has to be taken into account when designing software which uses said data. For dLearn.Helsinki this caused changes not only in the data handling of children, but of other users as well. To tackle this problem, existing and future use cases needed to be planned and possibly implemented. Another solution was to implement different versions of the software, where the organizations would be separate. One option would be introducing organizational separation in the existing SaaS solution. The other option would be creating on-premise versions, where organizations would be locked in according to the customer type. This thesis introduces said use cases, as well as installation options for both the SaaS and on-premise versions. With these, broader views of data privacy and the different approaches are investigated, and it can be concluded that no matter the approach, the data privacy of children will always prove a challenge.
  • Koskimaa, Kuutti (2020)
    AA Sakatti Mining Oy is researching the possibility of conducting mining operations in the Sakatti ore deposit, located partially under the protected Viiankiaapa mire. In order to understand the waters in the mining development site, the interactions of surface waters, shallow aquifers, and deep bedrock groundwaters must be understood. To estimate these interactions, hydrogeochemical characterization was used together with four tracer methods: tritium/helium, dichlorodifluoromethane and sulfur hexafluoride, stable isotopes of hydrogen and oxygen, and carbon-14. Most of the shallow groundwater samples are similar to natural precipitation and groundwater in their chemical composition, being of the calcium bicarbonate type. B-11-17HYD013 was an exception, containing much more Cl and SO4. The samples from the deep 17MOS8193 all show a composition very typical for this type of borehole, on the line between the saline sodium sulphate and sodium chloride water types. The samples from 12MOS8102, as well as the river water samples and the Rytikuru spring sample, are located between these two end members. The hydrogen and oxygen isotope values divided the samples into two distinct groups: those that show an evaporation signal in the source water, and those that do not. The most likely source of the evaporated signal in the groundwaters is the surface water pools in the Viiankiaapa mire, from which the water has infiltrated into the groundwater and followed the known groundwater flow gradient into the observation wells near the River Kitinen. Tritium showed no inclusion of recently recharged water in the deep 17MOS8193, and dated most of the shallow wells with screens below the bedrock surface as recharged in the 1970s and 1980s. B-10-17HYD017 had an older apparent age, from 1955, and B-14-17HYD006 was curiously dated as recharged in 2018. 14C gave an apparent age of over 30 000 a for the deep 17MOS8193.
The slight 14C content could be caused by minor contamination during sampling, meaning the age is a minimum. The sample M-4-12MOS8102 got an apparent age of ~3 500 a, which could in turn be an overestimate due to ancient carbon being dissolved from the local bedrock fractures. CFC-12 showed apparent recharge dates from 1963 to 1975 in the shallow wells, and no recently recharged water in the deep 17MOS8193, and so was generally in line with the 14C and tritium results, although some contamination had occurred. SF6 concentrations exceeded the concentrations possible in light of the other results, most likely due to underground generation, and the method was dismissed. By trace element composition, all samples from the deep 17MOS8193 are distinct from the other samples and showed slight dilution in the concentrations of most elements over the span of the test pumping. The other samples are more mixed and difficult to interpret, but some trends and connections are visible, such as the higher contents in wells with screens below the bedrock surface than in those with screens above it, and the exceptionally high contents of many elements in B-13-17HYD004. Overall, the study benefited from the large array of methods, showing no interaction between the deep bedrock groundwaters and the shallow groundwaters or surface waters. The evaporated signal from the Viiankiaapa was clearly visible in the samples close to the River Kitinen.
  • Alcantara, Jose Carlos (2020)
    A recent machine learning technique called federated learning (Konecny, McMahan, et al., 2016) offers a new paradigm for distributed learning. It consists of performing machine learning on multiple edge devices while simultaneously optimizing a global model for all of them, without transmitting user data. The goal of this thesis was to demonstrate the benefits of applying federated learning to forecasting telecom key performance indicator (KPI) values from radio network cells. After performing experiments with aggregations of different data sources and comparing against a centralized learning model, the results revealed that a federated model can shorten the training time for modelling new radio cells. Moreover, the amount of data transferred to a central server is reduced drastically while keeping performance equivalent to a traditional centralized model. These experiments were performed with a multi-layer perceptron as the model architecture, chosen after comparing its performance against an LSTM. Both input and output data were sequences of KPI values.
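The aggregation step at the heart of federated learning can be sketched as federated averaging (FedAvg): each client trains locally, and only model weights, never the raw data, travel to the server, which averages them weighted by local sample counts. The flat weight vectors below are a stand-in for the MLP parameters used in the thesis; the numbers are illustrative.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Example: two radio cells with unequal amounts of local KPI data.
w_global = federated_average(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0])],  # locally trained weights
    [30, 10],                                      # local sample counts
)
# Only these weight vectors, not the underlying KPI data, leave the cells.
```

The weighting means a cell with more local data pulls the global model further towards its own solution, while data transfer per round is proportional to the model size rather than to the dataset size.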
  • Barakhtii, Diana (2023)
    This thesis presents the utilisation of nuclear magnetic resonance (NMR) spectroscopy for mechanistic and kinetic studies of the PFAA-Staudinger ligation, with a view to its further application in metabolic glycoengineering and hence for nuclear imaging purposes. The literature review focuses on bioorthogonal reactions, their comparison, known implementations, and prospects in nuclear imaging, specifically in metabolic glycoengineering. In the experimental part, a set of compounds was studied under different conditions with the same reagent in order to characterise the reactivity of triarylphosphines in the PFAA-Staudinger reaction. For analysis purposes, 1H and 31P NMR spectra of reagents, products and reaction mixtures were acquired and analysed.
  • Tomberg, Eemeli (2016)
    In this thesis, we study the decoherence of cosmological scalar perturbations during inflation. We first discuss the FRW model and cosmic inflation. Inflation is a period of accelerated expansion in the early universe, in typical models caused by a scalar field called inflaton. We review cosmological perturbation theory, where perturbations of the inflaton field and scalar degrees of freedom of the metric tensor are combined into the gauge-invariant Sasaki-Mukhanov variable. We quantize this variable using canonical quantization. Then, we discuss how interactions between the perturbations and their environment can lead to decoherence. In decoherence, the reduced density operator of the perturbations becomes diagonal with respect to a particular pointer basis. We argue that the pointer basis for the cosmological scalar perturbations consists of approximate eigenstates of the field value operator. Finally, we discuss how decoherence can help understand the transition from quantum theory to classical perturbation theory, and justify the standard treatment of perturbations and their initial conditions in cosmology. We conclude that since decoherence should not spoil the observationally successful predictions of this standard treatment, it is unlikely that the actual amount of decoherence could be observed in, say, the CMB radiation.
  • Althermeler, Nicole (2016)
    Metagenomics promises to shed light on the functioning of microbial communities and their surrounding ecosystems. In metagenomic studies, the genomic sequences of a collection of microorganisms are extracted directly from a specific environment. Up to 99% of microbes cannot be cultivated in the lab; thus, traditional analysis techniques have very limited applicability in this challenging setting. By extracting the sequences directly from the environment, metagenomic studies circumvent this dilemma, and metagenomics has become a powerful tool in the analysis of the diversity and metabolic capability of environmental microbes. However, metagenomic studies have challenges of their own. In this thesis we investigate several aspects of metagenomic data set analysis, focusing on means of (1) verifying the adequacy of taxonomic unit and enzyme representation and annotation in the sample, (2) highlighting similarities between samples by principal component analysis, (3) visualizing metabolic pathways with manually drawn metabolic maps from the Kyoto Encyclopedia of Genes and Genomes, and (4) estimating the taxonomic distributions of pathways with a novel strategy. A case study of deep bedrock groundwater metagenomic samples illustrates these methods. Water samples from boreholes up to 2500 meters deep, from two different sites in Finland, display the applicability and limitations of the aforementioned methods. In addition, publicly available metagenomic and genomic samples serve as baseline references. Our analysis resulted in a taxonomic and metabolic characterization of the samples. We were able to adequately retrieve and annotate the metabolic content of the deep bedrock samples. The visualization provided a tool for further investigation. The microbial community distribution could be characterized on higher levels of abstraction. Previously suspected similarities to fungi or archaea were not verified.
The first promising results were observed with the novel strategy for estimating the taxonomic distributions of pathways. Further results can be found at: http://www.cs.helsinki.fi/group/urenzyme/deepfun/
  • Lintula, Johannes (2023)
    This work examines how neural networks can be used to qualitatively analyze systems of differential equations depicting population dynamics. We present a novel numerical method, derived from physics-informed learning, capable of extracting equilibria and bifurcations from population dynamics models. The potential of the framework is showcased on three different example problems: a logistic model with outside inference, the Rosenzweig-MacArthur model, and one model from a recent population dynamics paper. The key idea behind the method is to have a neural network learn the dynamics of a free-parameter ODE system, and then to use the derivatives of the neural network to find equilibria and bifurcations. We, a bit clunkily, refer to these networks as physics-informed neural networks with free parameters and variable initial conditions. In addition to these examples, we also survey how and where such neural networks could be further utilized in the context of population dynamics. To answer the how, we document our experiences choosing good hyperparameters for these networks, even venturing into previously unexplored territory. For the where, we suggest potentially useful neural network frameworks to answer questions from an external survey concerning contemporary open questions in population dynamics. The research part of the work is preceded by a short dive into qualitative population dynamics, where we ponder what problems we want to solve and what tools we have available for solving them. Special attention is paid to the parameter sensitivity analysis of ordinary differential equation systems through bifurcation theory. We also provide a beginner-friendly introduction to deep learning, so that the research can be understood even by someone not previously familiar with the field. The work was written, and all included content was selected, with the goal of establishing a basis for future research.
  • Maljanen, Katri (2021)
    Cancer is a leading cause of death worldwide. Contrary to what its name suggests, cancer is not a single disease; it is a group of diseases that arise from the expansion of a somatic cell clone. This expansion is thought to be the result of mutations that confer a selective advantage on the cell clone. Mutations that are advantageous to cells, resulting in their proliferation and escape from normal cell constraints, are called driver mutations, and the genes that contain driver mutations are known as driver genes. Studying these mutations and genes is important for understanding how cancer forms and evolves, and various methods have been developed to discover them. This thesis focuses on a method called Deep Mutation Modelling, a deep learning based approach to predicting the probability of mutations. The output probabilities of Deep Mutation Modelling make it possible to create sample and cancer type specific probability scores that reflect the pathogenicity of mutations. Most methods in the past have produced scores that are the same for all cancer types; Deep Mutation Modelling offers the opportunity to make a more personalised score. The main objectives of this thesis were to examine the Deep Mutation Modelling output, whose features were previously unknown, to see how the output compares against other scoring methods, to study how the probabilities behave in mutation hotspots, and lastly to ask whether the probabilities could be used in a common driver gene discovery method. Overall, the goal was to see whether Deep Mutation Modelling works and whether it is competitive with other known methods. The findings indicate that Deep Mutation Modelling works in predicting driver mutations, but that it does not have sufficient power to do this reliably and requires further improvements.