Browsing by master's degree program "Magisterprogrammet i informatik inom livsvetenskaperna"

  • Zogjani, Yllza (2023)
    The increasing demand for comprehensive datasets to address complex diseases has made biobank-based research widely popular. However, biobank-level data collection is susceptible to bias when fundamental aspects of study design, such as the sampling approach, are overlooked. FinnGen is a large-scale cohort study that aims to improve diagnoses and prevent diseases through genetic research by combining biobank data with registry data. However, FinnGen's hospital-based recruitment strategy introduces selection bias and makes the cohort epidemiologically less representative of its sampling population. In this study, we examine the impact of selection bias in FinnGen. We use well-established epidemiological methods and leverage representative data on the Finnish population to correct for the bias. By comparing key demographic characteristics and association statistics of interest between FinnGen and a comprehensive registry-based study, FinRegistry, we show the extent to which selection bias within FinnGen results in distorted association estimates and a dataset that is highly non-representative of its underlying population. In response to these findings, we compute Iterative Proportional Fitting (IPF) weights to obtain association statistics that are representative of the true sampling population of FinnGen and unaffected by selection bias. By comparing weighted associations estimated in FinnGen with associations estimated using FinRegistry data, we infer that the use of our IPF weights mitigates volunteer bias in FinnGen.
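As a rough illustration of the Iterative Proportional Fitting idea mentioned in the abstract above, the following sketch rakes a two-way table of sample counts until its row and column totals match known population totals. It is not the thesis's procedure: the function name `ipf_weights`, the 2x2 layout, and all counts are hypothetical, and the sketch assumes every sample cell is positive and that the row and column targets sum to the same total.

```python
def ipf_weights(sample_counts, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rake a 2-D table of sample counts to target row and column totals.

    Returns the fitted table and per-cell weights (fitted / sample count).
    Assumes all sample cells are positive (hypothetical toy setting).
    """
    fitted = [[float(v) for v in row] for row in sample_counts]
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            factor = target / sum(fitted[i])
            fitted[i] = [v * factor for v in fitted[i]]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in fitted)
            for row in fitted:
                row[j] *= factor
        # Columns are exact after step 2, so convergence is checked on rows.
        row_error = max(abs(sum(fitted[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    weights = [
        [f / s for f, s in zip(fitted_row, sample_row)]
        for fitted_row, sample_row in zip(fitted, sample_counts)
    ]
    return fitted, weights


# Hypothetical sample: rows could be two sex groups, columns two age groups.
sample = [[40, 10], [30, 20]]
fitted, weights = ipf_weights(sample, row_targets=[60, 40], col_targets=[50, 50])
```

Each individual in a cell would then receive that cell's weight in a weighted analysis, so that over-sampled groups count less and under-sampled groups count more.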
  • Nebelung, Hanna (2023)
    ScRNA-seq captures a static picture of a cell's transcriptome, including abundances of unspliced and spliced RNA. RNA velocity methods offer the opportunity to infer future RNA abundances, and thus future states of a cell, based on the temporal change of these unspliced and spliced RNA. Early RNA velocity methods have shed light on transcriptional dynamics in many biological processes. However, due to strict assumptions in the underlying model, these methods are not reliable when inferring velocity for genes with complex expression dynamics, such as genes with transcriptional boosts. Such genes can be observed, for example, in erythropoietic and hematopoietic data. Several new RNA velocity methods have been proposed recently. Among these, veloVI and Pyro-Velocity both employ Bayesian methods to estimate the reaction rates and latent parameters. The problem of estimating RNA velocity thus becomes one of posterior inference, which allows more flexible inference of model parameters and the quantification of uncertainty. The objective of this thesis was to investigate the newly published RNA velocity methods veloVI and Pyro-Velocity in comparison to the established tool scVelo. To achieve this, we applied the methods to scRNA-seq data from healthy and ERCC6L2 disease bone marrow cells. ERCC6L2 disease can cause bone marrow failure with a risk of progression to acute myeloid leukemia with erythroid predominance. Specifically, we evaluated whether RNA velocity results reflect hematopoietic differentiation, whether genes with transcriptional boosts affect the velocity results, and whether RNA velocity analysis can indicate why erythropoiesis is affected in ERCC6L2 disease. We find that the new RNA velocity methods cannot produce velocity estimates that are fully in line with what is known of hematopoiesis in our data. Further, the results suggest that velocity estimates by veloVI are affected by genes with transcriptional boosts. Moreover, the RNA velocity methods examined in this thesis are not robust and cannot reliably predict cell transitions based on the estimated velocity. Consequently, velocity estimates for disease data such as ERCC6L2 disease must be evaluated carefully before drawing any conclusions about the differentiation process. In conclusion, this thesis highlights the need for models that can capture complex transcription kinetics. Still, as this field is growing rapidly and promising new methods are being developed, improvement of RNA velocity analysis in general is possible.
  • Viitikko, Tanja (2023)
    Pathogens are everywhere in nature, so organisms have developed various mechanisms to defend themselves against them. Two of these defense mechanisms are known as resistance and tolerance. Resistance describes the host's ability to avoid being infected by the pathogen, while tolerance describes the host's ability to reduce the fitness loss caused by the infection. We assume that investing in resistance reduces the transmission rate of the pathogen and that investing in tolerance reduces virulence. Developing the defense mechanisms is costly to the host. In this thesis, we assume that the resources invested in resistance and tolerance are taken away from the host's fecundity. The independent but simultaneous evolution of resistance and tolerance is modeled with an SIS model. The model is studied with the methods of adaptive dynamics. We concentrate on finding continuously stable strategies, which serve as the evolutionary end points for the population. We vary the ecological parameters to determine which strategies are optimal for the host in different environments. We find that for low values of the transmission rate, the hosts favor resistance over tolerance. When the transmission rate increases, resistance is traded for tolerance and the host benefits more from high tolerance. Low values of virulence result in tolerance being favored over resistance. Increasing virulence leads to a change in the defense mechanism, as for high values of virulence investing in resistance is more beneficial to the host. The same holds for the recovery rate: tolerance is favored for low values of the recovery rate and is replaced by resistance when the recovery rate increases. Patterns and associations between resistance and tolerance are also studied. A positive correlation between resistance and tolerance is found with low values of the transmission rate, low and high values of virulence, and high values of the recovery rate. Resistance and tolerance correlate negatively with high values of the transmission rate, intermediate values of virulence, and low values of the recovery rate.
  • Purmonen, Noora (2022)
    The purpose of this thesis is to present and examine ways in which statistical uncertainty can be explained and visualized. In particular, the target audience for the communication of statistical uncertainty is readers with little prior experience of statistical concepts or methods. COVID-19 data are used as the application case for presenting these visual communication methods. Communication about COVID-19 has had very diverse target audiences, but in communication about the progress of the epidemic addressed to the whole population of Finland, for example, the essential task has been communicating to an audience that does not consist of experts in the field. The thesis is based on COVID-19 data and the epidemic situation from 2020, when hardly any immunity to the disease had yet developed in the population. The thesis begins by introducing the SEIR infectious disease model, which describes the development of an epidemic in a population through four disease stages. The SEIR model was also used in COVID-19 modelling in the early phase of the epidemic, because COVID-19 was thought to behave as an epidemic in the same way with respect to these four stages. In addition to introducing the model, the thesis briefly considers how the model's parameters, such as the basic reproduction number, affect the development of the epidemic. The COVID-19 modelling of the Finnish Institute for Health and Welfare (Terveyden ja hyvinvoinnin laitos) is also presented from the perspective of the SEIR model and the development of case numbers in early 2020. This part also highlights, through the reproduction number, the effect on the epidemic of the restrictions in place in 2020 that reduced the number of contacts between individuals. Regarding statistical uncertainty, the thesis focuses on its causes, since uncertainty can stem from a lack of available information or from randomness in it. Understanding the underlying causes is essential for explaining and illustrating the overall picture and its parts. The thesis considers in particular the uncertainty that arises in COVID-19 modelling and in testing for infections. It also examines ways of expressing statistical uncertainty, such as the sampling-related standard deviation or standard error and confidence intervals, and later the visualization and communication of these concepts, among other things. The communication of statistical uncertainty is presented mainly through different kinds of plots, such as box plots and scatter plots, while considering the benefits and challenges of each. Towards the end, the thesis looks, from a communication perspective, at factors that affect how plots are interpreted and at the goals of communicating uncertainty, for example in terms of the trust or emotions that the communication evokes. Finally, the thesis summarizes possible challenges in presenting statistical uncertainty visually, which may arise, for example, from the interpretations made by the target audience or from the use of irrelevant plots.
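The four-stage SEIR model mentioned in the abstract above can be sketched as a small simulation. This is a generic textbook formulation (susceptible, exposed, infectious, recovered) integrated with a simple Euler step, not the model used in the thesis or by the Finnish Institute for Health and Welfare. The function name `simulate_seir` and all parameter values are hypothetical; in this formulation the basic reproduction number is `beta / gamma`.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, rec0, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    beta:  transmission rate per day
    sigma: rate of moving from exposed to infectious (1 / latent period)
    gamma: recovery rate (1 / infectious period)
    Returns a list of (time, S, E, I, R) tuples.
    """
    n = s0 + e0 + i0 + rec0  # total population, constant in this model
    s, e, i, r = float(s0), float(e0), float(i0), float(rec0)
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_exposed = beta * s * i / n * dt   # S -> E
        new_infectious = sigma * e * dt       # E -> I
        new_recovered = gamma * i * dt        # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((k * dt, s, e, i, r))
    return history


def peak_infectious(history):
    """Largest number of simultaneously infectious individuals."""
    return max(state[3] for state in history)


# Hypothetical scenarios: the second mimics contact restrictions that lower beta.
baseline = simulate_seir(beta=0.5, sigma=0.2, gamma=0.2,
                         s0=999_990, e0=0, i0=10, rec0=0, days=365)
restricted = simulate_seir(beta=0.24, sigma=0.2, gamma=0.2,
                           s0=999_990, e0=0, i0=10, rec0=0, days=365)
```

Comparing `peak_infectious(baseline)` with `peak_infectious(restricted)` shows how a lower reproduction number (here 1.2 instead of 2.5) flattens the epidemic curve in this toy setting.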