Browsing by Subject "survival"

  • Horn, Matthew (2024)
    Long-term monitoring programs gather important data, including phenological data, for understanding population trends and managing biodiversity. Such data can suffer from left-censoring, where the first occurrence of an event coincides with the first sampling time. This can lead to overestimation of the timing of species' life history events and obscure phenological trends. This study develops a Bayesian survival model to predict and impute the true first occurrence times of Finnish moths within a sampling season in left-censored cases, thereby estimating the amount of left-censoring and effectively "decensoring" the data. A simulation study on synthetic data tested the model and explored how effect size, the severity of censoring, and sampling frequency affect the inference. Forward feature selection over environmental covariates was performed for a generalized linear survival model with a logit link that incorporates both left-censoring and interval censoring. Five-fold cross-validation was used to select the best model and to see which covariates would be added during the feature selection process. The validation tested the model's ability to predict both points that were not left-censored and points that were artificially left-censored. The final model included an intercept and terms for cumulative Growing Degree Days, cumulative Chilling Days, mean spring temperature, cumulative rainfall, and daily minimum temperature. It was trained on all of the data, and predictions were made of the true first occurrence times for the left-censored sites and years.
  • Lindgren, Himmi (2024)
    Unsupervised learning techniques can detect clinically relevant structure in population cohort data on the human gut microbiota. Although gut microbiota composition is influenced by individual factors such as diet, medication, and the development of the immune system during early childhood, it has been proposed that individuals maintain a relatively stable microbiota ecosystem throughout adulthood. This stability makes it possible to divide individuals into subgroups based on their gut microbiota characteristics, which define the key features of microbiota community types within the population. To this end, I compared three probabilistic unsupervised learning techniques, namely optimization-based Non-negative Matrix Factorization and two Bayesian modelling techniques, Dirichlet Multinomial Mixtures and Latent Dirichlet Allocation, against a naive benchmark clustering based on dominant taxa. I used the strength of association with all-cause mortality as a quantitative metric for identifying biologically relevant structure in a large Finnish population cohort with almost 18 years of follow-up. The techniques defined microbiota assemblages either as discrete enterotypes, which assign each sample to a single community type, or as continuous enterosignatures, which identify patterns of co-occurring microbiota community types within each sample. I found five fairly robust community types, characterized by the bacterial genera Bacteroides, Alistipes, Agathobacter, Escherichia, and Prevotella. Using Cox regression, Latent Dirichlet Allocation detected the strongest early mortality signal, outperforming all other techniques. The replicability of Latent Dirichlet Allocation was assessed using cross-validation. The community types predicted in cross-validation uncovered an ecological landscape in the data similar to that of the community types obtained using the entire data, confirming the clinical relevance, robustness, and scalability of the technique.
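
The Horn (2024) abstract above describes a survival model with a logit link that handles both left-censored and interval-censored first occurrences. The following is a minimal, standard-library-only sketch of how such a likelihood can be written. It is not the thesis code. It assumes daily time steps counted from the start of the season and a per-day hazard h_t = logit^-1(alpha + beta . x_t). The function names and inputs are hypothetical. Under these assumptions, with S(t) = prod_{s<=t} (1 - h_s), a first occurrence observed at sampling day b after an empty visit on day a contributes S(a) - S(b). A left-censored occurrence at the first sampling day b contributes 1 - S(b). Renormalising the daily event probabilities over days 0..b gives the distribution from which a "decensored" first occurrence day could be imputed.

```python
import math


def logistic(eta):
    """Inverse logit link, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def daily_hazards(covariates, beta, intercept):
    """Per-day hazard h_t = logistic(intercept + beta . x_t).

    covariates: one covariate vector per day (hypothetical inputs, e.g.
    cumulative growing degree days or daily minimum temperature).
    """
    return [logistic(intercept + sum(b * x for b, x in zip(beta, row)))
            for row in covariates]


def survival(hazards, t):
    """S(t): probability that the first occurrence happens after day t."""
    s = 1.0
    for h in hazards[: t + 1]:
        s *= 1.0 - h
    return s


def interval_loglik(hazards, lower, upper):
    """Log-probability that the first occurrence falls in days lower+1..upper.

    lower = -1 encodes left-censoring: the event happened on or before the
    first sampling day `upper`, so S(lower) is taken to be 1.
    """
    s_lower = 1.0 if lower < 0 else survival(hazards, lower)
    return math.log(s_lower - survival(hazards, upper))


def left_censored_distribution(hazards, first_sample):
    """P(T = t | T <= first_sample) for t = 0..first_sample.

    These are the weights an imputation step could sample from to replace
    a left-censored first occurrence with a plausible true day.
    """
    probs = []
    s_prev = 1.0  # probability of no occurrence before day t
    for h in hazards[: first_sample + 1]:
        probs.append(s_prev * h)
        s_prev *= 1.0 - h
    total = sum(probs)  # equals 1 - S(first_sample)
    return [p / total for p in probs]
```

In a Bayesian fit, `alpha` and `beta` would receive priors and these quantities would be evaluated for each posterior draw. The sketch uses fixed parameter values only to keep the censoring logic visible.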
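
The Lindgren (2024) abstract above mentions a naive benchmark clustering based on dominant taxa but does not define it. The following is a hedged illustration only. It assumes the benchmark labels each sample with its single most abundant genus. The function name, the input layout (sample ID mapped to per-genus read counts), and the toy counts are hypothetical.

```python
def dominant_taxon_clusters(samples):
    """Label each sample with its most abundant genus.

    samples: mapping sample_id -> {genus: read_count}.
    Returns sample_id -> (genus, relative_abundance_of_that_genus).
    Ties are broken alphabetically so the result is deterministic.
    """
    assignments = {}
    for sample_id, counts in samples.items():
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"sample {sample_id!r} has no reads")
        # max() returns the first maximal item, so iterating over sorted
        # genus names resolves ties in alphabetical order.
        genus = max(sorted(counts), key=lambda g: counts[g])
        assignments[sample_id] = (genus, counts[genus] / total)
    return assignments


# Toy counts, for illustration only.
example = {
    "s1": {"Bacteroides": 120, "Prevotella": 30},
    "s2": {"Prevotella": 90, "Bacteroides": 10, "Alistipes": 0},
}
print(dominant_taxon_clusters(example))
```

Because the label depends only on the argmax, this benchmark yields discrete, enterotype-like groups. It cannot express the within-sample mixtures that the enterosignature-style models in the abstract capture.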