Browsing by study line "Soveltava bioinformatiikka" (Applied Bioinformatics)

  • Malmsten, Kim (2021)
    Genomic structural variants are large events that change the structure of the genome. They can alter cell function by disrupting genes and genomic regulatory regions. Multiple factors are known to affect the formation of structural variants, and previous studies have shown that the sequence content of a genomic region often plays a role. This study aims to characterize the sequence content around the breakpoints of structural variants detected in human tissue samples that were whole-genome sequenced with nanopore sequencing. The characterization was done by examining the repetitive elements found around the breakpoints, by analyzing the GC content around the breakpoints, and by studying which DNA motifs were enriched in the surrounding sequences and where they were located. Several different repetitive elements occurred near the breakpoint regions, and the kinds of repetitive elements differed between types of structural variants. The GC-content profiles of the sequences also differed distinctly between types of structural variants. In addition, various enriched motifs were found in the sequences, and many of them showed distinct variation in where they were located relative to the breakpoints. These results support previous findings that sequence content plays a role in the formation of structural variants, although not all of the results could be directly explained by previous studies. The GC content was higher in sequences affected by an event that causes structural variant formation. Also, many of the DNA motifs found were distinctly skewed around the breakpoint sequences, possibly hinting that sequences containing these motifs are prone to the formation of structural variants.
  • Gu, Chunhao (2021)
    With the rapid scale-up of biological knowledge bases, mechanistic models, especially metabolic network models, are becoming more accurate. At the same time, machine learning has been widely applied in biomedical research as large amounts of omics data have become available in recent years. It is therefore worthwhile to study the integration of metabolic network models and machine learning, which may lead to biological discoveries. In 2019, MIT researchers proposed an approach called 'White-Box Machine Learning': they used fluxomics data derived from in silico simulation of a genome-scale metabolic (GEM) model, together with experimental antibiotic lethality measurements (IC50 values) of E. coli under hundreds of screening conditions, to train a linear regression-based machine learning model, and they extracted the coefficients of the model to discover metabolic mechanisms involved in antibiotic lethality. In this thesis, we propose a new approach based on the 'White-Box Machine Learning' framework. We replace the GEM model with another state-of-the-art metabolic network model, the expression and thermodynamics flux (ETFL) formulation. We also replace the linear regression-based machine learning model with a novel nonlinear regression model, the multi-task elastic net multilayer perceptron (MTENMLP). We apply the approach to the same experimental antibiotic lethality measurements (IC50 values) of E. coli from the 'White-Box Machine Learning' study. Finally, we validate their conclusions and make some new discoveries. In particular, our results show that ppGpp metabolism is active under antibiotic stress, which is supported by some of the literature. This suggests that our approach has the potential to make biological discoveries even when no candidate conclusion is known in advance.
  • Ba, Yue (2021)
    Ringed seals (Pusa hispida) and grey seals (Halichoerus grypus) are known to have hybridized in captivity despite belonging to different taxonomic genera. Earlier genetic analyses have indicated hybridization in the wild, and the resulting introgression of genetic material across species boundaries could potentially explain the intermediate phenotypes observed, e.g., in their dentition. Introgression can be detected using genome data, but existing inference methods typically require phased genotype data or cannot separate heterozygous and homozygous introgression tracts. In my thesis, I will present a method based on Hidden Markov Models (HMMs) to identify genomic regions with a high density of single nucleotide variants (SNVs) of foreign ancestry. Unlike other methods, my method can use unphased genotype data and can separate heterozygous and homozygous introgression tracts. I will apply this method to study introgression in Baltic ringed seals and grey seals. I will compare the method to an alternative method and assess it with simulated data in terms of precision and recall. Then, I will apply it to the seal data to search for introgression. Finally, I will discuss future directions for improving the method.
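
The last abstract above describes an HMM that separates heterozygous from homozygous introgression tracts using the density of foreign-ancestry SNVs. The abstract gives no model details, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes that the genome has been cut into fixed-size windows, that each window carries a count of foreign-ancestry SNVs, that counts follow a Poisson distribution with a state-specific rate, and that the three hidden states (`none`, `het`, `hom`) share one symmetric transition matrix. The function names, the `stay` parameter, and all rate values are hypothetical.

```python
import math

# Hypothetical hidden states: no introgression, heterozygous tract, homozygous tract.
STATES = ("none", "het", "hom")


def poisson_logpmf(k, rate):
    """Log-probability of observing k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def viterbi(counts, rates, stay=0.9):
    """Return the most likely state label for each window.

    counts: foreign-ancestry SNV count per window (assumed input format).
    rates:  assumed expected SNV count per window for each state in STATES.
    stay:   assumed probability of remaining in the same state between windows.
    """
    if not counts:
        return []
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [
        [math.log(stay if i == j else switch) for j in range(n)]
        for i in range(n)
    ]
    log_init = math.log(1.0 / n)  # uniform prior over the starting state

    # score[s] = best log-probability of any state path ending in state s
    score = [log_init + poisson_logpmf(counts[0], rates[s]) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at window t + 1
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(
                prev[best_i] + log_trans[best_i][j] + poisson_logpmf(k, rates[j])
            )
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    # Made-up window counts purely for illustration.
    example_counts = [0, 0, 1, 0, 5, 6, 4, 5, 11, 12, 10, 9, 0, 1, 0]
    print(viterbi(example_counts, rates=(0.5, 5.0, 10.0)))
```

In this sketch the homozygous rate is set to twice the heterozygous rate, on the assumption that a tract carried on both chromosome copies shows roughly twice the foreign SNV density in unphased data. A real analysis would have to estimate the rates and transition probabilities from the data rather than fix them by hand.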