Browsing by Subject "bioinformatiikka"

  • Peltola, Sanni (2019)
    In recent decades, ancient DNA recovered from old and degraded samples, such as bones and fossils, has presented novel prospects in the fields of genetics, archaeology and anthropology. In Finland, ancient DNA research is constrained by the poor preservation of bones: they are quickly degraded by acidic soils, limiting the age of DNA that can be recovered from physical remains. However, some soil components can bind DNA and thus protect the molecules from degradation. Ancient DNA from soils and sediments has previously been used to reconstruct paleoenvironments, to study ancient parasites and diet and to demonstrate the presence of a species at a given site, even when there are no visible fossils present. In this pilot study, I explored the potential of archaeological sediments as an alternative source of ancient human DNA. I collected sediment samples from five Finnish Neolithic Stone Age (6,000–4,000 years ago) settlement sites, located in woodland. In addition, I analysed a lakebed sample from a submerged Mesolithic (10,000–7,000 years ago) settlement site, and a soil sample from an Iron Age burial with bones present to compare DNA yields between the two materials. Soil samples were converted into Illumina sequencing libraries and enriched for human mtDNA. I analysed the sequencing data with a customised metagenomics-based bioinformatic analysis workflow. I also tested program performance with simulated data. The results suggested that human DNA preservation in Finnish archaeological sediments may be poor or very localised. I detected small amounts of human mtDNA in three Stone Age woodland settlement sites and a submerged Mesolithic settlement site. One Stone Age sample exhibited terminal damage patterns suggestive of DNA decay, but the time of deposition is difficult to estimate. Interestingly, no human DNA was recovered from the Iron Age burial soil, suggesting that body decomposition may not serve as a significant source of sedimentary ancient DNA. 
Additional complications may arise from the high inhibitor content of the soil and the abundance of microbial and other non-human DNA present in environmental samples. In the future, a more refined sampling approach, such as targeting microscopic bone fragments, could be a strategy worth trialling.
  • Koivunen, Sampo (2019)
    The Oxford Nanopore MinION is a third-generation sequencer based on nanopore sequencing technology. Nanopore sequencing allows DNA or RNA strands to be sequenced as they pass through membrane-embedded nanopores. By measuring the fluctuations in the ion flow through the nanopore, the passing strands can be sequenced directly, without additional secondary reactions or measurements. MinION sequencing has distinctly different characteristics from those of the market-leading sequencing platforms. The small form factor of the device further sets it apart from the alternatives. However, the technology has been on the market for only a short time, and few gold standards regarding its capabilities or usage have been established. This thesis describes our experiences testing the capabilities of the MinION sequencer both before its commercial release, as part of a special early access program, and in continued experiments following its commercial launch. The main results of this study include successfully sequencing E. coli and human gDNA samples and aligning the reads to their respective reference genomes. Using a sequencing and analysis pipeline tuned specifically to the MinION, we were able to sequence the entire E. coli genome on a single MinION flow cell with an average depth of around 180. Over the course of the thesis project, the MinION sequencing protocol was evaluated and optimized to determine whether it has the potential to achieve our ultimate goal of reliably sequencing previously inaccessible regions of the human genome. The possibility of augmenting the sequencing protocol with pre-sequencing target enrichment was also explored. Ultimately, we were able to confirm that the MinION sequencer can be used to sequence long DNA fragments from a multitude of sample types.
The majority of the produced reads could be aligned successfully against a reference genome. However, the limited yield and sequencing quality of a single experiment limit the applicability of the method to more complicated genomic studies. These issues can be addressed with various techniques, chiefly target enrichment, but adapting such methods into the sequencing pipeline has its own challenges.
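As a rough illustration of what an average depth figure such as the one quoted above means, the sketch below divides the total number of aligned bases by the genome length. The function name, the read lengths and the 4.6 Mb genome size (an approximate value for a common E. coli laboratory strain) are assumptions made for this example; none of them come from the thesis.

```python
def average_depth(aligned_read_lengths, genome_length):
    """Mean sequencing depth: total aligned bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(aligned_read_lengths) / genome_length


# Hypothetical run: 100,000 aligned reads of 8,000 bases each
# against an assumed genome of roughly 4.6 million base pairs.
reads = [8_000] * 100_000
depth = average_depth(reads, 4_600_000)
print(f"average depth: {depth:.1f}")
```

In practice, depth is usually computed per position from the alignment itself, since coverage is rarely uniform across a genome.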
  • Tuominiemi, Antti (2020)
    The sequencing methods used to study the genomes of organisms have become cheaper, resulting in a significant increase in the amount of genomic data available. Knowing the nucleic acid sequence of the DNA tells little about an organism unless the genome is first annotated, which means locating the genes and defining their products. The programs used for annotation make mistakes, and their results must be evaluated in various ways. The vast amount of genomic data encourages fast production of new annotations, which can increase human-made errors. Some annotation programs use gene databases, so the number of wrongly annotated genes in those databases may increase in the future if the quality control of annotations is not improved. This study examines the correlation between selected quality metrics and the quality of annotations. The quality metrics used can be divided into two basic types: the first is based on the basic structures of genes and the second on comparing the protein product of a gene against a protein database. The study assumes that comparison to a reference is a reliable way to assess the quality of annotations. The comparison is made at genome, exon and nucleotide levels. A single value describing the comparison is calculated at each level. For each gene aligned with a reference gene, sensitivity and specificity are calculated and combined into an F-score at the nucleotide level. Four different versions of the wild strawberry (Fragaria vesca) genome and their six annotations were used as data. They were downloaded from the Genome Database for Rosaceae, a genome database specializing in plants of the rose family. The correlation coefficients calculated from the quality metrics and F-scores were in several cases small but statistically significant, as the p-values were very low. Correlation coefficients were higher when quality metrics based on protein homology were examined.
The correlation coefficient calculated from the mean of the structure-based quality metrics and the F-score was lower when the studied annotation had a high F-score. The results detailed in this paper support the view that the selected structure-based quality metrics are not suitable for the evaluation of high-grade annotations. They could possibly be used for automated detection of poor-quality annotations. Quality metrics based on protein homology appeared to be promising subjects for further research.
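The nucleotide-level comparison described in this abstract can be sketched as follows. The abstract does not give the exact formulas, so this example assumes the convention common in gene-prediction evaluation: sensitivity is the fraction of reference nucleotides recovered, "specificity" is the fraction of predicted nucleotides that are correct, and the F-score is their harmonic mean. The function names and the interval representation are hypothetical.

```python
def positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of positions."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_f_score(predicted_exons, reference_exons):
    """Return (sensitivity, specificity, F-score) for one predicted/reference gene pair.

    Assumed definitions (not taken from the thesis):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
      F-score     = harmonic mean of the two
    """
    pred = positions(predicted_exons)
    ref = positions(reference_exons)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    sensitivity = true_positives / len(ref)
    specificity = true_positives / len(pred)
    f_score = 2 * sensitivity * specificity / (sensitivity + specificity)
    return sensitivity, specificity, f_score


# A predicted gene that covers only the first half of the reference gene.
sn, sp, f = nucleotide_f_score([(1, 50)], [(1, 100)])
print(sn, sp, round(f, 3))
```

Expanding intervals into position sets keeps the sketch short; for whole genomes, interval arithmetic would be far more memory-efficient.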
  • Hellsten, Kirsi (2023)
    Triglycerides are a type of lipid that enters our body with fatty food. High triglyceride levels are often caused by an unhealthy diet, poor lifestyle, poorly treated diseases such as diabetes, and too little exercise. Other risk factors found in various studies are HIV, menopause, an inherited lipid metabolism disorder and South Asian ancestry. Complications of high triglycerides include pancreatitis, carotid artery disease, coronary artery disease, metabolic syndrome, peripheral artery disease and stroke. Migration has made Singapore diverse, and it contains several subpopulations. One third of the population has genetic ancestry in China. The second largest group has genetic ancestry in Malaysia, and the third largest has genetic ancestry in India. Even though Singapore has one of the highest life expectancies in the world, unhealthy lifestyles such as poor diet, lack of exercise and smoking are still visible in everyday life. The purpose of this thesis was to introduce GWAS analysis for quantitative traits and apply it to real data, and also to see whether there are associations between some variants and triglycerides in the three main subpopulations in Singapore and to compare the results with previous studies. The research questions that this thesis answered are: what is GWAS analysis and what is it used for, how can GWAS be applied to data containing quantitative traits, and are there associations between some SNPs and triglycerides in the three main populations in Singapore. GWAS stands for genome-wide association studies, which are designed to identify statistical associations between genetic variants and phenotypes or traits. One reason for developing GWAS was to learn to identify the genetic factors that have an impact on significant phenotypes, for instance susceptibility to certain diseases. Such information can eventually be used to predict the phenotypes of individuals. GWAS have been used globally in, for example, anthropology, biomedicine, biotechnology and forensics.
The studies enhance the understanding of human evolution and natural selection and help advance many areas of biology. The study used several quality control methods, linear models and Bayesian inference to study associations. The results were examined with the help of, among other things, various visual methods. The dataset used in this thesis was open data used by Saw, W., Tantoso, E., Begum, H. et al. in their previous study. This study showed that there are associations between 6 different variants and triglycerides in the three main subpopulations in Singapore. The study results were compared with the results of two previous studies, which differed from the results of this study, suggesting that the results are significant. In addition, the thesis reviewed the ethics, limitations and benefits of GWAS. Most studies like this have been done in Europe, so more research is needed in different parts of the world. This research can also be continued with different methods and variables.
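The linear-model step mentioned in this abstract can be illustrated with a minimal single-variant test: a quantitative trait is regressed on allele dosage (0, 1 or 2 copies of an allele) and the slope is tested against zero. This is only a sketch under stated assumptions, not the thesis's actual pipeline: the function name and the toy data are hypothetical, no covariates or quality control are included, and the p-value uses a large-sample normal approximation rather than the exact t-distribution.

```python
import math


def snp_association(genotypes, trait):
    """Regress a quantitative trait on allele dosage for one variant.

    Returns (beta, standard_error, t_statistic, approx_p_value).
    The p-value is a two-sided normal approximation (an assumption made
    here to stay within the standard library).
    """
    n = len(genotypes)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else math.copysign(math.inf, beta)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return beta, se, t_stat, p_value


# Toy data: six individuals, trait rises by about 0.5 units per allele copy.
dosage = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
beta, se, t_stat, p_value = snp_association(dosage, levels)
print(f"beta={beta:.3f} se={se:.4f} t={t_stat:.2f}")
```

In a genome-wide setting this test is repeated for every variant, so the resulting p-values need correction for multiple testing before any association is reported.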