Browsing by Author "Mäklin, Tommi"

  • Mäklin, Tommi (2017)
    The price of DNA sequencing has decreased rapidly during the last decade. As a result, routine sequencing of bacterial colonies from both clinical and environmental sources is becoming increasingly available. However, accurate identification of the bacterial strains colonizing a sample remains difficult, especially in the presence of multiple organisms. Traditional methods based on culturing the bacteria are laborious and ineffective, while methods based on sequencing data have trouble differentiating between closely related variants of a species. Accurate identification of the species or strains contained in a sample would be desirable both in metagenomic studies and in improving the quality of hospital care. The aim of this thesis was to develop a computational method for accurate bacterial strain identification. Building on recent advances in sequencing read alignment and on the application of Bayesian inference to bacterial strain identification, the thesis introduces a pipeline capable of rapid and accurate strain identification from high-throughput sequencing data. The pipeline represents within-species variation with multiple reference genomes that have been grouped into clusters, and it determines the proportion of each cluster in a sample from the pseudoalignment of reads to the reference genomes. The proportions are estimated using a variational Bayesian method. The accuracy of the method is evaluated on both real and synthetic data containing reads originating from Staphylococcus aureus, Staphylococcus epidermidis, Klebsiella pneumoniae, Campylobacter jejuni and Campylobacter coli. In all cases the cluster proportions are accurately identified, and performance is significantly better than that of existing methods.
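
    The abstract does not give the model details, so the following is only a minimal sketch of how mixture proportions can be estimated with mean-field variational Bayes. It is not the thesis's implementation. It assumes a hypothetical matrix of per-read likelihoods (one row per read, one column per reference cluster, as might be derived from pseudoalignment compatibility) and a symmetric Dirichlet prior with the assumed parameter `alpha0`. The names `digamma` and `estimate_proportions` are illustrative.

```python
import math


def digamma(x):
    """Digamma function via recurrence plus an asymptotic expansion."""
    result = 0.0
    # Shift x upwards until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def estimate_proportions(likelihoods, alpha0=1.0, iterations=200):
    """Posterior mean of the cluster proportions under a Dirichlet prior.

    likelihoods[n][k] is an assumed value for p(read n | cluster k).
    """
    n_clusters = len(likelihoods[0])
    # Start from an even split of the reads across clusters.
    alpha = [alpha0 + len(likelihoods) / n_clusters] * n_clusters
    for _ in range(iterations):
        total = digamma(sum(alpha))
        # exp(E[log theta_k]) under the current Dirichlet posterior.
        weights = [math.exp(digamma(a) - total) for a in alpha]
        counts = [0.0] * n_clusters
        for row in likelihoods:
            scores = [lik * w for lik, w in zip(row, weights)]
            norm = sum(scores)
            if norm == 0.0:
                continue  # read is compatible with no cluster; skip it
            for k, score in enumerate(scores):
                counts[k] += score / norm  # soft assignment of the read
        # Updated Dirichlet parameters: prior plus expected read counts.
        alpha = [alpha0 + c for c in counts]
    total = sum(alpha)
    return [a / total for a in alpha]


# Toy input: 70 reads match only cluster 0, 30 only cluster 1,
# and 20 are equally compatible with both (all values are made up).
reads = [[1.0, 0.0]] * 70 + [[0.0, 1.0]] * 30 + [[0.5, 0.5]] * 20
print(estimate_proportions(reads))
```

    Reads that are compatible with several clusters are split between them in proportion to the current estimate, which is how ambiguous reads can still contribute to the result in this kind of model.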