Probabilistic quantification of bacterial strain mixtures

dc.date.accessioned 2017-08-31T06:20:27Z und
dc.date.accessioned 2017-10-24T12:22:13Z
dc.date.available 2017-08-31T06:20:27Z und
dc.date.available 2017-10-24T12:22:13Z
dc.date.issued 2017-08-31T06:20:27Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/6144 und
dc.identifier.uri http://hdl.handle.net/10138.1/6144
dc.title Probabilistic quantification of bacterial strain mixtures en
ethesis.discipline Statistics en
ethesis.discipline Tilastotiede fi
ethesis.discipline Statistik sv
ethesis.discipline.URI http://data.hulib.helsinki.fi/id/670ef0b6-2f9e-4e98-91af-a292298fb670
ethesis.department.URI http://data.hulib.helsinki.fi/id/61364eb4-647a-40e2-8539-11c5c0af8dc2
ethesis.department Institutionen för matematik och statistik sv
ethesis.department Department of Mathematics and Statistics en
ethesis.department Matematiikan ja tilastotieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Mäklin, Tommi
dct.issued 2017
dct.language.ISO639-2 eng
dct.abstract The price of DNA sequencing has decreased rapidly during the last decade. As a result, routine sequencing of bacterial colonies from both clinical and environmental sources is becoming increasingly available. However, accurate identification of the bacterial strains colonizing a sample remains difficult, especially in the presence of multiple organisms. Traditional methods based on culturing the bacteria are laborious and ineffective, while methods based on sequencing data have trouble differentiating between closely related variants of a species. Accurate identification of the species or strains contained in a sample would be desirable both in metagenomic studies and in improving the quality of hospital care. The aim of this thesis was to develop a computational method for accurate bacterial strain identification. Building on recent advances in sequencing read alignment and in the application of Bayesian inference to bacterial strain identification, the thesis introduces a pipeline capable of rapid and accurate strain identification from high-throughput sequencing data. By representing the within-species variation with multiple reference genomes that have been clustered, the pipeline is able to accurately determine the cluster proportions in a sample from pseudoalignment of reads to the reference genomes. The proportions are estimated using a variational Bayesian method. The accuracy of the method is evaluated on both real and synthetic data containing reads originating from Staphylococcus aureus, Staphylococcus epidermidis, Klebsiella pneumoniae, Campylobacter jejuni and Campylobacter coli. In all cases the cluster proportions are accurately identified, and performance is significantly better than that of existing methods. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
dct.identifier.urn URN:NBN:fi-fe2017112251701
dc.type.dcmitype Text

Files in this item

Files Size Format View
gradu_tommi_maklin.pdf 1.052Mb PDF
