Improving the throughput of the forward population genetic simulation environment simuPOP

dc.date.accessioned 2013-11-19T13:54:28Z und
dc.date.accessioned 2017-10-24T12:24:40Z
dc.date.available 2013-11-19T13:54:28Z und
dc.date.available 2017-10-24T12:24:40Z
dc.date.issued 2013-11-19T13:54:28Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/3252 und
dc.identifier.uri http://hdl.handle.net/10138.1/3252
dc.title Improving the throughput of the forward population genetic simulation environment simuPOP en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Kammonen, Juhana
dct.issued 2013
dct.language.ISO639-2 eng
dct.abstract Biological populations arise, develop and evolve under a series of well-studied laws and fairly regular mechanisms. Population genetics is a field of science that aims to study and model these laws, together with the genetic composition and diversity of populations of various species and forms of life. At best, population genetic models can be used to verify past events in a population and eventually to reconstruct unknown population histories in light of multidisciplinary evidence. One example is the research on the human population prehistory of Finland. Population simulations are a sub-branch of the rapidly developing field of bioinformatics and can be divided into two approaches: forward-in-time and backward-in-time (coalescent). Both methodologies enable in silico testing of how the genetic composition of individuals develops in a well-defined population. This thesis focuses on the forward-in-time approach. Multiple pieces of software exist today for forward population simulations, and simuPOP [http://simupop.sourceforge.net] is probably the single most flexible of them. Because it can incorporate the transmission of genomes and arbitrary individual information between generations, simuPOP has potential applications even beyond population genetics. However, simuPOP tends to use an enormous amount of random access memory when simulating large populations. This thesis introduces three approaches to improving the throughput of simuPOP: i) introducing scripting guidelines, ii) approximating a complex simulation using the built-in biallelic mode of simuPOP, and iii) changes to the source code of simuPOP that would enable improved throughput. An earlier simuPOP script designed to simulate past demographic events of Finnish population history is used as an example. A batch of 100 simulation runs is executed on each of three versions of that script: standard, modified and biallelic.
Compared to the standard mode, the modified simulation script performs marginally faster. Despite doubling the user time of a single simulation run, the biallelic approximation method consumes one third of the random access memory while still being compatible from the population genetic point of view. This suggests that built-in support for the biallelic approximation could be a valuable supplement to simuPOP. Evidently, simuPOP can be applied to very complex forward population simulations. The use of individual information fields enables the user to set up arbitrary simulation scenarios. Data structure changes at the source code level are likely to improve throughput even further. Besides introducing improvements and guidelines to the simulation workflow, this thesis is a standalone case study concerning the use and development of bioinformatics software. Furthermore, an independent development version of simuPOP called simuPOP-rev is established with the goal of implementing the source code changes suggested in this thesis. ACM Computing Classification System (CCS): D.1 [Programming techniques], G.1.6 [Optimization], H.3 [Information storage and retrieval] en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Bioinformatics en
dct.identifier.urn URN:NBN:fi-fe2017112251321
dc.type.dcmitype Text

Files in this item

Files Size Format View
jkammone_gradu_06112013.pdf 726.5Kb PDF