Improving the throughput of the forward population genetic simulation environment simuPOP

Author(s): Kammonen, Juhana
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Language: English
Acceptance year: 2013
Abstract:
Biological populations arise, develop and evolve under a series of well-studied laws and fairly regular mechanisms. Population genetics is the field of science that aims to study and model these laws, as well as the genetic composition and diversity of populations of various species. At best, population genetic models can help verify past events in a population and, eventually, reconstruct unknown population histories in light of multidisciplinary evidence. An example of this is research on the prehistory of the human population of Finland. Population simulation is a sub-branch of the rapidly developing field of bioinformatics and can be divided into two approaches: forward-in-time and backward-in-time (coalescent). These methods enable in silico testing of how the genetic composition of individuals in a well-defined population develops over time. This thesis focuses on the forward-in-time approach. Several software packages exist today for forward population simulation, and simuPOP [http://simupop.sourceforge.net] is probably the most flexible of them. Because it can model the transmission of genomes and arbitrary individual information between generations, simuPOP has potential applications even beyond population genetics. However, simuPOP tends to use an enormous amount of random access memory (RAM) when simulating large populations. This thesis introduces three approaches to improving the throughput of simuPOP: i) scripting guidelines, ii) approximating a complex simulation using the built-in biallelic mode of simuPOP, and iii) changes to the simuPOP source code that would enable improved throughput. An existing simuPOP script designed to simulate past demographic events in Finnish population history is used as an example. A batch of 100 simulation runs is carried out on three versions of this script: standard, modified and biallelic.
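To make the forward-in-time approach concrete: a forward simulator stores every individual's genotype explicitly and builds each generation from the previous one, so memory use grows with population size and the number of loci. The following is a minimal sketch in plain Python of a Wright-Fisher model with a single biallelic locus. It does not use simuPOP, and the function names `wright_fisher` and `allele_freq` are hypothetical, chosen only for illustration.

```python
import random


def allele_freq(pop):
    """Frequency of allele 1 in a list of diploid (allele, allele) tuples."""
    return sum(a + b for a, b in pop) / (2 * len(pop))


def wright_fisher(pop_size, p0, generations, seed=None):
    """Forward-in-time Wright-Fisher simulation of one biallelic locus.

    Every individual is stored explicitly as a pair of alleles (0 or 1).
    Each generation, every offspring receives one random allele from each
    of two randomly drawn parents (random mating; the same individual may
    be drawn twice; no selection, mutation or migration).
    Returns the frequency of allele 1 per generation, starting with the
    initial population.
    """
    rng = random.Random(seed)
    # Initial genotypes: each allele is 1 with probability p0.
    pop = [(int(rng.random() < p0), int(rng.random() < p0))
           for _ in range(pop_size)]
    freqs = [allele_freq(pop)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            mother = rng.choice(pop)
            father = rng.choice(pop)
            # Mendelian transmission: one random allele from each parent.
            offspring.append((rng.choice(mother), rng.choice(father)))
        pop = offspring
        freqs.append(allele_freq(pop))
    return freqs


if __name__ == "__main__":
    trajectory = wright_fisher(pop_size=500, p0=0.3, generations=50, seed=1)
    print(f"allele 1 frequency after 50 generations: {trajectory[-1]:.3f}")
```

Because the whole population is held in memory at once, doubling `pop_size` roughly doubles the memory needed per generation; real simulators add many loci, information fields and subpopulation structure on top of this.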
Compared with the standard version, the modified simulation script runs marginally faster. Although it doubles the user time of a single simulation run, the biallelic approximation consumes about one third of the random access memory of the standard version while remaining compatible from the population genetic point of view. This suggests that built-in support for the biallelic approximation could be a valuable addition to simuPOP. The results show that simuPOP can be applied to very complex forward population simulations, and its individual information fields allow the user to set up arbitrary simulation scenarios. Data structure changes at the source code level are likely to improve throughput even further. Besides introducing improvements and guidelines for the simulation workflow, this thesis is a standalone case study on the use and development of bioinformatics software. Furthermore, a separate development version of simuPOP, called simuPOP-rev, has been founded with the goal of implementing the source code changes suggested in this thesis.
ACM Computing Classification System (CCS): D.1 [Programming techniques], G.1.6 [Optimization], H.3 [Information storage and retrieval]
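One reason a biallelic representation can save memory is that an allele with only two states needs a single bit, whereas a general allele type needs at least a whole byte. The sketch below illustrates that arithmetic and a simple bit-packing scheme in plain Python. The assumed sizes (8 bits per allele for a general allele type, 1 bit for a biallelic one) are illustrative assumptions, not measurements of simuPOP, and the helper names `genotype_bytes`, `pack_biallelic` and `unpack_biallelic` are hypothetical.

```python
import math


def genotype_bytes(pop_size, loci, ploidy=2, bits_per_allele=8):
    """Bytes needed to store every allele in a population.

    Counts genotype storage only; per-individual information fields and
    program overhead are ignored.
    """
    total_bits = pop_size * loci * ploidy * bits_per_allele
    return math.ceil(total_bits / 8)


def pack_biallelic(alleles):
    """Pack a sequence of 0/1 alleles into a bytearray, eight per byte."""
    packed = bytearray(math.ceil(len(alleles) / 8))
    for i, a in enumerate(alleles):
        if a:
            # Set bit (i % 8) of byte (i // 8).
            packed[i // 8] |= 1 << (i % 8)
    return packed


def unpack_biallelic(packed, n):
    """Recover the first n 0/1 alleles from a packed bytearray."""
    return [(packed[i // 8] >> (i % 8)) & 1 for i in range(n)]


if __name__ == "__main__":
    # One million diploid individuals with 1,000 loci each.
    print(genotype_bytes(1_000_000, 1000))                     # 2000000000
    print(genotype_bytes(1_000_000, 1000, bits_per_allele=1))  # 250000000
```

Under these assumptions genotype storage shrinks by a factor of eight. The saving observed in a complete simulation is expected to be smaller, since genotypes are not the only data held in memory.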