Towards faster RNA Sequencing analysis

dc.date.accessioned 2014-11-04T12:54:18Z und
dc.date.accessioned 2017-10-24T12:23:51Z
dc.date.available 2014-11-04T12:54:18Z und
dc.date.available 2017-10-24T12:23:51Z
dc.date.issued 2014-11-04T12:54:18Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/4247 und
dc.identifier.uri http://hdl.handle.net/10138.1/4247
dc.title Towards faster RNA Sequencing analysis en
ethesis.discipline Computer science en
ethesis.discipline Tietojenkäsittelytiede fi
ethesis.discipline Datavetenskap sv
ethesis.discipline.URI http://data.hulib.helsinki.fi/id/1dcabbeb-f422-4eec-aaff-bb11d7501348
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Sirokov, Roman
dct.issued 2014
dct.language.ISO639-2 eng
dct.abstract Processing data produced by next-generation sequencing technologies is a computationally intensive task. We aim to speed up this task by means of parallel computing. Our parallel computing solution employs Slurm to manage the workload across nodes. It can be used on top of Anduril, a workflow management software for scientific data analysis, as well as on its own. To test the performance of our solution, we use a workflow for post-processing and analyzing RNA-Seq data that originates from lymphoma patients. The data consist of 447 mutually independent samples that can be processed in parallel. To evaluate the performance, we employ three metrics: level of parallelization, execution time and CPU load. The workflow achieved an excellent level of parallelization for the provided data of 447 samples, with an upper bound of 894 cores. Execution times were compared in two ways: with sets of homogeneous samples of various sizes and with heterogeneous samples. Homogeneous samples took a similar amount of time on average regardless of the size of the set. With heterogeneous sets, the execution time of the largest sample was chosen as the baseline, and the total execution time was 32% longer than this baseline. Finally, the CPU load of each component was measured. With homogeneous sets, high CPU load was observed, while with heterogeneous sets, CPU idling was detected. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Bioinformatics en
dct.identifier.urn URN:NBN:fi-fe2017112251842
dc.type.dcmitype Text

Files in this item

Files Size Format
thesis.pdf 988.6Kb PDF