Towards faster RNA Sequencing analysis

dc.date.accessioned	2014-11-04T12:54:18Z	und
dc.date.accessioned	2017-10-24T12:23:51Z
dc.date.available	2014-11-04T12:54:18Z	und
dc.date.available	2017-10-24T12:23:51Z
dc.date.issued	2014-11-04T12:54:18Z
dc.identifier.uri	http://radr.hulib.helsinki.fi/handle/10138.1/4247	und
dc.identifier.uri	http://hdl.handle.net/10138.1/4247
dc.title	Towards faster RNA Sequencing analysis	en
ethesis.discipline	Computer science	en
ethesis.discipline	Tietojenkäsittelytiede	fi
ethesis.discipline	Datavetenskap	sv
ethesis.discipline.URI	http://data.hulib.helsinki.fi/id/1dcabbeb-f422-4eec-aaff-bb11d7501348
ethesis.department.URI	http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department	Institutionen för datavetenskap	sv
ethesis.department	Department of Computer Science	en
ethesis.department	Tietojenkäsittelytieteen laitos	fi
ethesis.faculty	Matematisk-naturvetenskapliga fakulteten	sv
ethesis.faculty	Matemaattis-luonnontieteellinen tiedekunta	fi
ethesis.faculty	Faculty of Science	en
ethesis.faculty.URI	http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI	http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university	Helsingfors universitet	sv
ethesis.university	University of Helsinki	en
ethesis.university	Helsingin yliopisto	fi
dct.creator	Sirokov, Roman
dct.issued	2014
dct.language.ISO639-2	eng
dct.abstract	Processing data produced by next-generation sequencing technologies is a computationally intensive task. We aim to speed up this task by means of parallel computing. Our parallel computing solution employs Slurm to manage the workload across nodes. It can be used on top of Anduril, a workflow management software for scientific data analysis, as well as on its own. To test the performance of our solution, we use a workflow for post-processing and analyzing RNA-Seq data that originates from lymphoma patients. The data consists of 447 mutually independent samples that can be processed in parallel. To evaluate the performance, we employ three metrics: level of parallelization, execution time, and CPU load. The workflow achieved an excellent level of parallelization for the provided data of 447 samples, with an upper bound of 894 cores. Execution times were compared in two ways: with sets of homogeneous samples of various sizes and with heterogeneous samples. Homogeneous samples took, on average, a similar amount of time regardless of the size of the set. With heterogeneous sets, the execution time of the largest sample was chosen as the baseline, and the total execution time was 32% longer than this baseline. Finally, the CPU load of each component was measured. With homogeneous sets, high CPU load was observed, while with heterogeneous sets, CPU idling was detected.	en
dct.language	en
ethesis.language.URI	http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language	English	en
ethesis.language	englanti	fi
ethesis.language	engelska	sv
ethesis.thesistype	pro gradu-avhandlingar	sv
ethesis.thesistype	pro gradu -tutkielmat	fi
ethesis.thesistype	master's thesis	en
ethesis.thesistype.URI	http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram	Bioinformatics	en
dct.identifier.urn	URN:NBN:fi-fe2017112251842
dc.type.dcmitype	Text

Files in this item

Files	Size	Format	View
thesis.pdf	988.6Kb	PDF

This item appears in the following Collection(s)

Faculty of Science [4203]
