Browsing by Author "Sirokov, Roman"

  • Sirokov, Roman (2014)
    Processing data produced by next-generation sequencing technologies is a computationally intensive task. We aim to speed up this task by means of parallel computing. Our parallel computing solution employs Slurm to manage the workload across nodes. It can be used on top of Anduril, a workflow management software for scientific data analysis, as well as on its own. To test the performance of our solution, we use a workflow for post-processing and analyzing RNA-Seq data that originates from lymphoma patients. The data consist of 447 mutually independent samples that can be processed in parallel. To evaluate the performance, we employ three metrics: level of parallelization, execution time, and CPU load. The workflow achieved an excellent level of parallelization for the provided 447 samples, with an upper bound of 894 cores. Execution times were compared in two ways: with sets of homogeneous samples of various sizes, and with sets of heterogeneous samples. Homogeneous sets took a similar amount of time on average regardless of the size of the set. For heterogeneous sets, the execution time of the largest sample served as the baseline, and the total execution time was 32% longer than that baseline. Finally, the CPU load of each component was measured. With homogeneous sets a high CPU load was observed, while with heterogeneous sets CPU idling was detected.
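
    The heterogeneous-set comparison described above, total execution time measured against the execution time of the largest sample, can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not model Slurm or Anduril: the function names `makespan` and `overhead_vs_longest`, the greedy longest-first assignment, and the sample durations are all assumptions made for illustration.

```python
import heapq


def makespan(durations, workers):
    """Total wall-clock time when independent jobs are assigned greedily,
    longest first, to whichever worker is currently least loaded.
    (Hypothetical helper; a real scheduler such as Slurm behaves differently.)"""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # One running total per worker; never more workers than jobs.
    loads = [0.0] * min(workers, max(len(durations), 1))
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + d)
    return max(loads)


def overhead_vs_longest(durations, workers):
    """Relative overhead of the total execution time compared with the
    longest single job, which serves as the baseline."""
    baseline = max(durations)
    return makespan(durations, workers) / baseline - 1.0


# Invented durations (arbitrary time units), not measurements from the thesis.
sample_durations = [4.0, 3.0, 2.0, 1.0]
print(overhead_vs_longest(sample_durations, workers=2))  # 0.25
print(overhead_vs_longest(sample_durations, workers=4))  # 0.0
```

    With one worker per sample the overhead in this idealized model is zero, because the longest sample alone determines the total time. Any overhead that remains in a real run would then have to come from effects the sketch leaves out, such as scheduling delays or shared resources.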