
Comparison of spliced alignment software in analyzing RNA-Seq data


Title: Comparison of spliced alignment software in analyzing RNA-Seq data
Author(s): Kuosmanen, Anna
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Language: English
Acceptance year: 2013
Abstract:
RNA-seq, a recently developed protocol for high-throughput sequencing of the RNA in a cell, generates from hundreds of thousands to a few billion short sequence fragments from each RNA sample. Aligning these fragments, or 'reads', to the reference genome quickly and accurately is a challenging task that many researchers have tackled over the past five years. In this thesis I review the process of RNA-seq data creation and analysis, and introduce and compare some of the popular alignment software. As part of the thesis, I implemented an alignment tool based on the novel idea of a limited-range BWT-transformed index. This tool, called SpliceAligner, is also introduced in detail. In addition to my own software, I chose Tophat, SpliceMap, MapSplice, SOAPsplice and SHRiMP2 for comparison. I tested the chosen software on simulated data sets with read lengths of 50, 100, 150 and 250 base pairs, as well as on data from a real RNA-seq experiment. I ranked the software by running time, number of reads mapped and accuracy of the alignments. I also predicted transcripts from the alignments of the simulated data and measured the correctness of the predictions. With read lengths of 50, 100 and 150 base pairs, speed, alignment accuracy and ease of use make Tophat a solid top choice. MapSplice is comparable in speed and alignment accuracy, and SOAPsplice is only slightly behind, but their user interfaces are much more complicated. However, Tophat slowed down significantly when the read length increased to 250 base pairs, and SOAPsplice failed to run at all with 250-base-pair reads. This leaves MapSplice as the top choice for long reads in most cases. My software SpliceAligner was competitive with the top choices in alignment accuracy, but work remains to be done on its running speed and on several small optimizations.


Files in this item

File: masters_thesis_Kuosmanen_Anna.pdf
Size: 2.632Mb
Format: PDF
