
Comparison of spliced alignment software in analyzing RNA-Seq data


dc.date.accessioned 2013-11-12T11:11:03Z und
dc.date.accessioned 2017-10-24T12:24:39Z
dc.date.available 2013-11-12T11:11:03Z und
dc.date.available 2017-10-24T12:24:39Z
dc.date.issued 2013-11-12T11:11:03Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/3234 und
dc.identifier.uri http://hdl.handle.net/10138.1/3234
dc.title Comparison of spliced alignment software in analyzing RNA-Seq data en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Kuosmanen, Anna
dct.issued 2013
dct.language.ISO639-2 eng
dct.abstract RNA-seq, a recently developed protocol for high-throughput sequencing of the RNA in a cell, generates from hundreds of thousands to a few billion short sequence fragments from each RNA sample. Aligning these fragments, or 'reads', to the reference genome quickly and accurately is a challenging task that many researchers have tackled over the past five years. In this thesis I review how RNA-seq data is created and analyzed, and introduce and compare several popular alignment programs. As part of the thesis, I implemented an alignment program based on the novel idea of a limited-range BWT-transformed index. This program, called SpliceAligner, is also introduced in detail. In addition to my own software, I chose Tophat, SpliceMap, MapSplice, SOAPsplice and SHRiMP2 for comparison. I tested the chosen software on simulated data sets with read lengths of 50, 100, 150 and 250 base pairs, as well as on data from a real RNA-seq experiment. I ranked the software by running time, number of reads mapped and alignment accuracy. I also predicted transcripts from the alignments of the simulated data and measured the correctness of the predictions. With read lengths of 50, 100 and 150 base pairs, speed, alignment accuracy and ease of use make Tophat a solid top choice. MapSplice is comparable in speed and alignment accuracy, and SOAPsplice is only slightly behind, but their user interfaces are much more complicated. However, Tophat slowed down significantly when the read length increased to 250 base pairs, and SOAPsplice failed to run at all on 250-base-pair reads. This leaves MapSplice as the top choice for long reads in most cases. My software, SpliceAligner, was competitive with the top choices in alignment accuracy, but work remains on its running speed and on a number of small optimizations. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Bioinformatics en
dct.identifier.urn URN:NBN:fi-fe2017112252516
dc.type.dcmitype Text

Files in this item

Files Size Format View
masters_thesis_Kuosmanen_Anna.pdf 2.632Mb PDF
