IPRANK : iterative Alignment with PRANK

Title: IPRANK : iterative Alignment with PRANK
Author(s): Li, Chunxiang
Contributor: University of Helsinki, Faculty of Science, Department of Mathematics and Statistics
Language: English
Acceptance year: 2014
Abstract:
Multiple sequence alignment (MSA) is one of the essential methods in molecular biology. The accuracy of an MSA affects downstream analyses such as phylogenetic inference, protein structure prediction, and function prediction. Because MSA is so important, current methods search for the optimal alignment using different objective functions and heuristics, and as a result they perform differently on different tasks. For example, alignments from methods designed around structural homology are likely to mislead comparative and phylogenetic analyses, since these downstream analyses require alignments that correctly represent evolutionary homology. The phylogeny-aware alignment method PRANK by Löytynoja and Goldman has been demonstrated to perform well in aligning protein-coding genes for evolutionary and comparative analyses. One reason is that it takes phylogenetic information into account during alignment in order to distinguish insertions from deletions. It has become the method of choice in comparative sequence analyses; for instance, in the recently published tiger genome study, PRANK was used to align the orthologous genes. However, some issues in PRANK remain to be resolved. First, it can be sensitive to errors in the guide phylogenetic tree, which can bias the resulting alignment. Second, its single-threaded design does not allow PRANK to take advantage of modern CPU architectures or computer clusters, which is a disadvantage when working with large volumes of data. This thesis introduces iPRANK, an iterative alignment tool for PRANK. The proposed tool can utilize multiple cores and speeds up alignment via a divide-and-merge approach, which splits the data set into subsets according to a guide tree and then runs PRANK on the subsets in parallel.
Iterating between alignment and tree inference allows iPRANK to search for a good tree and to remove errors in the initial guide tree, which improves the resulting alignment. In this way, iPRANK can estimate the tree and the multiple sequence alignment simultaneously for a set of unaligned sequences. In addition to improving the alignment of a single data set, the tool can also infer a phylogeny from data sets consisting of multiple genes via a gene concatenation strategy. Extensive studies on a set of simulated data are used to demonstrate that the tool can run PRANK on a large computer cluster and produce improved alignments and phylogenetic trees compared to other approaches.
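
The "divide" half of the divide-and-merge approach described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the nested-tuple tree representation, the function names, and the `max_leaves` threshold are all assumptions made for illustration, and `align_subset` is a placeholder that only pads sequences with gaps where a real pipeline would call PRANK. The merge step and the tree re-estimation loop are not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def leaves(tree):
    """Return the leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def split_by_tree(tree, max_leaves):
    """Cut the guide tree into clades of at most max_leaves leaves each.

    Hypothetical splitting rule: descend from the root until a clade is
    small enough, then keep that clade as one subset.
    """
    names = leaves(tree)
    if isinstance(tree, str) or len(names) <= max_leaves:
        return [names]
    return [s for child in tree for s in split_by_tree(child, max_leaves)]


def align_subset(seqs):
    """Placeholder aligner: pads to equal length, NOT a real alignment.

    A real pipeline would run an external aligner on this subset instead.
    """
    width = max(len(s) for s in seqs.values())
    return {name: s.ljust(width, "-") for name, s in seqs.items()}


def divide_and_align(tree, seqs, max_leaves=2, workers=4):
    """Split the sequences by clade and align the subsets concurrently."""
    subsets = split_by_tree(tree, max_leaves)
    jobs = [{name: seqs[name] for name in subset} for subset in subsets]
    # Threads suffice here because a real worker would mostly wait on an
    # external process; the results come back in subset order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(align_subset, jobs))


if __name__ == "__main__":
    guide_tree = ((("A", "B"), "C"), ("D", "E"))
    sequences = {"A": "ACGT", "B": "ACG", "C": "AC", "D": "ACGTT", "E": "A"}
    for block in divide_and_align(guide_tree, sequences):
        print(block)
```

Splitting along clades keeps closely related sequences in the same subset, which is the property a guide-tree-based division relies on. How iPRANK chooses the subsets and merges the sub-alignments is described in the thesis itself.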


Files in this item

Files Size Format View
thesis.pdf 2.383Mb PDF
