
Novel approaches to computationally predict contacts between amino acids in protein native states


Title: Novel approaches to computationally predict contacts between amino acids in protein native states
Author(s): Hartonen, Tuomo
Contributor: University of Helsinki, Faculty of Science, Department of Physics
Discipline: Theoretical Physics
Language: English
Acceptance year: 2013
Abstract:
The ability to deduce the three-dimensional structure of a protein from its one-dimensional amino acid chain is a long-standing challenge in structural biology. Accurate structure prediction has enormous application potential, for example in drug development and the design of novel enzymes. In the past this problem has been studied experimentally (X-ray crystallography, nuclear magnetic resonance spectroscopy) and computationally by simulating the molecular dynamics of protein folding. However, the latter requires enormous computing resources and the former is expensive and time-consuming. Direct coupling analysis (DCA) is an inference method that relies on direct correlations measured from multiple sequence alignments (MSAs) of protein families to predict contacts between amino acids in the three-dimensional structure of a protein. It solves the 21-state inverse Potts problem of statistical physics: given the correlations, what are the interactions between the amino acids of a protein? The current state of the art in the DCA approach is the plmDCA algorithm, which relies on pseudolikelihood maximization. In this study the performance of the parallelised asymmetric plmDCA algorithm is tested on a diverse set of more than 100 protein families. In general, for MSAs with more than approximately 2000 sequences, plmDCA predicts more than half of the 100 top-scoring contacts correctly, with the prediction accuracy increasing almost linearly as a function of the number of sequences. Parallelisation is also observed to make the algorithm tens of times faster (depending on the number of CPU cores used) than the previously described serial plmDCA. Extensions to the Potts model that take into account the differences in the distributions of gaps and amino acids in MSAs are investigated. An extension incorporating the position-dependent frequencies of gaps of length one into the Potts model is found to increase the prediction accuracy for short sequences. Further and more extensive studies are, however, needed to discover the full potential of this approach.
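
The accuracy figure quoted in the abstract (the fraction of the 100 top-scoring predicted contacts that are correct) can be illustrated with a minimal Python sketch. This is not the thesis's evaluation code: the function name `top_n_precision`, the input layout (a symmetric score matrix and a boolean contact map), and the `min_separation` cutoff for skipping pairs that are close along the chain are all assumptions made for illustration.

```python
import numpy as np

def top_n_precision(scores, contacts, n=100, min_separation=5):
    """Fraction of the n highest-scoring residue pairs that are true contacts.

    scores         : (L, L) symmetric array of predicted pair scores
    contacts       : (L, L) boolean array, True where two residues are in contact
    min_separation : hypothetical cutoff (an assumption, not from the source);
                     pairs (i, j) with j - i < min_separation are ignored
    """
    scores = np.asarray(scores, dtype=float)
    contacts = np.asarray(contacts, dtype=bool)
    length = scores.shape[0]
    # Upper-triangle index pairs with j - i >= min_separation.
    i, j = np.triu_indices(length, k=min_separation)
    # Indices of the n largest scores among the candidate pairs.
    order = np.argsort(scores[i, j])[::-1][:n]
    return float(contacts[i[order], j[order]].mean())

# Toy example with six residues (made-up numbers, for illustration only).
toy_scores = np.zeros((6, 6))
toy_contacts = np.zeros((6, 6), dtype=bool)
for (a, b), s, c in [((0, 3), 0.9, True), ((1, 4), 0.8, False),
                     ((2, 5), 0.7, True), ((0, 5), 0.6, True)]:
    toy_scores[a, b] = toy_scores[b, a] = s
    toy_contacts[a, b] = toy_contacts[b, a] = c

print(top_n_precision(toy_scores, toy_contacts, n=2, min_separation=2))  # 0.5
```

In the toy example the two highest-scoring pairs are (0, 3), a true contact, and (1, 4), not a contact, so the precision for n = 2 is 0.5.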


Files in this item

Files Size Format
hartonen_tuomo_pro_gradu_150913.pdf 1.679Mb PDF
