Correlation analysis for evaluation of functional annotation methods of proteins

Title: Correlation analysis for evaluation of functional annotation methods of proteins
Author(s): Kumar, Ajay Anand
Contributor: University of Helsinki, Faculty of Science, Department of Mathematics and Statistics
Language: English
Acceptance year: 2012
Abstract:
Owing to next-generation sequencing technologies, the amount of public sequence data is growing exponentially, but the rate of sequence annotation is lagging behind. Robust computational tools are therefore needed to assign correct annotations to protein sequences. The traditional way of annotating genome sequences is to infer molecular function from sequence homology and then transfer the annotation. A TF-IDF-based methodology that mines informative descriptions of high-quality annotated sequences can be used to cluster protein sequences into functionally similar and dissimilar groups. The aim of this thesis is to perform a correlation analysis between the TF-IDF methodology and standard Gene Ontology (GO) semantic similarity measures. We have developed and implemented a high-throughput tool, GOParGenPy, for fast and effective Gene Ontology analysis. It accepts any Gene Ontology linked annotation file and generates the corresponding data matrices, which provide a convenient interface for downstream Gene Ontology analyses on various mathematical platforms. Finally, the correlation between TF-IDF and standard Gene Ontology semantic similarity methods validates the effectiveness of the TF-IDF methodology for clustering functionally similar protein sequences.
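The abstract does not state the exact TF-IDF weighting, GO semantic similarity measure, or correlation statistic used in the thesis, so the following is only a minimal sketch of the general idea under explicit assumptions. It computes pairwise TF-IDF cosine similarity between protein descriptions, computes a GO-based similarity for the same pairs, and correlates the two. Assumptions: the protein descriptions, term IDs (`t_kinase` etc.) and DAG are hypothetical toy data, not real GO terms or thesis data; the GO similarity here is a simplified stand-in (Jaccard index over ancestor-propagated term sets), not necessarily one of the standard measures evaluated in the thesis; Spearman rank correlation is an assumed choice of statistic; and all function names are hypothetical, not part of GOParGenPy.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy GO-like DAG (child -> parents); IDs are illustrative, not real GO terms.
PARENTS = {
    "t_kinase": ["t_transferase"],
    "t_transferase": ["t_catalytic"],
    "t_catalytic": ["t_root"],
    "t_ion_channel": ["t_transporter"],
    "t_transporter": ["t_root"],
}

# Hypothetical proteins: (free-text description, annotated terms).
PROTEINS = {
    "P1": ("serine threonine protein kinase", ["t_kinase"]),
    "P2": ("protein kinase domain", ["t_kinase"]),
    "P3": ("ion channel transporter", ["t_ion_channel"]),
    "P4": ("membrane transporter protein", ["t_transporter"]),
}

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF dict: (count/len) * log(N/df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def ancestors(term, parents):
    """Return the term plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def go_similarity(terms_a, terms_b, parents):
    """Simplified stand-in: Jaccard index of ancestor-propagated term sets."""
    a = set().union(*(ancestors(t, parents) for t in terms_a))
    b = set().union(*(ancestors(t, parents) for t in terms_b))
    return len(a & b) / len(a | b) if a | b else 0.0

def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

ids = list(PROTEINS)
vecs = dict(zip(ids, tfidf_vectors([PROTEINS[p][0].split() for p in ids])))
pairs = list(combinations(ids, 2))
tf_sims = [cosine(vecs[a], vecs[b]) for a, b in pairs]
go_sims = [go_similarity(PROTEINS[a][1], PROTEINS[b][1], PARENTS) for a, b in pairs]
rho = spearman(tf_sims, go_sims)
print(f"Spearman rho between TF-IDF and GO similarity: {rho:.3f}")
```

In this toy setup, pairs whose descriptions share informative words (e.g. "kinase") also share more of their annotated terms' ancestry, so the two similarity lists rank the pairs in a similar order. A rank correlation is used because TF-IDF cosine values and GO-based scores live on different scales.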


Files in this item: merged_document_final.pdf (PDF, 1.985 MB)
