Correlation analysis for evaluation of functional annotation methods of proteins

dc.date.accessioned 2012-11-27T06:18:38Z und
dc.date.accessioned 2017-10-24T12:22:10Z
dc.date.available 2012-11-27T06:18:38Z und
dc.date.available 2017-10-24T12:22:10Z
dc.date.issued 2012-11-27T06:18:38Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/2165 und
dc.identifier.uri http://hdl.handle.net/10138.1/2165
dc.title Correlation analysis for evaluation of functional annotation methods of proteins en
ethesis.department.URI http://data.hulib.helsinki.fi/id/61364eb4-647a-40e2-8539-11c5c0af8dc2
ethesis.department Institutionen för matematik och statistik sv
ethesis.department Department of Mathematics and Statistics en
ethesis.department Matematiikan ja tilastotieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Kumar, Ajay Anand
dct.issued 2012
dct.language.ISO639-2 eng
dct.abstract Due to next-generation sequencing technologies, the amount of public sequence data is growing exponentially; however, the rate of sequence annotation is lagging behind. There is a need for robust computational tools that correctly assign annotations to protein sequences. Sequence-homology-based inference of molecular function, followed by transfer of the annotation, is the traditional way of annotating genome sequences. A TF-IDF-based methodology that mines informative descriptions from high-quality annotated sequences can be used to cluster functionally similar and dissimilar protein sequences. The aim of this thesis is to perform a correlation analysis of the TF-IDF methodology against standard Gene Ontology (GO) semantic similarity measures. We have developed and implemented a high-throughput tool named GOParGenPy for effective and faster analysis related to Gene Ontology. It accepts any Gene Ontology-linked annotation file and generates the corresponding data matrices, providing a useful interface for downstream Gene Ontology analyses across various mathematical platforms. Finally, the correlation evaluation between TF-IDF and standard Gene Ontology semantic similarity methods validates the effectiveness of the TF-IDF methodology for clustering functionally similar protein sequences. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Bioinformatics en
dct.identifier.urn URN:NBN:fi-fe2017112251746
dc.type.dcmitype Text

Files in this item

Files Size Format View
merged_document_final.pdf 1.985Mb PDF
