Skip to main content
Login | Suomeksi | På svenska | In English

Minimum Description Length Models for Unsupervised Learning of Morphology

Show simple item record

dc.date.accessioned 2016-08-17T10:01:20Z und
dc.date.accessioned 2017-10-24T12:24:14Z
dc.date.available 2016-08-17T10:01:20Z und
dc.date.available 2017-10-24T12:24:14Z
dc.date.issued 2016-08-17T10:01:20Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/5723 und
dc.identifier.uri http://hdl.handle.net/10138.1/5723
dc.title Minimum Description Length Models for Unsupervised Learning of Morphology en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Nouri, Javad
dct.issued 2016
dct.language.ISO639-2 eng
dct.abstract This thesis work introduces an approach to unsupervised learning of morphological structure of human languages. We focus on morphologically rich languages and the goal is to construct a knowledge-free and language-independent model. This model works by receiving a long list of words in a language and is expected to learn how to segment the input words in a way that the resulting segments correspond to morphemes in the target language. Several improvements inspired by well-motivated linguistic principles of morphology of languages are introduced to the proposed MDL-based learning algorithm. In addition to the learning algorithm, a new evaluation method and corresponding resources are introduced. Evaluation of morphological segmentations is a challenging task due to the inherent ambiguity of natural languages and underlying morphological processes such as fusion which encumber identification of unique 'correct' segmentations for words. Our evaluation method addresses the problem of segmentation evaluation with a focus on consistency of segmentations. Our approach is tested on data from Finnish, Turkish, and Russian. Evaluation shows a gain in performance over the state of the art. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Algorithms and Machine Learning en
dct.identifier.urn URN:NBN:fi-fe2017112252174
dc.type.dcmitype Text

Files in this item

Files Size Format View
javad-thesis-05-final.pdf 1.033Mb PDF

This item appears in the following Collection(s)

Show simple item record