Minimum Description Length Models for Unsupervised Learning of Morphology

Title: Minimum Description Length Models for Unsupervised Learning of Morphology
Author(s): Nouri, Javad
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Language: English
Acceptance year: 2016
Abstract:
This thesis introduces an approach to unsupervised learning of the morphological structure of human languages. We focus on morphologically rich languages, and the goal is to construct a knowledge-free, language-independent model. The model receives a long list of words in a language and is expected to learn to segment them so that the resulting segments correspond to morphemes of that language. The proposed learning algorithm is based on the Minimum Description Length (MDL) principle, and we introduce several improvements to it that are inspired by well-motivated linguistic principles of morphology. In addition to the learning algorithm, we introduce a new evaluation method and corresponding resources. Evaluating morphological segmentations is challenging because of the inherent ambiguity of natural languages and because underlying morphological processes such as fusion make it hard to identify a unique 'correct' segmentation for each word. Our evaluation method addresses this problem by focusing on the consistency of segmentations. We test our approach on data from Finnish, Turkish, and Russian. The evaluation shows a gain in performance over the state of the art.
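
As a rough illustration of the MDL idea mentioned in the abstract, the sketch below scores a segmentation of a word list with a generic two-part code: the bits needed to spell out the lexicon of distinct morphs, plus the bits needed to encode the corpus as a sequence of those morphs. This is not the model from the thesis. The function name `description_length`, the uniform per-letter code, the `alphabet_size` of 27, and the toy Finnish word list are all assumptions made for the example.

```python
import math
from collections import Counter


def description_length(segmented_words, alphabet_size=27):
    """Return a two-part code length in bits for a segmented word list.

    Assumed coding scheme (illustrative only, not the thesis's model):
    - lexicon cost: each distinct morph is spelled letter by letter with a
      uniform code over `alphabet_size` symbols, plus one end-of-morph symbol;
    - corpus cost: each morph token costs -log2 of its relative frequency.
    """
    tokens = [morph for word in segmented_words for morph in word]
    counts = Counter(tokens)
    total = len(tokens)

    bits_per_symbol = math.log2(alphabet_size)
    lexicon_bits = sum((len(morph) + 1) * bits_per_symbol for morph in counts)
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


# Toy data: every word kept whole versus split into a stem and a suffix.
unsegmented = [("talo",), ("talossa",), ("talot",),
               ("auto",), ("autossa",), ("autot",)]
segmented = [("talo",), ("talo", "ssa"), ("talo", "t"),
             ("auto",), ("auto", "ssa"), ("auto", "t")]

print(f"unsegmented: {description_length(unsegmented):.1f} bits")
print(f"segmented:   {description_length(segmented):.1f} bits")
```

Under these assumptions the segmented analysis gets the shorter description, because the shared stems and suffixes are stored once in the lexicon and then reused. An MDL-based learner searches for segmentations that minimize a cost of this general kind.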


Files in this item

File: javad-thesis-05-final.pdf
Size: 1.033Mb
Format: PDF
