Browsing by Subject "representativeness"

  • Matulis, Haralds (2024)
    This master’s thesis, guided by the overarching research question of how usable computational methods are for studying digitized collections of self-writing, examines the Latvian Diary Corpus (LDC), which was compiled in 2021 and contains 36 handwritten, digitized diaries spanning 1917 to 2012, with a total corpus size of 2,771,300 tokens. The theoretical and methodological framework of the thesis is situated in digital humanities, drawing on the corpus linguistics approach to corpus compilation and informed by the digital curation and archival practices of the cultural heritage domain. The diary, as a genre, is part of the self-writing field, and various humanities disciplines, such as folkloristics, literary studies, and cultural anthropology, examine the diary from different viewpoints. The main body of the thesis is structured into an Introduction, four chapters, and Conclusions. The chapters build on each other and discuss, from different perspectives: (1) the representativeness and heterogeneity of digital collections in the humanities; (2) conceptualizations of the diary in the self-writing research field and how these theoretical concepts translate into practical decisions regarding operationalization of the diary domain, crowdsourcing diaries from the population, and methodological border cases encountered by curators in composing the LDC; (3) statistical exploration of the LDC, probing the correlation between diary length and four variables: the time intervals between diary entries and three linguistic features (personal pronouns, past and present tense, and activity and non-activity verbs). The results of the computational analysis reveal significant variance among the diaries, suggesting not only diverse writing styles of individual diarists but also structural heterogeneity within the LDC. There is reason to believe that the texts in the LDC, merged under the umbrella term “diary”, contain several specific sub-genres of self-writing, each with its own distinct signature.
Having started from an inquiry into the concept of representativeness, the thesis finds that studying heterogeneous digital collections can also be fruitful: their diversity is not a drawback but a source of richness, which computational methods can uncover and analyze and critical reading methods can then assess. To apply computational methods to heterogeneous humanities collections with a sufficient degree of generalizability of results, the thesis proposes careful domain operationalization, source criticism, and cultural analysis steps, followed by subsetting, guided by the particular research question, a homogeneous sub-corpus from the larger collection of heterogeneous items for computational analysis.