Browsing by Subject "natural language generation"

  • Moilanen, Jouni Petteri (2023)
    In recent years, concern has grown within the NLG community about the comparability of systems and the reproducibility of research results. This concern has mainly focused on the evaluation of NLG systems. Problems with automated metrics, crowd-sourced human evaluations, sloppy experimental design and error reporting, etc. have been widely discussed in the literature. Many proposals for best practices, metrics, frameworks and benchmarks for NLG evaluation have recently been put forward to address these problems. In this thesis we examine the current state of NLG evaluation, focusing on data-to-text evaluation, in terms of proposed best practices, benchmarks, etc., and their adoption in practice. Academic publications on NLG evaluation that were published in 2018-2022 and indexed in the Scopus database were examined. After manual inspection, 141 of those were deemed to contain some kind of concrete proposal for improving evaluation practices. The adoption (use in practice) of these proposals was then examined by inspecting the papers citing them. There seems to be a willingness in the academic community to adopt these proposals, especially "best practices" and metrics. As for datasets, benchmarks, evaluation platforms, etc., the results are inconclusive.
  • Rämö, Miia (2020)
    In news agencies, there is a growing interest in automated journalism. The majority of the systems applied are template- or rule-based, as they are expected to produce accurate and fluent output transparently. However, this approach often leads to output that lacks variety. To overcome this issue, I propose two approaches. In the lexicalization approach, new words are added to the sentences, and in the relexicalization approach, some existing words are replaced with synonyms. Both approaches utilize contextual word embeddings for finding suitable words. Furthermore, the above approaches require linguistic resources, which are only available for high-resource languages. Thus, I present variants of the (re)lexicalization approaches that allow their utilization for low-resource languages. These variants utilize cross-lingual word embeddings to access the linguistic resources of a high-resource language. The high-resource variants achieved promising results. However, the sampling of words should be further enhanced to improve reliability. The low-resource variants did show some promising results, but the quality suffered from the complex morphology of the example language. This is a clear next issue to address, and resolving it is expected to significantly improve the results.