Browsing by Subject "QA system"

  • Rintaniemi, Ari-Heikki (2024)
    In this thesis, a Retrieval-Augmented Generation (RAG) based Question Answering (QA) system is implemented. The RAG framework is composed of three components: a data storage, a retriever, and a generator. To evaluate the performance of the system, a QA dataset is created from Prime Minister Orpo's Government Programme, with QA pairs written by humans and also generated by transformer-based language models. Experiments on this dataset evaluate different implementation options for the retriever (both traditional algorithmic methods and transformer-based language models) and for the generator (transformer-based language models); the language models used in the generator component are the same ones used to generate QA pairs for the dataset. Mean reciprocal rank (MRR) and semantic answer similarity (SAS) measure the performance of the retriever and the generator, respectively (the pipeline and both metrics are sketched below). SAS proves useful for an aggregate view of QA system performance, but it is not an optimal evaluation metric for every scenario identified in the experiments. Inference costs of the system are also analysed, as commercial language models are included in the evaluation.

    Analysis of the created QA dataset shows that the language models generate questions that tend to reveal information from the underlying paragraphs, or that lack sufficient context, making them difficult for the QA system to answer. The human-created questions are more diverse and thus harder to answer than the model-generated ones. The QA pair source also affects the results: the language models used in the generator component receive, on average, high answer scores on QA pairs they themselves generated. Creating a high-quality QA dataset for QA system evaluation therefore requires human effort, although prompt engineering could also help generate more usable QA pairs. Evaluation approaches for the generator component need further research to find alternatives that provide an unbiased view of QA system performance.
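
    The abstract describes a three-component RAG pipeline: data storage, retriever, generator. A minimal sketch of that structure, assuming a toy token-overlap retriever and a placeholder generator in place of the language models actually evaluated in the thesis (all names below are illustrative, not code from the thesis):

    def tokenize(text):
        return set(text.lower().split())

    class OverlapRetriever:
        # Toy retriever: ranks stored paragraphs by token overlap with the
        # question. Stands in for the algorithmic and embedding-based
        # retriever options compared in the thesis.
        def __init__(self, paragraphs):
            self.paragraphs = paragraphs  # the "data storage" component

        def retrieve(self, question, top_k=3):
            q_tokens = tokenize(question)
            ranked = sorted(self.paragraphs,
                            key=lambda p: len(q_tokens & tokenize(p)),
                            reverse=True)
            return ranked[:top_k]

    def generate_answer(question, context):
        # Placeholder generator: a real system would send this prompt to a
        # transformer-based language model.
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    store = ["The Government Programme sets fiscal policy targets.",
             "Employment measures are listed in a separate chapter."]
    retriever = OverlapRetriever(store)
    question = "What fiscal policy targets are set?"
    context = "\n".join(retriever.retrieve(question, top_k=1))
    print(generate_answer(question, context))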
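
    Mean reciprocal rank scores the retriever by the rank of the first relevant paragraph returned for each question, averaging the reciprocals of those ranks (a question with no relevant result contributes 0). A small self-contained sketch:

    def mean_reciprocal_rank(rankings, relevant):
        # rankings: one ranked list of paragraph IDs per question.
        # relevant: one set of relevant paragraph IDs per question.
        total = 0.0
        for ranking, rel in zip(rankings, relevant):
            for rank, doc_id in enumerate(ranking, start=1):
                if doc_id in rel:
                    total += 1.0 / rank  # reciprocal rank of the first hit
                    break                # no hit: contributes 0
        return total / len(rankings)

    # First question answered at rank 1, second at rank 3:
    # MRR = (1/1 + 1/3) / 2 = 0.666...
    print(mean_reciprocal_rank([["p1", "p2"], ["p4", "p5", "p3"]],
                               [{"p1"}, {"p3"}]))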
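
    Semantic answer similarity compares a generated answer to the gold answer in embedding space rather than by string overlap, which is what makes it useful as an aggregate score. A sketch of one common bi-encoder formulation (cosine similarity of sentence embeddings); the abstract does not name the SAS model used, so the model below is an assumption:

    from sentence_transformers import SentenceTransformer, util

    # Model name is illustrative, not taken from the thesis.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def sas(predicted_answer, gold_answer):
        # Embed both answers and return their cosine similarity in [-1, 1].
        embeddings = model.encode([predicted_answer, gold_answer])
        return float(util.cos_sim(embeddings[0], embeddings[1]))

    print(sas("The programme aims to balance public finances.",
              "Balancing public finances is a stated goal."))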