Browsing by Subject "multi-armed bandit task"

  • Koli, Jaakko (2022)
    Humans constantly reason about the unknown by drawing on similar existing knowledge, and they explore the unknown to gather information for the future. In this thesis I investigate this kind of human exploration and extrapolation in simple conceptual and spatial tasks using Bayesian optimisation. My work extends the paper by Wu et al., Similarities and differences in spatial and non-spatial cognitive maps [Wu et al., 2020], which models human exploration and extrapolation with Bayesian optimisation: a Gaussian process represents the participant's belief about the environment given the knowledge acquired so far, and an acquisition function together with an activation function represents exploration. Their main model consists of a Gaussian process with a Radial Basis Function (RBF) kernel, an Upper Confidence Bound (UCB) acquisition function and a softmax activation function that transforms the output of the acquisition function. The model has three free parameters: the length scale λ of the RBF kernel, which describes the extent of generalisation; the exploration bonus β of UCB sampling; and the temperature τ of the softmax activation function [Wu et al., 2020]. I extend their work by allowing the length scale λ to change as participants explore the presented space and gather more information. This models how participants learn the extent of generalisation as they gain knowledge of the underlying environment. The model with a changing length scale improved the goodness of fit compared with the model used by Wu et al. [Wu et al., 2020], but it failed to capture all of the behavioural differences between spatial and conceptual tasks. It is possible that the estimated values of λ absorbed information that would otherwise have allowed the parameters τ and β to capture the differences between the spatial and conceptual tasks. This thesis provides a basis for further research on human exploration and extrapolation using Bayesian optimisation with a changing degree of generalisation. The aforementioned shortcomings could be mitigated, for example, by designing the experiment so that it provides more information about the participant's belief about the environment during each trial.
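The three-parameter model described in the abstract (GP with RBF kernel, UCB acquisition, softmax choice rule) can be illustrated with a minimal sketch. This is not the code of the thesis or of Wu et al.; the function names, the kernel parameterisation exp(-d²/(2λ²)), the unit prior variance, the small noise term and the toy 3×3 grid are all assumptions made for illustration. The abstract does not say how λ is allowed to change over trials, so the sketch keeps λ fixed and only exposes it as an argument.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """RBF kernel between the rows of a and b (assumed form: exp(-d^2 / (2 * lambda^2)))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_posterior(x_obs, y_obs, x_all, length_scale, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at every option in x_all."""
    k_oo = rbf_kernel(x_obs, x_obs, length_scale) + noise_var * np.eye(len(x_obs))
    k_ao = rbf_kernel(x_all, x_obs, length_scale)
    mean = k_ao @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is k(x, x) = 1 for this kernel; subtract the explained part.
    explained = np.sum(k_ao * np.linalg.solve(k_oo, k_ao.T).T, axis=1)
    sd = np.sqrt(np.clip(1.0 - explained, 0.0, None))
    return mean, sd

def choice_probabilities(mean, sd, beta, tau):
    """UCB value (mean + beta * sd) passed through a softmax with temperature tau."""
    ucb = mean + beta * sd
    z = (ucb - ucb.max()) / tau  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy example (hypothetical data): a 3x3 grid with one observed reward.
x_all = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
x_obs = x_all[[0]]          # the participant has sampled the corner (0, 0)
y_obs = np.array([1.0])     # and observed a reward of 1.0 there
mean, sd = gp_posterior(x_obs, y_obs, x_all, length_scale=1.0)
probs = choice_probabilities(mean, sd, beta=0.5, tau=0.1)
print(np.round(probs, 3))
```

In this sketch a larger λ spreads the observed reward to more distant options, a larger β shifts choices towards options with high uncertainty, and a larger τ makes the choice probabilities more uniform.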