
Browsing by Author "Tapper, Suvi"


  • Tapper, Suvi (2023)
    The purpose of this thesis is to examine the prosodic features of English dialects using WaveNet. The specific goal is to investigate whether prosodic differences between the dialects are present in the data and the results, and whether the geographical distance between the cities included in the data has any influence on them. Another aim is to see how the prosodic features of the sentence types in the data, and their possible differences, are manifested in the data and the results. Prosody is concerned with those characteristics of speech that extend over more than individual sounds. Prosodic features can further be divided into paralinguistic features, such as the rate of speech and pausing, and linguistic features, such as intonation. Parameters useful for analysing prosody are fundamental frequency (f0), intensity and voice quality; this study is concerned with the first two. Fundamental frequency is the rate at which the vocal folds vibrate during speech. Intensity, in turn, is connected to the changes in air pressure during speech. The data used for this study is the IViE corpus (Intonational Variation in English), comprising recordings made in nine cities in the British Isles (Belfast, Bradford, Cambridge, Cardiff, Dublin, Leeds, Liverpool, London and Newcastle), with approximately 12 speakers per city. In three of the cities, Bradford, Cardiff and London, the dialect is that of a minority. The part of the corpus chosen for this study is a set of 22 sentences consisting of five sentence types. The analysis was performed using WaveNet, a convolutional neural network. It uses causal convolutions, which ensure that the output at each time step depends only on the current and earlier time steps. In addition to being conditioned on the output of the network itself, it can also be conditioned using embeddings. The WaveNet implementation used here has two embedding layers: target and normalisation embeddings.
Before the analysis, the data was pre-processed and the relevant information concerning the fundamental frequency and intensity was extracted from the sound files. A corresponding *.time file was also created for each of the sound files, with the aim of minimising the influence of possible differences in length between sentences and thus improving the network's ability to recognise the intonation contours correctly. The results are presented in the form of dendrograms depicting the relationships between the dialects and sentence types, both separately and in combination. It was shown that the differences in prosody were in fact manifested in the data for both dialects and sentence types, although not exactly as expected. Geographical proximity did not seem to influence the dialectal similarities as much as was assumed. In addition to other influences, this might be because some of the dialects are minority dialects in their cities and are therefore not necessarily as easily comparable to the dialects of the neighbouring area as the majority dialects might have been.
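
The abstract mentions that WaveNet relies on causal convolutions. As a rough illustration of the idea only, and not the implementation used in the thesis, the sketch below applies a dilated causal convolution to a short sequence in plain Python. The function name `causal_conv1d`, the kernel weights and the toy f0 values are all assumptions made up for this example; none of them come from the thesis or from the IViE corpus.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """Dilated causal 1-D convolution (hypothetical helper, for illustration).

    output[t] is a weighted sum of signal[t], signal[t - dilation],
    signal[t - 2 * dilation], ...; samples after t are never read.
    Positions before the start of the signal are treated as zero.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit zero-padding on the left
                total += weight * signal[idx]
        output.append(total)
    return output


# Toy f0 contour in Hz (invented values, not corpus data).
f0 = [100.0, 110.0, 120.0, 130.0, 140.0]

# kernel[0] weights the current sample, kernel[1] the previous one.
smoothed = causal_conv1d(f0, kernel=[0.5, 0.5], dilation=1)
```

Because each output reads only the current and earlier samples, changing a later value in the input leaves all earlier outputs unchanged. That property is what allows a model of this kind to predict the next value of a contour from its past alone.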