Browsing by Subject "deep learning"

  • Törö, Tuukka (2022)
    In recent years, advances in deep learning have made it possible to develop neural speech synthesizers that not only generate near-natural speech but also allow its acoustic features to be controlled. This makes it possible to synthesize expressive speech in different speaking styles that fit a given context. One way to achieve this control is by adding a reference encoder to the synthesizer that works as a bottleneck modeling a prosody-related latent space. The aim of this study was to analyze how the latent space of a reference encoder models diverse and realistic speaking styles, and what correlation there is between the phonetic features of encoded utterances and their latent space representations. Another aim was to analyze how the synthesizer output could be controlled in terms of speaking styles. The model used in the study was a Tacotron 2 speech synthesizer with a reference encoder, trained on read speech uttered in various styles by one female speaker. The latent space was analyzed by applying principal component analysis to the reference encoder outputs for all of the utterances in order to extract salient features that differentiate the styles. Based on the assumption that speaking styles have acoustic correlates, a possible connection between the principal components and measured acoustic features of the encoded utterances was investigated. For the synthesizer output, two evaluations were conducted: an objective evaluation assessing acoustic features and a subjective evaluation assessing the appropriateness of the synthesized speech with regard to the uttered sentence. The results showed that the reference encoder modeled stylistic differences well, but the styles were complex, with major internal variation. The principal component analysis disentangled the acoustic features to some extent, and a statistical analysis showed a correlation between the latent space and prosodic features.
    The objective evaluation suggested that the synthesizer did not produce all of the acoustic features of the styles, but the subjective evaluation showed that it produced enough of them to affect judgments of appropriateness: speech synthesized in an informal style was deemed more appropriate than speech in a formal style for informal-style sentences, and vice versa.
  • Berg, Anton (2022)
    This master's thesis seeks to conceptually replicate psychologist Michal Kosinski's study, published in 2021 in Scientific Reports, in which he trained a cross-validated logistic regression model to predict political orientation from facial images. Kosinski reported that his model achieved an accuracy of 72%, which is significantly higher than the 55% accuracy measured in humans for the same task. Kosinski's research attracted a great deal of attention as well as accusations of pseudoscience. Whereas Kosinski trained his model on facial features containing information about, for example, head position and emotions, in this thesis I use a deep convolutional neural network for the same task. I also train my model on Finnish data, consisting of facial photos of Finnish left- and right-wing candidates gathered from the 2021 municipal elections. I investigate whether a convolutional neural network can learn to predict from candidates' faces whether they belong to the right-wing Coalition Party or the left-wing Left Alliance with better than 55% accuracy, and what role color information plays in the classification accuracy of the model. On this basis, I also consider the wider ethical issues surrounding these types of models and the technological advances they bring. There has been a recent ethical debate on the widespread use of facial recognition technology in relation to issues such as human autonomy, privacy, and civil liberties. In the context of previous scientific findings, there has also been debate about the potential ability of facial recognition technologies to reveal information about our most personal traits, such as sexual orientation, personality, and emotional states. Thus, facial recognition technologies are also closely related to privacy issues.
    In his original article, Kosinski did not underestimate the many problematic ethical issues that the use of facial recognition technology can raise. He did, however, underline the role of science in trying to determine the function, capability, and accuracy of these technologies. Only through research can we gain insights into these technologies, which can then potentially be used to inform societal decision-making. This is also the approach taken in this master's thesis.