Browsing by Author "Rouvinen, Alina"

  • Rouvinen, Alina (2023)
    Smiling is fundamentally human, but it is a more complex phenomenon than it might appear at first glance. Studies in the language sciences have explored smiling in the context of speech and found that speaking while smiling has perceivable effects on the voice, a phenomenon commonly known as “smiling voice”. Although the phenomenon is widely recognised, there is no clear consensus on the precise acoustic characteristics that cue listeners to the presence of a smile. This study investigates whether listeners can identify smiling voice from audio stimuli alone, which prosodic cues or characteristics they might use to do so, and whether those cues can be extracted and used to replicate smiling voice in speech synthesis. A further aim is to determine whether the level of perceived smiliness can be controlled in synthetic speech. These questions are addressed with the objectives of adding to the understanding of smiling voice in the field of phonetics and exploring the potential of speech synthesis technology for producing expressive speech. A corpus of Finnish speech was used in a preliminary listening experiment in which participants compared neutral and positive utterances in a questionnaire and indicated whether the speaker was smiling in the latter. Utterances identified as smiley were analysed acoustically to detect prosodic differences between neutral and smiley speech. Based on the results, the formant frequencies F2 and F3 and the centre of gravity were selected as prosodic cues for controlling smiling in speech synthesis. The speech synthesiser was a Tacotron 2 system with a reference encoder, already trained on the same speech corpus. The synthesis was evaluated with a second questionnaire in which participants listened to the synthesised utterances and indicated how strongly the speaker was smiling.
The results of the first questionnaire showed that listeners were able to distinguish neutral from smiley speech, and subsequent acoustic analyses indicated significant effects of smiling on fundamental frequency, formant frequencies, and centre of gravity. The speech synthesis evaluation further indicated that F2, F3, and centre of gravity can be used to control the level of perceived smiling, at least to the extent of a binary distinction. However, the evaluation showed that finer-grained control over the level of smiling voice was not achieved.