Browsing by Author "Lybeck, Lasse"

  • Lybeck, Lasse (2015)
    Speech is the most common form of human communication. An understanding of the speech production mechanism and the perception of speech is therefore an important topic when studying human communication. This understanding is also of great importance both in medical treatment regarding a patient's voice and in human-computer interaction via speech. In this thesis we will present a model for digital speech called the source-filter model. In this model speech is represented with two independent components, the glottal excitation signal and the vocal tract filter. The glottal excitation signal models the airflow created at the vocal folds, which works as the source for the created speech sound. The vocal tract filter describes how the airflow is filtered as it travels through the vocal tract, creating the sound radiated to the surrounding space from the lips, which we recognize as speech. We will also present two different parametrized models for the glottal excitation signal, the Rosenberg-Klatt model (RK-model) and the Liljencrants-Fant model (LF-model). The RK-model is quite simple, being parametrized with only one parameter in addition to the fundamental frequency of the signal, while the LF-model is more complex, taking four parameters to define the shape of the signal. A transfer function for the vocal tract filter is also derived from a simplified model of the vocal tract. Additionally, relevant parts of the theory of signal processing are presented before the presentation of the source-filter model. A relatively new method for glottal inverse filtering (GIF), called the Markov chain Monte Carlo method for glottal inverse filtering (MCMC-GIF), is also presented in this thesis. Glottal inverse filtering is a technique for estimating the glottal excitation signal from a recorded speech sample. It is widely used, for example, in phoniatrics when inspecting the condition of a patient's vocal folds.
In practice the aim is to separate the measured signal into the glottal excitation signal and the vocal tract filter. The first method for solving glottal inverse filtering was proposed in the 1950s and since then many different methods have been proposed, but so far none of the methods has been able to yield robust estimates for the glottal excitation signal from recordings with a high fundamental frequency, such as women's and children's voices. Recently, using synthetic vowels, MCMC-GIF has been shown to produce better estimates for these kinds of signals than other state-of-the-art methods. The MCMC-GIF method requires an initial estimate for the vocal tract filter. This is obtained from the measurements with the iterative adaptive inverse filtering (IAIF) method. A synthetic vowel is then created with the RK-model and the vocal tract filter, and compared to the measurements. The MCMC method is then used to adjust the RK excitation parameter and the parameters for the vocal tract filter to minimize the error between the synthetic vowel and the measurements, and ultimately obtain a new estimate for the vocal tract filter. The filter can then be used to calculate the glottal excitation signal from the measurements. We will explain this process in detail, and give numerical examples of the results of the MCMC-GIF method compared against the IAIF method.
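
The source-filter idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis and rests on several assumptions: the pulse uses a commonly cited polynomial form of the Rosenberg-Klatt model (flow a*t^2 - b*t^3 during the open phase), the vocal tract is reduced to a single two-pole resonator, and lip radiation is ignored. The function names and the formant values (700 Hz centre frequency, 100 Hz bandwidth, 8 kHz sampling rate) are hypothetical and chosen only for illustration. The sketch synthesizes a vowel-like signal and then recovers the excitation by inverse filtering with the same, exactly known, filter.

```python
import math


def rk_pulse(n_samples, open_quotient):
    """One period of a Rosenberg-Klatt-style glottal flow pulse (assumed form).

    Time is normalized so that one period has length 1. The flow is
    a*t^2 - b*t^3 while the glottis is open (t <= open_quotient) and zero
    while it is closed. The constants scale the peak flow to 1.
    """
    a = 27.0 / (4.0 * open_quotient ** 2)
    b = 27.0 / (4.0 * open_quotient ** 3)
    pulse = []
    for n in range(n_samples):
        t = n / n_samples
        pulse.append(a * t ** 2 - b * t ** 3 if t <= open_quotient else 0.0)
    return pulse


def resonator_coefficients(freq_hz, bandwidth_hz, fs_hz):
    """Denominator coefficients [a1, a2] of a stable two-pole resonator."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # pole radius < 1
    theta = 2.0 * math.pi * freq_hz / fs_hz         # pole angle
    return [-2.0 * r * math.cos(theta), r * r]


def all_pole_filter(x, a):
    """Vocal tract filter: y[n] = x[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def inverse_filter(y, a):
    """Inverse (FIR) filter: x[n] = y[n] + sum_k a[k] * y[n-k]."""
    x = []
    for n, yn in enumerate(y):
        acc = yn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        x.append(acc)
    return x


# Hypothetical settings: 8 kHz sampling, 200 Hz fundamental (40 samples per period).
fs = 8000
period = 40
excitation = rk_pulse(period, open_quotient=0.6) * 5   # five glottal cycles
tract = resonator_coefficients(700.0, 100.0, fs)       # single assumed formant

speech = all_pole_filter(excitation, tract)            # synthetic "vowel"
recovered = inverse_filter(speech, tract)              # glottal inverse filtering

max_error = max(abs(e - r) for e, r in zip(excitation, recovered))
print("largest reconstruction error:", max_error)
```

In this toy setting the filter is known exactly, so the excitation is recovered up to rounding error. In practice the filter must be estimated from the recording, which is the step the abstract assigns to IAIF and then to the MCMC refinement.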