Browsing by Subject "speech"

  • Mäkelin, Minnea (2022)
    Introduction: Children with nonsyndromic cleft lip and/or palate have smaller consonant inventories, less accurate articulation, and more speech errors than their peers without clefts. Speech and dental arch relationships have been widely used as the primary outcome measures of palate repair. Aims: The aim was to evaluate the occurrence of misarticulations of the Finnish alveolar consonants /s/, /l/ and /r/ and their possible relationship with maxillary dental arch dimensions in 5-year-old children with unilateral cleft lip and palate (UCLP). Materials and methods: Subgroup analysis was conducted within a multicenter controlled trial of primary surgery (Scandcleft project). 46 Caucasian Finnish-speaking patients (29 boys) with non-syndromic complete UCLP were evaluated retrospectively. Production of the Finnish alveolar consonants /s/, /l/ and /r/ was assessed from standardized audio recordings at the mean age of 5.06 years (range 4.82-5.89). Productions were categorized as correct, distortion, substitution, or omission. Maxillary dental arch measurements were assessed using the technique of Moorrees from plaster casts taken at the same age. Additionally, the anterior and posterior palatal heights were measured. Aspin-Welch Unequal-Variance T-Test, Equal-Variance T-Test and Mann-Whitney U test were used in the statistical analyses. Kappa statistics were calculated to assess reliability. Results: Only one of the children articulated all the studied sounds correctly. 93.2% misarticulated /r/, 63.0% misarticulated /s/ and 39.1% misarticulated /l/. Distortions and substitutions were common. Omissions were rare. There was no relationship between the occurrence of alveolar consonant misarticulations and the maxillary dental arch dimensions. Intra- and interrater agreement ranged from moderate to excellent. Conclusions: Children with UCLP have a notable number of alveolar consonant misarticulations. Maxillary dental arch dimensions were not related to the misarticulation of /s/, /l/ or /r/ in 5-year-old children with UCLP.
  • Leino, Leevi (2024)
    A presentation of the basic tools of traditional audio deconvolution and of a supervised non-negative matrix factorization (NMF) algorithm for enhancing a filtered and noisy speech signal.
  • Salo, Laura (2019)
    Goals. This thesis explores which prosodic features make Finnish speech convincing, concentrating on two main features: fundamental frequency and speech rate. Earlier studies of the prosodic features of speech have shown that listeners perceive speakers with a lower fundamental frequency (f0) and a higher speech rate as more convincing. I also try to establish whether there is an interaction between the speaker’s gender and credibility. At the time of publication, and to the best of the author’s knowledge, no research had been published on which prosodic parameters are convincing in Finnish speech. In light of the above, the hypothesis of this research is that the prosodic features of convincing speech in the Finnish language do not differ from the prosodic features shown by published research to be convincing in other Western and European languages. The purpose of this study is to provide additional information to the field of speech research on the Finnish language. Methods. This was a quantitative study that used both listening experiments and statistical tests. The listening experiment was used to examine the prosodic features of convincing speech. Sixteen statements were collected from the European Parliament's plenary website, two statements each from four Finnish male MEPs and four Finnish female MEPs. Each statement was first delexicalized, and from each delexicalized statement eight new manipulations were created, for a total of 64 manipulated statements. These manipulations involved raising and lowering the fundamental frequency by ±4 semitones and speeding up and slowing down the speech rate by ±1.5 seconds. The eight manipulated statements for each lexicalized statement were classified as high, low, fast, slow, high-fast, high-slow, low-fast and low-slow. During the listening experiment, each manipulated statement was compared to a non-modified statement. Twelve native Finnish-speaking subjects participated in the experiment, during which they listened to the statements in pairs (manipulated vs. non-manipulated) and then answered the question “Which of the statements is more convincing: the first or second one?” Results and conclusions. A lower fundamental frequency and a higher speech rate were perceived as more convincing than a higher fundamental frequency and a lower speech rate. This matches previous research findings on other European languages. The statistically significant association found between lower f0, faster speech rate and convincing speech supports this thesis’ hypothesis that the convincing prosodic features of the Finnish language are the same as those identified in the English language.
  • Lahti, Lauri (University of Helsinki, 2006)
    The study examines various uses of computer technology in the acquisition of information by visually impaired people. For this study, 29 visually impaired persons took part in a survey about their experiences of acquiring information and using computers, especially with a screen magnification program, a speech synthesizer and a braille display. According to the responses, the evolution of computer technology offers visually impaired people an important means of coping with everyday activities and interacting with the environment. Nevertheless, the functionality of assistive technology needs further development to become more usable and versatile. Since the challenges of independently observing the environment were emphasized in the survey, the study led to the development of a portable text vision system called Tekstinäkö. Unlike typical stand-alone applications, the Tekstinäkö system was constructed by combining devices and programs that are readily available on the consumer market. When the system operates, pictures are taken with a digital camera and instantly transmitted to a text recognition program on a laptop computer, which reads the text aloud using a speech synthesizer. Visually impaired test users reported that even uncertain interpretations of texts in the environment given by the Tekstinäkö system are at least a welcome addition that helps complete their perception of the environment. It became clear that even modest development work can bring new, useful and valuable methods to the everyday life of disabled people. The unconventional production process of the system also appeared to be efficient. The results achieved and the proposed working model offer one suggestion for giving enough attention to the easily overlooked needs of people with special abilities. ACM Computing Classification System (1998): K.4.2 Social Issues: Assistive technologies for persons with disabilities; I.4.9 Image processing and computer vision: Applications
  • Koutonen, Anniina (2024)
    Aims: Speech production involves constrictions made in the vocal tract, and some of them can be seen by the perceiver. Consequently, lipreading is a natural part of human communication. Talker differences in visual intelligibility have been observed, but the features of a clear talker are still poorly understood. Previous research on visual speech perception has mainly focused on the dynamic features of the lips, which are key articulators e.g. for labial consonants. For many other consonants, the main articulation happens inside the mouth. The current study investigated the relationship between visual articulatory features and visual speech perception. The study aimed to identify articulatory features that perceivers use to recognize spoken stop consonants visually. It was hypothesized that greater lip opening, longer duration, and hence greater visibility of articulators inside the mouth (teeth and tongue) would be related to recognition. Methods: 52 Finnish adults participated in a syllable recognition test. The stimuli analyzed in the current study were the easily confusable visual stop consonants [k] and [t] articulated in CV (consonant-vowel [a]) context by 8 talkers (4 Finnish and 4 Japanese, 2 males and 2 females in each group). The following visual features were analyzed from the video clips: (1) maximum mouth opening, (2) timing of the maximum mouth opening, (3) upper teeth visibility, and (4) lower teeth visibility throughout the visual stimulus, as well as (5) mouth opening, (6) tongue visibility and (7) motion blur in the consonant closure frame (when the tongue stopped the airflow at the soft palate for [k] and behind the upper front teeth for [t]). The relationship between visual features and perception responses was examined using hierarchical regression models. The model including the visual features contributing to consonant recognition was formed using forward stepwise regression. Pearson’s correlations between visual features and response proportions were also calculated. Analyses were conducted for correct responses to visual [k] and [t], as well as their confusions with each other. Results and conclusions: Tongue visibility, motion blur, and upper teeth visibility were the best predictors for the recognition of [k]. However, the blur effect finding was unreliable due to heteroscedasticity. Lower teeth visibility predicted the recognition of [t]. Measures of mouth opening correlated with recognition of both consonants but were not significant predictors. The talker’s ethnicity and gender were significant predictors of the recognition, though these differences were likely to be at least partly due to individual differences in a small sample of talkers. This study provides a greater understanding of static articulatory features’ contribution to the recognition of stop consonants in lipreading. The results of this study show that it is not only how open the mouth is during articulation but also what can be seen inside the mouth.