
Formant-Based Vowel Categorization for Cross-Lingual Phone Recognition

  • Delft University of Technology

Research output: Journal Article or Conference Article in Journal › Journal article › Research › peer-review

Abstract

Multilingual phone recognition models can learn language-independent pronunciation patterns from large volumes of spoken data and recognize them across languages. This potential can be harnessed to improve speech technologies for under-resourced languages. However, these models are typically trained on phonological representations of speech sounds, which do not necessarily reflect the phonetic realization of speech. A mismatch between a phonological symbol and its phonetic realizations can lead to phone confusions and reduce performance. This work introduces formant-based vowel categorization, which aims to improve cross-lingual vowel recognition in two ways. First, it uncovers a vowel's phonetic quality from its formant frequencies. Second, it reorganizes the vowel categories in a multilingual speech corpus to make them more consistent across languages. The work investigates vowel categories obtained from a trilingual multi-dialect speech corpus of Danish, Norwegian, and Swedish using three categorization techniques. Cross-lingual phone recognition experiments reveal that uniting the vowel categories of different languages into a set of shared formant-based categories improves cross-lingual recognition of the shared vowels. However, it also interferes with recognition of vowels not present in one or more training languages. Cross-lingual evaluation on regional dialects yields inconclusive results. Nevertheless, improved recognition of individual vowels can translate to improvements in overall phone recognition on languages unseen during training.
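The abstract does not specify its three categorization techniques. The sketch below therefore only illustrates the general idea of pooling vowel labels from several languages into shared formant-based categories. It makes the following assumptions:

- Each (language, vowel) label is summarized by its mean F1 and F2.
- Those means are converted to Bark with Traunmüller's (1990) approximation.
- Labels whose points lie closer than a threshold of 1 Bark are merged by single linkage.

The threshold, the helper names `hz_to_bark` and `merge_vowel_categories`, and the toy formant values are hypothetical illustrations. They are not the paper's method or data.

```python
import math
from itertools import combinations

# Hypothetical toy data: mean (F1, F2) in Hz per (language, vowel symbol).
# These numbers are illustrative only and are NOT measurements from the paper.
TOY_VOWELS = {
    ("da", "i"): (280.0, 2250.0),
    ("no", "i"): (290.0, 2200.0),
    ("sv", "i"): (300.0, 2300.0),
    ("da", "ɑ"): (700.0, 1100.0),
    ("sv", "ɑ"): (690.0, 1050.0),
    ("no", "u"): (300.0, 800.0),
}


def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def merge_vowel_categories(vowels, threshold_bark=1.0):
    """Merge (language, symbol) vowels into shared categories.

    Two vowels are linked when their mean (F1, F2) points lie closer than
    threshold_bark in the Bark plane. Linked vowels are grouped transitively
    (single linkage) with a small union-find structure. The threshold is a
    hypothetical parameter chosen for illustration.
    """
    keys = list(vowels)
    parent = {k: k for k in keys}

    def find(k):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Convert every vowel's mean formants to Bark once.
    bark = {k: (hz_to_bark(f1), hz_to_bark(f2)) for k, (f1, f2) in vowels.items()}

    # Link every pair of vowels that is closer than the threshold.
    for a, b in combinations(keys, 2):
        (a1, a2), (b1, b2) = bark[a], bark[b]
        if math.hypot(a1 - b1, a2 - b2) < threshold_bark:
            parent[find(a)] = find(b)

    # Collect the members of each connected component.
    groups = {}
    for k in keys:
        groups.setdefault(find(k), []).append(k)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    for category in merge_vowel_categories(TOY_VOWELS):
        print(category)
```

Single linkage keeps the sketch short, but it can chain acoustically distinct vowels together through intermediate points. A clustering method with an explicit number of categories, such as k-means or a Gaussian mixture, would avoid that. The cost is that the number of shared categories must be chosen in advance.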

Original language: English
Journal: Journal of the Acoustical Society of America
Volume: 157
Issue number: 3
Pages (from-to): 2248-2262
Number of pages: 15
ISSN: 0001-4966
Publication status: Published - 27 Mar 2025

Keywords

  • Speech communication
  • Phonetics
  • Human voice
  • Phonology
  • Speech sounds
  • Vocalization
  • Speech production
  • Consonants
  • Vowel systems
  • Covariance and correlation
