Understanding Auditory Representations of Emotional Expressions with Neural Networks

Iris Wieser, Pablo Barros, Stefan Heinrich, Stefan Wermter

Research output: Journal article › Research › Peer-reviewed

Abstract

In contrast to many established emotion recognition systems, convolutional neural networks do not rely on handcrafted features to categorize emotions. Although these networks achieve state-of-the-art performance, it is still not fully understood what they learn and how the learned representations correlate with the emotional characteristics of speech. The aim of this work is to contribute to a deeper understanding of the acoustic and prosodic features that are relevant for the perception of emotional states. Firstly, an artificial deep neural network architecture is proposed that learns auditory features directly from the raw, unprocessed speech signal. Secondly, we introduce two novel methods for analyzing the implicitly learned representations, based on data-driven and network-driven visualization techniques. Using these methods, we identify how the network categorizes an audio signal within a two-dimensional representation of emotions, namely valence and arousal. The proposed approach is a general method that enables a deeper analysis and understanding of the representations most relevant for perceiving emotional expressions in speech.
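To make the two ideas in the abstract concrete, the sketch below shows (a) a small convolutional network that maps a raw waveform directly to continuous valence and arousal values, and (b) a simple network-driven inspection step that asks which input samples the network is sensitive to. This is an illustrative sketch only, not the architecture or analysis method from the paper; the framework (PyTorch), layer sizes, kernel widths, the 16 kHz sampling rate, and the gradient-based saliency step are all assumptions made for this example.

```python
# Illustrative sketch only -- not the authors' architecture or analysis code.
# Assumes PyTorch; layer sizes, kernel widths, and the 16 kHz input are
# arbitrary choices for demonstration.
import torch
import torch.nn as nn

class RawAudioEmotionNet(nn.Module):
    """Small 1D CNN that maps a raw waveform to (valence, arousal)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 2)  # two continuous outputs: valence, arousal

    def forward(self, x):             # x: (batch, 1, samples)
        h = self.features(x).squeeze(-1)
        return self.head(h)

model = RawAudioEmotionNet()
wave = torch.randn(1, 1, 16000, requires_grad=True)  # 1 s of dummy 16 kHz audio

# Network-driven inspection in the spirit of a saliency map: the gradient of
# the arousal output with respect to the raw input indicates which parts of
# the signal the network is most sensitive to.
valence_arousal = model(wave)
valence_arousal[0, 1].backward()          # backpropagate from the arousal output
saliency = wave.grad.abs().squeeze()      # per-sample sensitivity
print(valence_arousal.detach(), saliency.shape)
```

In practice, such per-sample sensitivities would be compared against acoustic and prosodic properties of the speech (e.g. energy or pitch contours) to interpret what the learned representations respond to.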
Original language: English
Journal: Neural Computing and Applications
ISSN: 0941-0643
Publication status: Published - 1 Dec 2018
Externally published: Yes

Keywords

  • Auditory emotion categorization
  • Affect analysis
  • Dimensional emotions
  • Deep neural network
