TY - JOUR
T1 - The generation of a [multi’vocal] voice
AU - Jørgensen, Stina Hasse
PY - 2021/4/6
Y1 - 2021/4/6
AB - Living in a world where machines are talking to us with synthetic voices, it is important to discuss questions of representation and aesthetics. Today most voices in devices and systems are designed to have binary vocal identities. This could be different. Our project aims to inspire a reimagination of the paralinguistics of synthesized voices. It explores how to train and develop the pitch, timbre, pace, and other vocal features beyond speech, based on vocal data from many different people, presenting the idea of a diverse and collective voice and initiating a reflection on the sonic appearance of future synthesized speech that goes beyond the binary. In this contribution we present a first-step approach for generating a multivocal synthesized voice, listening to each stage in the training process to show how the voice develops over time with many different voices in the training pool. We describe our technical approach for training and reflect on its effectiveness in making a more diverse vocal representation audible. In the audio paper we reflect on whether current deep learning methods are suitable for our aim of generating a multivocal voice and discuss whether bias within both the dataset and the network itself becomes prominent in the resulting voice. The generation itself perhaps offers an audible example of bias in AI. Our sonic exploration of the multivocal synthetic voice points to the difficulties of applying conventional machine learning approaches, which may be mono-domain focused, when aiming to make a diverse vocal representation audible.
KW - Synthesized voice
KW - Listening to training process
KW - Diversity of vocal representation
KW - Bias in AI
KW - Audible machine learning
KW - Voices beyond the binary
U2 - 10.48233/SEISMOGRAF2612
DO - 10.48233/SEISMOGRAF2612
M3 - Journal article
SN - 2245-4705
JO - Seismograf/DMT
JF - Seismograf/DMT
ER -