ITU

The generation of a [multi’vocal] voice

Research output: Journal Article or Conference Article in Journal › Journal article › Research › peer-review

Standard

The generation of a [multi’vocal] voice. / Jørgensen, Stina Hasse.

In: Seismograf/DMT, 06.04.2021.

BibTeX

@article{e9c4bdb30d7e45daa5a4d0732b97c519,
title = "The generation of a [multi{\textquoteright}vocal] voice",
abstract = "Living in a world where machines are talking to us with synthetic voices, it is important to discuss questions of representation and aesthetics. Today most voices in devices and systems are designed to have binary vocal identities. This could be different. Our project aims to inspire a reimagination of the paralinguistics of synthesized voices, exploring how to train and develop the pitch, timbre, pace, and other vocal features beyond speech, based on vocal data from many different people, presenting the idea of a diverse and collective voice, initiating a reflection of the sonic appearance of future synthesized speech that goes beyond the binary. In this contribution we present a first-step approach for generating a multivocal synthesized voice, listening to each stage in the training process to show how the voice develops over time with many different voices in the training pool. We describe our technical approach for training and reflect on the effectiveness of this in regard to making audible a more diverse vocal representation. In the audio paper we reflect on whether current deep learning methods are suitable for our aim of generating a multivocal voice and discuss whether bias within both the dataset and the network itself becomes prominent in the resulting voice. The generation itself perhaps offers an audible example of bias in AI. Our sonic exploration of the multivocal synthetic voice points to the difficulties of applying conventional machine learning approaches, which may be mono-domain focused, when aiming to make a diverse vocal representation audible.",
author = "J{\o}rgensen, {Stina Hasse}",
year = "2021",
month = apr,
day = "6",
doi = "10.48233/SEISMOGRAF2612",
language = "English",
journal = "Seismograf/DMT",
issn = "2245-4705",
publisher = "Foreningen Dansk Musik Tidsskrift",

}

RIS

TY  - JOUR
T1  - The generation of a [multi’vocal] voice
AU  - Jørgensen, Stina Hasse
PY  - 2021/4/6
Y1  - 2021/4/6
N2  - Living in a world where machines are talking to us with synthetic voices, it is important to discuss questions of representation and aesthetics. Today most voices in devices and systems are designed to have binary vocal identities. This could be different. Our project aims to inspire a reimagination of the paralinguistics of synthesized voices, exploring how to train and develop the pitch, timbre, pace, and other vocal features beyond speech, based on vocal data from many different people, presenting the idea of a diverse and collective voice, initiating a reflection of the sonic appearance of future synthesized speech that goes beyond the binary. In this contribution we present a first-step approach for generating a multivocal synthesized voice, listening to each stage in the training process to show how the voice develops over time with many different voices in the training pool. We describe our technical approach for training and reflect on the effectiveness of this in regard to making audible a more diverse vocal representation. In the audio paper we reflect on whether current deep learning methods are suitable for our aim of generating a multivocal voice and discuss whether bias within both the dataset and the network itself becomes prominent in the resulting voice. The generation itself perhaps offers an audible example of bias in AI. Our sonic exploration of the multivocal synthetic voice points to the difficulties of applying conventional machine learning approaches, which may be mono-domain focused, when aiming to make a diverse vocal representation audible.
AB  - Living in a world where machines are talking to us with synthetic voices, it is important to discuss questions of representation and aesthetics. Today most voices in devices and systems are designed to have binary vocal identities. This could be different. Our project aims to inspire a reimagination of the paralinguistics of synthesized voices, exploring how to train and develop the pitch, timbre, pace, and other vocal features beyond speech, based on vocal data from many different people, presenting the idea of a diverse and collective voice, initiating a reflection of the sonic appearance of future synthesized speech that goes beyond the binary. In this contribution we present a first-step approach for generating a multivocal synthesized voice, listening to each stage in the training process to show how the voice develops over time with many different voices in the training pool. We describe our technical approach for training and reflect on the effectiveness of this in regard to making audible a more diverse vocal representation. In the audio paper we reflect on whether current deep learning methods are suitable for our aim of generating a multivocal voice and discuss whether bias within both the dataset and the network itself becomes prominent in the resulting voice. The generation itself perhaps offers an audible example of bias in AI. Our sonic exploration of the multivocal synthetic voice points to the difficulties of applying conventional machine learning approaches, which may be mono-domain focused, when aiming to make a diverse vocal representation audible.
U2  - 10.48233/SEISMOGRAF2612
DO  - 10.48233/SEISMOGRAF2612
M3  - Journal article
JO  - Seismograf/DMT
JF  - Seismograf/DMT
SN  - 2245-4705
ER  - 

ID: 85887764