From Sound to Sight: Towards AI-authored Music Videos

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

Conventional music visualisation systems rely on handcrafted, ad hoc transformations of shapes and colours that offer only limited expressiveness. We propose two novel pipelines for automatically generating music videos from any user-specified song, vocal or instrumental, using off-the-shelf deep learning models. Inspired by the manual workflows of music video producers, we examine how well latent feature-based techniques can analyse audio to detect musical qualities, such as emotional cues and instrumental patterns, and distil them into textual scene descriptions using a language model. We then employ a generative model to produce the corresponding video clips. To assess the generated videos, we identify several critical aspects and design and conduct a preliminary user evaluation, which demonstrates the videos' storytelling potential, visual coherence and emotional alignment with the music. Our findings underscore the potential of latent feature techniques and deep generative models to expand music visualisation beyond traditional approaches.
Original language: English
Title of host publication: 2025 IEEE/CVF International Conference on Computer Vision (ICCV)
Publisher: IEEE
Publication date: 2025
Status: Published - 2025
Event: Generative AI for Storytelling - Honolulu, Hawaii, USA
Duration: 20 Oct 2025 - 20 Oct 2025
Conference number: 3
https://aistory2025.github.io/

Conference

Conference: Generative AI for Storytelling
Number: 3
Location: Hawaii
Country/Territory: USA
City: Honolulu
Period: 20/10/2025 - 20/10/2025
