Abstract
Conventional music visualisation systems rely on handcrafted, ad hoc transformations of shapes and colours that offer only limited expressiveness. We propose two novel pipelines that automatically generate music videos from any user-specified song, vocal or instrumental, using off-the-shelf deep learning models. Inspired by the manual workflows of music video producers, we investigate how well latent feature-based techniques can analyse audio to detect musical qualities, such as emotional cues and instrumental patterns, and distil them into textual scene descriptions using a language model. We then employ a generative model to produce the corresponding video clips. To assess the generated videos, we identify several critical aspects, then design and conduct a preliminary user evaluation, which demonstrates storytelling potential, visual coherency and emotional alignment with the music. Our findings underscore the potential of latent feature techniques and deep generative models to expand music visualisation beyond traditional approaches.
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE/CVF International Conference on Computer Vision (ICCV) |
| Publisher | IEEE |
| Publication date | 2025 |
| Publication status | Published - 2025 |
| Event | Generative AI for Storytelling, Honolulu, Hawaii, United States. Duration: 20 Oct 2025 → 20 Oct 2025. Conference number: 3. https://aistory2025.github.io/ |
Conference
| Conference | Generative AI for Storytelling |
|---|---|
| Number | 3 |
| Location | Hawaii |
| Country/Territory | United States |
| City | Honolulu |
| Period | 20/10/2025 → 20/10/2025 |
| Internet address | https://aistory2025.github.io/ |
Keywords
- Music visualisation
- Audio-to-video synthesis
- Latent feature representations
- Text-to-video generation
- Multimodal generative systems
Fingerprint
Dive into the research topics of 'From Sound to Sight: Towards AI-authored Music Videos'. Together they form a unique fingerprint.
Projects
- P1: Pioneer Centre for Artificial Intelligence
  Sestoft, P. (PI), Plank, B. (CoI), Hansen, D. W. (CoI), Larsen, A. W. (CoI), Bogers, T. (CoI), Madsen, I. J. W. (CoI), Dixen, L. (CoI), Trinhammer, M. L. (CoI), Iarygina, O. (CoI), Grasshof, S. (CoI), Mottelson, A. (CoI), Burelli, P. (CoI), Risi, S. (CoI), Rogers, A. (CoI), Goot, R. V. D. (CoI), Coscia, M. (CoI), Hardmeier, C. (CoI), Heinrich, S. (Collaborator) & Güven, A. B. (Collaborator)
  Danish National Research Foundation
  01/07/2021 → 30/06/2034
  Project: Research
- XTREME: Extended Reality Environment for Immersive Experience of Art and Music
  Brandt, S. (PI), Sivertsen, C. (CoI), Starostka, J. (CoI) & Harshit, H. (CoI)
  01/01/2024 → 31/12/2026
  Project: Research