
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

  • Carnegie Mellon University
  • Princeton University

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers. This paper studies a unifying matrix mixer view of sequence mixers that can be conceptualized as a linear map on the input sequence. This framework encompasses a broad range of well-known sequence models, including the self-attention of Transformers as well as recent strong alternatives such as structured state space models (SSMs), and allows understanding downstream characteristics such as efficiency and expressivity through properties of their structured matrix class. We identify a key axis of matrix parameterizations termed sequence alignment, which increases the flexibility and performance of matrix mixers, providing insights into the strong performance of Transformers and recent SSMs such as Mamba. Furthermore, the matrix mixer framework offers a systematic approach to developing sequence mixers with desired properties, allowing us to develop several new sub-quadratic sequence models. In particular, we propose a natural bidirectional extension of the Mamba model (Hydra), parameterized as a quasiseparable matrix mixer, which demonstrates superior performance over other sequence models including Transformers on non-causal tasks. As a drop-in replacement for attention layers, Hydra outperforms BERT by 0.8 points on the GLUE benchmark and ViT by 2% Top-1 accuracy on ImageNet.
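The matrix mixer view in the abstract can be illustrated with a small NumPy sketch. It is an assumption-laden toy, not the paper's implementation: the function names (`semiseparable`, `quasiseparable`, `scan`) and the scalar-state parameterization are hypothetical, chosen only to show that a causal SSM recurrence equals multiplication by a lower-triangular structured matrix, and that a bidirectional mixer can be assembled from a strictly lower part, a strictly upper part, and a free diagonal.

```python
import numpy as np

def semiseparable(a, b, c):
    """Causal mixer: M[i, j] = c[i] * a[j+1] * ... * a[i] * b[j] for i >= j, else 0."""
    L = len(a)
    M = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1):
            # np.prod of an empty slice is 1.0, which covers the diagonal (i == j).
            M[i, j] = c[i] * np.prod(a[j + 1 : i + 1]) * b[j]
    return M

def quasiseparable(fwd, bwd, diag):
    """Bidirectional mixer (illustrative): strictly lower part from one causal
    mixer, strictly upper part from a second one (transposed), plus a diagonal."""
    lower = np.tril(semiseparable(*fwd), k=-1)
    upper = np.tril(semiseparable(*bwd), k=-1).T
    return lower + upper + np.diag(diag)

def scan(a, b, c, x):
    """Recurrent form of the causal mixer: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h, ys = 0.0, []
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        ys.append(c[t] * h)
    return np.array(ys)

# Usage: mix a single-channel sequence of length 6.
rng = np.random.default_rng(0)
L = 6
fwd = rng.uniform(0.5, 1.0, size=(3, L))   # hypothetical (a, b, c) for the forward direction
bwd = rng.uniform(0.5, 1.0, size=(3, L))   # hypothetical (a, b, c) for the backward direction
x = rng.normal(size=L)

y_causal = semiseparable(*fwd) @ x                   # each output sees only past and current inputs
y_bidir = quasiseparable(fwd, bwd, np.ones(L)) @ x   # each output sees the whole sequence
```

The dense loops cost O(L²) and are for exposition only; the point of the structured matrix class is that the same product can be computed in sub-quadratic time, for example with the recurrence in `scan`.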
Original language: English
Title: The Thirty-eighth Annual Conference on Neural Information Processing Systems
Publisher: Neural Information Processing Systems
Publication date: 2024
Pages: 1-33
Status: Published - 2024
Event: Conference on Neural Information Processing Systems - Vancouver Convention Centre, Vancouver, Canada
Duration: 9 Dec 2024 to 15 Dec 2024
Conference number: 38
https://inspirehep.net/conferences/2827144

Conference

Conference: Conference on Neural Information Processing Systems
Number: 38
Location: Vancouver Convention Centre
Country/Territory: Canada
City: Vancouver
Period: 09/12/2024 to 15/12/2024
