Abstract
Multiple studies have shown that Transformers are remarkably robust to pruning. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of features in the layer outputs (a minuscule fraction of the model's weights). In BERT and other pre-trained encoder Transformers, these features correspond to high-magnitude outlier dimensions in the LayerNorm scaling factors and biases; disabling them significantly degrades both the masked language modeling loss and downstream task performance.
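The kind of ablation the abstract describes can be sketched in a few lines with HuggingFace Transformers. This is a minimal illustration, not the authors' released code: the model name, module paths, and the outlier index `outlier_dim` are assumptions for demonstration, not positions reported in the paper.

```python
# Minimal sketch: disable one feature dimension in every layer-output LayerNorm
# of a pre-trained BERT, then the model can be re-evaluated on MLM loss or a
# downstream task to measure the degradation the paper studies.
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
outlier_dim = 308  # hypothetical outlier dimension; the real positions are model-specific

with torch.no_grad():
    for layer in model.bert.encoder.layer:
        # Zero the LayerNorm scaling factor and bias for a single hidden dimension
        # in both the attention-output and feed-forward-output normalizations.
        for ln in (layer.attention.output.LayerNorm, layer.output.LayerNorm):
            ln.weight[outlier_dim] = 0.0
            ln.bias[outlier_dim] = 0.0
```

Only a few dozen scalar parameters are touched by this edit, which is what makes the reported drop in performance surprising.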
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 |
| Number of pages | 14 |
| Place of publication | Online |
| Publisher | Association for Computational Linguistics |
| Publication date | 1 Aug 2021 |
| Pages | 3392-3405 |
| Publication status | Published - 1 Aug 2021 |
Keywords
- Transformer robustness
- Pre-trained models
- Layer outputs
- Feature pruning
- Model fragility