BERT Busters: Outlier Dimensions That Disrupt Transformers

Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky

    Research output: Article in proceedings · Research · Peer-reviewed


    Multiple studies have shown that Transformers are remarkably robust to pruning. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of features in the layer outputs.
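The abstract describes disabling a handful of feature dimensions in a pre-trained encoder's layer outputs. A minimal sketch of that kind of ablation, using synthetic data rather than a real BERT checkpoint: identify dimensions with unusually large average magnitude via a simple 3-sigma rule (the planted dimension index, array shapes, and threshold are illustrative assumptions, not values from the paper), then zero them out.

```python
import numpy as np

# Hedged sketch: flag "outlier" feature dimensions in a batch of layer
# outputs by their unusually large mean absolute activation, then ablate
# (zero out) those dimensions -- the kind of removal the paper studies.
# The data is synthetic; dimension 308 and the 3-sigma rule are
# illustrative choices, not taken from the paper.

rng = np.random.default_rng(0)
hidden = rng.normal(0.0, 1.0, size=(32, 128, 768))  # (batch, seq_len, hidden_dim)
hidden[:, :, 308] += 20.0  # plant one synthetic high-magnitude dimension

dim_means = np.abs(hidden).mean(axis=(0, 1))         # per-dimension magnitude
threshold = dim_means.mean() + 3 * dim_means.std()   # simple 3-sigma cutoff
outlier_dims = np.where(dim_means > threshold)[0]

ablated = hidden.copy()
ablated[:, :, outlier_dims] = 0.0                    # disable the outlier features

print(outlier_dims)  # -> [308]
```

In a real experiment the ablation would be applied to the hidden states of an actual pre-trained encoder before they are passed to the next layer, and downstream accuracy would be measured before and after.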
    Original language: English
    Title of host publication: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
    Number of pages: 14
    Place of publication: Online
    Publisher: Association for Computational Linguistics
    Publication date: 1 Aug 2021
    Publication status: Published - 1 Aug 2021


    Keywords:
    • Transformers robustness
    • Pre-trained models
    • Layer outputs
    • Feature pruning
    • Model fragility


