
Cartography Active Learning

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Standard

Cartography Active Learning. / Zhang, Mike; Plank, Barbara.

Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, 2021. p. 395–406.


Harvard

Zhang, M & Plank, B 2021, Cartography Active Learning. in Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, pp. 395–406. <https://aclanthology.org/2021.findings-emnlp.36.pdf>

APA

Zhang, M., & Plank, B. (2021). Cartography Active Learning. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 395–406). Association for Computational Linguistics. https://aclanthology.org/2021.findings-emnlp.36.pdf

Vancouver

Zhang M, Plank B. Cartography Active Learning. In Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics. 2021. p. 395–406

Author

Zhang, Mike ; Plank, Barbara. / Cartography Active Learning. Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, 2021. pp. 395–406

Bibtex

@inproceedings{a27ef98eb9984b5fa8cddf71fd254033,
title = "Cartography Active Learning",
abstract = "We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.",
author = "Mike Zhang and Barbara Plank",
year = "2021",
month = nov,
day = "8",
language = "English",
pages = "395–406",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
publisher = "Association for Computational Linguistics",
address = "United States",
url = "https://aclanthology.org/2021.findings-emnlp.36.pdf",

}
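
The BibTeX abstract above describes CAL in terms of training dynamics and the data maps of Swayamdipta et al. (2020), where each training instance is characterized by the mean (confidence) and standard deviation (variability) of the model's gold-label probability across epochs. The Python sketch below only illustrates those two statistics and a simple variability-based ranking; the function names, the acquisition rule, and the random stand-in data are assumptions for illustration, not the authors' CAL implementation.

import numpy as np

def data_map_statistics(gold_probs):
    # gold_probs: shape (num_epochs, num_instances); entry [e, i] is the
    # model's predicted probability of instance i's gold label at epoch e.
    confidence = gold_probs.mean(axis=0)    # data-map confidence (Swayamdipta et al., 2020)
    variability = gold_probs.std(axis=0)    # data-map variability
    return confidence, variability

def select_for_labeling(gold_probs, budget):
    # Toy acquisition step (assumption): treat the most variable ("ambiguous")
    # instances as the most informative and return their indices.
    _, variability = data_map_statistics(gold_probs)
    return np.argsort(-variability)[:budget]

# Usage with random numbers standing in for logged training dynamics.
rng = np.random.default_rng(0)
probs = rng.uniform(size=(5, 100))          # 5 epochs, 100 instances
print(select_for_labeling(probs, budget=10))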

RIS

TY - GEN

T1 - Cartography Active Learning

AU - Zhang, Mike

AU - Plank, Barbara

PY - 2021/11/8

Y1 - 2021/11/8

N2 - We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.

AB - We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.

M3 - Article in proceedings

SP - 395

EP - 406

BT - Findings of the Association for Computational Linguistics: EMNLP 2021

PB - Association for Computational Linguistics

UR - https://aclanthology.org/2021.findings-emnlp.36.pdf

ER -

ID: 86198495