Abstract
Digital assistants are becoming an integral part of everyday life. However, commercial digital assistants are only available for a limited set of languages. Because of this, a vast number of people cannot use these devices in their native tongue.
In this work, we focus on two core tasks within the digital assistant pipeline: intent classification and slot detection. Intent classification recovers the goal of the utterance, whereas slot detection identifies important properties regarding this goal. Besides introducing a novel cross-lingual dataset for these tasks, consisting of 11 languages, we evaluate a variety of models: 1) multilingually pretrained transformer-based models, 2) these models supplemented with auxiliary tasks, to evaluate whether multi-task learning can be beneficial, and 3) annotation transfer with neural machine translation.
Original language | English |
---|---|
Publication date | 25 Sept 2021 |
Publication status | Published - 25 Sept 2021 |
Event | RESOURCEFUL-2020: RESOURCEs and representations For Under-resourced Languages and domains - Gothenburg, Sweden. Duration: 25 Nov 2020 → … https://gu-clasp.github.io/resourceful-2020/ |
Workshop
Workshop | RESOURCEFUL-2020 |
---|---|
Location | Gothenburg |
Country/Territory | Sweden |
City | Gothenburg |
Period | 25/11/2020 → … |
Internet address | https://gu-clasp.github.io/resourceful-2020/ |
Keywords
- Digital assistants
- Intent classification
- Slot detection
- Cross-lingual dataset
- Multilingual transformers