TY - GEN
T1 - Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning
AU - Zhang, Mike
AU - Jensen, Kristian Nørgaard
AU - Plank, Barbara
PY - 2022/6/16
Y1 - 2022/6/16
N2 - Skill Classification (SC) is the task of classifying job competences from job postings. This work is the first in SC applied to Danish job vacancy data. We release the first Danish job posting dataset: *Kompetencer* (\_en\_: competences), annotated for nested spans of competences. To improve upon coarse-grained annotations, we make use of The European Skills, Competences, Qualifications and Occupations (ESCO; le Vrang et al., (2014)) taxonomy API to obtain fine-grained labels via distant supervision. We study two setups: The zero-shot and few-shot classification setting. We fine-tune English-based models and RemBERT (Chung et al., 2020) and compare them to in-language Danish models. Our results show RemBERT significantly outperforms all other models in both the zero-shot and the few-shot setting.
AB - Skill Classification (SC) is the task of classifying job competences from job postings. This work is the first in SC applied to Danish job vacancy data. We release the first Danish job posting dataset: *Kompetencer* (\_en\_: competences), annotated for nested spans of competences. To improve upon coarse-grained annotations, we make use of The European Skills, Competences, Qualifications and Occupations (ESCO; le Vrang et al., (2014)) taxonomy API to obtain fine-grained labels via distant supervision. We study two setups: The zero-shot and few-shot classification setting. We fine-tune English-based models and RemBERT (Chung et al., 2020) and compare them to in-language Danish models. Our results show RemBERT significantly outperforms all other models in both the zero-shot and the few-shot setting.
KW - Skill Classification
KW - Job Postings
KW - Danish Job Data
KW - Distant Supervision
KW - Machine Learning Models
U2 - 10.48550/arXiv.2205.01381
DO - 10.48550/arXiv.2205.01381
M3 - Article in proceedings
SP - 436
EP - 447
BT - 13th International Conference on Language Resources and Evaluation
PB - European Language Resources Association (ELRA)
ER -