Abstract
Recent approaches to skill-to-surface-form matching that train classification or similarity models on synthetic data have shown promising results, eliminating the need for time-consuming and expensive annotation. However, previous datasets have limitations, such as featuring only one skill per sentence and generally comprising short sentences. This paper introduces JobSkape, a framework for generating synthetic data that resembles real-world job postings, specifically designed to enhance skill-to-taxonomy matching. Within this framework, we create SkillSkape, a comprehensive open-source synthetic dataset of job postings tailored for skill-matching tasks. We introduce several offline metrics showing that our dataset is more diverse, more realistic, and of higher quality according to similarity-based measures than previous datasets. Additionally, we present a multi-step pipeline utilizing large language models (LLMs) and benchmark it against supervised methodologies. We show that the two approaches perform comparably and that each suits different use cases.
Original language | English |
---|---|
Title | 1st Workshop on Natural Language Processing for Human Resources |
Number of pages | 16 |
Publisher | Association for Computational Linguistics |
Publication date | Mar. 2024 |
Pages | 43–58 |
Status | Published - Mar. 2024 |
Event | NLP4HR WORKSHOP 2024: Workshop on Natural Language Processing for Human Resources - St. Julians, Malta. Duration: 22 Mar. 2024 → … https://megagon.ai/nlp4hr-2024/ |
Workshop
Workshop | NLP4HR WORKSHOP 2024 |
---|---|
Country/Territory | Malta |
City | St. Julians |
Period | 22/03/2024 → … |
Internet address |
Keywords
- Skill-to-surface-form matching
- Synthetic training data
- Job postings
- Skill-matching tasks
- Large language models (LLMs)