Abstract
Recent approaches to skill-to-surface-form matching that train classification or similarity models on synthetic data have shown promising results, eliminating the need for time-consuming and expensive annotation. However, previous datasets have limitations, such as featuring only one skill per sentence and generally consisting of short sentences. This paper introduces JobSkape, a framework for generating synthetic data that resembles real-world job postings, specifically designed to enhance skill-to-taxonomy matching. Within this framework, we create SkillSkape, a comprehensive open-source synthetic dataset of job postings tailored for skill-matching tasks. We introduce several offline metrics showing that our dataset is more diverse, more realistic, and of higher quality according to similarity-based measures. Additionally, we present a multi-step pipeline utilizing large language models (LLMs) and benchmark it against supervised methodologies. We show that the performance of the two approaches is comparable and that each suits different use cases.
| Original language | English |
|---|---|
| Title | 1st Workshop on Natural Language Processing for Human Resources |
| Number of pages | 16 |
| Publisher | Association for Computational Linguistics |
| Publication date | Mar 2024 |
| Pages | 43–58 |
| Status | Published - Mar 2024 |
| Event | NLP4HR WORKSHOP 2024: Workshop on Natural Language Processing for Human Resources - St. Julians, Malta. Duration: 22 Mar 2024 → … https://megagon.ai/nlp4hr-2024/ |
Workshop
| Workshop | NLP4HR WORKSHOP 2024 |
|---|---|
| Country/Territory | Malta |
| City | St. Julians |
| Period | 22/03/2024 → … |
| Internet address | |
Keywords
- Skill-to-surface-form matching
- Synthetic training data
- Job postings
- Skill-matching tasks
- Large language models (LLMs)