JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, Antoine Bosselut

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

Recent approaches to skill-to-surface-form matching that train classification or similarity models on synthetic data have shown promising results, eliminating the need for time-consuming and expensive annotation. However, previous datasets have limitations, such as featuring only one skill per sentence and generally consisting of short sentences. This paper introduces JobSkape, a framework for generating synthetic data that resembles real-world job postings, specifically designed to enhance skill-to-taxonomy matching. Within this framework, we create SkillSkape, a comprehensive open-source synthetic dataset of job postings tailored for skill-matching tasks. We introduce several offline metrics showing that our dataset is more diverse and more realistic, and of higher quality according to similarity-based measures. Additionally, we present a multi-step pipeline utilizing large language models (LLMs) and benchmark it against supervised methodologies. We show that the two approaches perform comparably and that each suits different use cases.
Original language: English
Title of host publication: 1st Workshop on Natural Language Processing for Human Resources
Number of pages: 16
Publisher: Association for Computational Linguistics
Publication date: Mar 2024
Pages: 43–58
Publication status: Published - Mar 2024
Event: NLP4HR WORKSHOP 2024: Workshop on Natural Language Processing for Human Resources - St. Julians, Malta
Duration: 22 Mar 2024 → …
https://megagon.ai/nlp4hr-2024/

Workshop

Workshop: NLP4HR WORKSHOP 2024
Country/Territory: Malta
City: St. Julians
Period: 22/03/2024 → …
