TY - JOUR
T1 - Toward more realistic career path prediction: evaluation and methods
AU - Senger, Elena
AU - Campbell, Yuri
AU - van der Goot, Rob
AU - Plank, Barbara
PY - 2025/8/25
N2 - Predicting career trajectories is a complex yet impactful task, offering significant benefits for personalized career counseling, recruitment optimization, and workforce planning. However, effective career path prediction (CPP) modeling faces challenges including highly variable career trajectories, free-text resume data, and limited publicly available benchmark datasets. In this study, we present a comprehensive comparative evaluation of CPP models—linear projection, multilayer perceptron (MLP), LSTM, and large language models (LLMs)—across multiple input settings and two recently introduced public datasets. Our contributions are threefold: (1) we propose novel model variants, including an MLP extension and a standardized LLM approach, (2) we systematically evaluate model performance across input types (titles only vs. title+description, standardized vs. free-text), and (3) we investigate the role of synthetic data and fine-tuning strategies in addressing data scarcity and improving model generalization. Additionally, we provide a detailed qualitative analysis of prediction behaviors across industries, career lengths, and transitions. Our findings establish new baselines, reveal the trade-offs of different modeling strategies, and offer practical insights for deploying CPP systems in real-world settings.
KW - LLM
KW - career path prediction
KW - labor market
KW - recommendation
KW - synthetic data
DO - 10.3389/fdata.2025.1564521
M3 - Journal article
SN - 2624-909X
VL - 8
JO - Frontiers in Big Data
JF - Frontiers in Big Data
ER -