TemPose: a new skeleton-based transformer model designed for fine-grained motion recognition in badminton

Research output: Conference Article in Proceeding or Book/Report chapter, Article in proceedings, Research, peer-review


This paper presents TemPose, a novel skeleton-based transformer model designed for fine-grained motion recognition to improve understanding of the detailed player actions in badminton. The model utilizes multiple temporal and interaction layers to capture variable-length multi-person human actions while minimizing reliance on non-human visual context. TemPose is evaluated on two fine-grained badminton datasets, where it significantly outperforms baseline models by incorporating additional input streams, such as the shuttlecock position, into the temporal transformer layers of the model. TemPose also demonstrates great versatility by achieving competitive results compared to other state-of-the-art skeleton-based models on the large-scale action recognition benchmark NTU RGB+D. Experiments are conducted to explore how different model parameter configurations affect TemPose's performance. In addition, a qualitative analysis of the temporal attention maps suggests that the model learns to prioritize frames of specific poses relevant to different actions while formulating an intuition of each individual's importance in the sequences. Overall, TemPose is an intuitive and versatile architecture that has the potential to be further developed and incorporated into other methods for managing human motion in sports with state-of-the-art results.
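As a rough illustration of how a temporal attention layer can handle variable-length skeleton sequences, the sketch below applies single-head self-attention over time with a padding mask. It is not the TemPose implementation: the function name `masked_temporal_attention`, the identity query/key/value projections, and the toy input are all assumptions made for brevity, and learned projections, multiple heads, positional encodings, and the interaction layers are omitted.

```python
import math


def masked_temporal_attention(frames, valid_len):
    """Single-head self-attention over time with a padding mask (illustrative only).

    frames:    list of T feature vectors (lists of floats), e.g. flattened joint
               coordinates per frame, padded up to a fixed length T.
    valid_len: number of real (non-padded) frames at the start of the sequence.
    Returns (outputs, attn), where attn[i][j] is the weight frame i puts on frame j.
    """
    if not 1 <= valid_len <= len(frames):
        raise ValueError("valid_len must be between 1 and the number of frames")
    T, d = len(frames), len(frames[0])
    scale = 1.0 / math.sqrt(d)
    outputs, attn = [], []
    for i in range(T):
        # Scaled dot-product scores; padded key frames get -inf so that their
        # softmax weight is exactly zero. Identity projections are an assumption.
        scores = [
            scale * sum(q * k for q, k in zip(frames[i], frames[j]))
            if j < valid_len
            else float("-inf")
            for j in range(T)
        ]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attn.append(weights)
        # Output for frame i is the attention-weighted average of all frames.
        outputs.append(
            [sum(w * frames[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return outputs, attn


# Toy sequence: three real frames followed by one padded frame.
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_outputs, toy_attn = masked_temporal_attention(toy_frames, valid_len=3)
```

Each row of `toy_attn` is a small temporal attention map of the kind the abstract's qualitative analysis refers to. Rows that belong to padded query frames are still computed here and would be ignored downstream.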
Original language: English
Title of host publication: 2023 Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Publication date: 17 Jun 2023
Publication status: Published - 17 Jun 2023


  • Skeleton-based motion recognition
  • Transformer model
  • Fine-grained action recognition
  • Temporal attention maps
  • Human motion in sports

