Publication
LAViTSPose: A lightweight cascaded framework for robust sitting posture recognition via detection–segmentation–classification
| Abstract: | Sitting posture recognition, defined as automatically localizing and categorizing seated human postures, has become essential for large-scale ergonomics assessment and longitudinal health-risk monitoring in classrooms and offices. However, in real-world multi-person scenes, pervasive occlusions and overlaps induce keypoint misalignment, causing global-attention backbones to fail to localize critical local structures. Moreover, annotation scarcity makes small-sample training commonplace, leaving models insufficiently robust to misalignment perturbations and thereby limiting cross-domain generalization. To address these challenges, we propose LAViTSPose, a lightweight cascaded framework for sitting posture recognition. Concretely, a YOLOR-based detector trained with a Range-aware IoU (RaIoU) loss yields tight person crops under partial visibility; ESBody suppresses cross-person leakage and estimates occlusion/head-orientation cues; a compact ViT head (MLiT) with Spatial Displacement Contact (SDC) and a learnable temperature (LT) mechanism performs skeleton-only classification with a local structural-consistency regularizer. From an information-theoretic perspective, our design enhances discriminative feature compactness and reduces structural entropy under occlusion and annotation scarcity. We conducted a systematic evaluation on the USSP dataset, and the results show that LAViTSPose outperforms existing methods on both sitting posture classification and face-orientation recognition while meeting real-time inference requirements. |
|---|---|
| Main author: | Wang, Shu |
| Other authors: | Tavares, Adriano; Lima, Carlos; Gomes, Tiago Manuel Ribeiro; Zhang, Yicong; Zhao, Jiyu; Liang, Yanchun |
| Subject: | sitting posture recognition; semantic segmentation; lightweight Vision Transformer; local consistency regularization |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | article |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |