Publication
SLVideo: A Sign Language Video Moment Retrieval Framework
| Abstract: | Sign Language Recognition has been an increasingly studied and developed subject throughout the years to help deaf and hard-of-hearing individuals in their social interactions in everyday life. These technologies employ manual sign recognition algorithms; however, the majority of them lack the capacity to recognise facial expressions, which are also an essential part of sign language as they allow the speaker to add expressiveness to their dialogue or even change the meaning of certain manual signs. For Portuguese Sign Language Recognition software this is no exception. This dissertation introduces SLVideo, a video moment retrieval system for Sign Language videos that incorporates facial expressions, addressing the gap in existing technology by focusing on both hand and facial signs. The system extracts embedding representations for the hand and face signs from video frames to capture the language signs in their entirety. This enables users to search for a specific sign language video segment with text queries or to search by similar sign language videos. To evaluate this system, a collection of eight hours of annotated Portuguese Sign Language videos is used as the dataset, and a CLIP model is used to generate the embeddings. The initial results are promising in a zero-shot setting. Additionally, SLVideo allows users to edit existing annotations and create new ones, making it a collaborative tool for annotators working with the same videos. |
|---|---|
| Main authors: | Martins, Gonçalo Vinagre |
| Subject: | Sign Language Recognition; Facial expressions; Portuguese Sign Language; Video moment retrieval |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
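
The retrieval step the abstract describes (embedding video segments, then searching them with a text query or with another video) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the segment identifiers and the three-dimensional vectors below are hypothetical stand-ins for real CLIP embeddings, and cosine similarity is assumed as the ranking score, which is a common choice for CLIP-style embeddings but is not confirmed by this record.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def rank_segments(
    query: Sequence[float],
    segments: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (segment_id, score) pairs, best match first."""
    scored = [
        (segment_id, cosine_similarity(query, embedding))
        for segment_id, embedding in segments.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical segment index: toy vectors standing in for embeddings
# that a real system would compute from hand and face crops.
segments = {
    "video1_00:12-00:15": [0.9, 0.1, 0.0],
    "video1_01:40-01:44": [0.1, 0.9, 0.1],
    "video2_00:03-00:07": [0.7, 0.3, 0.1],
}

# Hypothetical query vector: in a real system this would be the
# embedding of a text query or of another video segment.
query = [1.0, 0.0, 0.0]

results = rank_segments(query, segments, top_k=2)
for segment_id, score in results:
    print(f"{segment_id}: {score:.3f}")
```

Because a text query and a video segment would both be mapped into the same embedding space, the same ranking function could serve search by text and search by similar video; only the source of the query vector would change.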