Publicação

Vision-based interaction within a multimodal framework

Detalhes bibliográficos
Resumo:	Our contribution is to the field of video-based interaction techniques and is integrated in the home environment of the EMBASSI project. This project addresses innovative methods of man-machine interaction achieved through the development of intelligent assistance and anthropomorphic user interfaces. Within this project, multimodal techniques represent a basic requirement, especially considering those related to the integration of modalities. We are using a stereoscopic approach to allow the natural selection of devices via pointing ges-tures. The pointing hand is segmented from the video images and the 3D position and orientation of the forefinger is calculated. This modality has a subsequent integration with that of speech, in the context of a multimodal interaction infrastructure. In a first phase, we use semantic fusion with amodal input, considering the modalities in a so-called late fusion state.
Autores principais:	Sá, Vítor J.
Outros Autores:	Malerczyk, Cornelius; Schnaider, Michael
Assunto:	EMBASSI project 3D deictic gestures Multimodal man-machine interaction Agent-based systems
Ano:	2001
País:	Portugal
Tipo de documento:	comunicação em conferência
Tipo de acesso:	acesso aberto
Instituição associada:	Universidade do Minho
Idioma:	inglês
Origem:	RepositóriUM - Universidade do Minho

Descrição
Resumo:	Our contribution is to the field of video-based interaction techniques and is integrated in the home environment of the EMBASSI project. This project addresses innovative methods of man-machine interaction achieved through the development of intelligent assistance and anthropomorphic user interfaces. Within this project, multimodal techniques represent a basic requirement, especially considering those related to the integration of modalities. We are using a stereoscopic approach to allow the natural selection of devices via pointing ges-tures. The pointing hand is segmented from the video images and the 3D position and orientation of the forefinger is calculated. This modality has a subsequent integration with that of speech, in the context of a multimodal interaction infrastructure. In a first phase, we use semantic fusion with amodal input, considering the modalities in a so-called late fusion state.