Publication

Unsupervised models for audio emotion detection

Bibliographic details
Abstract:	Supervised methods traditionally learn language models and interpret human communication through data-intensive approaches. However, many languages and dialects have few or no resources, which is a major obstacle to the development of Automatic Speech Recognition (ASR) systems. This work seeks to develop a complete unsupervised pipeline to detect emotions from raw audio signals of Via Directa (VD) call center recordings in European Portuguese. To that end, a concise literature review was conducted on low-resource approaches to the subtasks of ASR, Speech Enhancement (SE), and Sentiment Analysis (SA). For the SE task, a Wave-U-Net model was successfully implemented; it denoises raw audio signals with average Segmental Signal-to-Noise Ratio (SSNR) scores above 3.9 and a 0.8% increase in Signal-to-Noise Ratio (SNR). For the SA task, a domain-specific sentiment lexicon based on the SentiWordNet 3.0 dictionary was developed for European Portuguese. Then, with a linear Support Vector Machine (SVM) as the baseline for benchmarking, the Lex2Sent model was modified and its performance improved for binary sentiment classification of the corresponding transcriptions, achieving a macro F1 score of 0.584. Lastly, limitations are discussed with a view to developing the remaining unsupervised ASR system.
Main authors:	Bernardo, Miguel Ângelo Martins
Subject:	NLP; Unsupervised Learning; Sentiment Analysis; Speech Enhancement; Lexicon
Year:	2024
Country:	Portugal
Document type:	master's dissertation
Access type:	open access
Associated institution:	Universidade Nova de Lisboa
Language:	English
Source:	Repositório Institucional da UNL
