Publication

Speaker recognition for door opening systems

Bibliographic details
Abstract: Besides being an important communication tool, the voice can also serve for identification, since it carries an individual signature for each person. Speaker recognition technologies can use this signature as an authentication method for access to environments. This work explores the development and testing of machine learning and deep learning models, specifically GMM, VGG-M, and ResNet50, for speaker-recognition access control, with the goal of building a system that grants access to CeDRI’s laboratory. The deep learning models were evaluated on their performance in recognizing speakers from audio samples, with the Equal Error Rate (EER) as the main measure of effectiveness. The models were first trained and tested on public datasets with 1251 to 6112 speakers and then fine-tuned on private datasets with 32 speakers from CeDRI’s laboratory. We compared the performance of the ResNet50, VGG-M, and GMM models for speaker verification. In experiments on our private datasets, the ResNet50 model outperformed the other models, achieving the lowest EER, 0.7%, on the Framed Silence Removed dataset. On the same dataset, the VGG-M model achieved an EER of 5% and the GMM model an EER of 2.13%. Our best model did not reach the current state-of-the-art EER of 2.87% on the VoxCeleb 1 verification dataset. However, our best ResNet50 implementation achieved an EER of 5.96% while being trained on only a small fraction of the data normally used. This result indicates that our model is robust and efficient and leaves a significant margin for improvement. This thesis provides insights into the capabilities of these models in a real-world application, with the aim of deploying the system on a platform for practical use in laboratory access authorization.
The results of this study contribute to the field of biometric security by demonstrating the potential of speaker recognition systems in controlled environments.
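Since the abstract reports every result as an Equal Error Rate, a small illustration of the metric may help. The sketch below is not the thesis's evaluation code. It assumes each verification trial yields a similarity score where higher means "more likely the same speaker", and it takes the EER as the operating point where the false acceptance rate (impostors accepted) and the false rejection rate (genuine speakers rejected) are closest. The function name `equal_error_rate` and the example scores are hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) for two lists of similarity scores.

    Assumption: a trial is accepted when its score >= threshold.
    The EER is approximated at the candidate threshold where the
    false acceptance and false rejection rates are closest.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    best_eer = 1.0
    best_threshold = None

    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that fail the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)

        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold

    return best_eer, best_threshold


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

In a door-opening setting, the threshold found this way is only one possible operating point; a deployment might prefer a stricter threshold that trades more false rejections for fewer false acceptances.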
Main authors: Manfron, Enrico
Subject: Besides; Communication tool; Deep learning model
Year: 2023
Country: Portugal
Document type: master's thesis
Access type: open access
Associated institution: Instituto Politécnico de Bragança
Language: English
Source: Biblioteca Digital do IPB
_version_ 1867173246291410944
author Manfron, Enrico
author_facet Manfron, Enrico
author_role author
contributor_name_str_mv Teixeira, João Paulo
Minetto, Rodrigo
Biblioteca Digital do IPB
country_str PT
creators_json_txt [{"Person.name":"Manfron, Enrico"}]
datacite.contributors.contributor.contributorName.fl_str_mv Teixeira, João Paulo
Minetto, Rodrigo
Biblioteca Digital do IPB
datacite.creators.creator.creatorName.fl_str_mv Manfron, Enrico
datacite.date.Accepted.fl_str_mv 2023-01-01T00:00:00Z
datacite.date.available.fl_str_mv 2024-01-03T14:47:47Z
datacite.date.embargoed.fl_str_mv 2024-01-03T14:47:47Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Besides
Communication tool
Deep learning model
datacite.titles.title.fl_str_mv Speaker recognition for door opening systems
dc.contributor.none.fl_str_mv Teixeira, João Paulo
Minetto, Rodrigo
Biblioteca Digital do IPB
dc.creator.none.fl_str_mv Manfron, Enrico
dc.date.Accepted.fl_str_mv 2023-01-01T00:00:00Z
dc.date.available.fl_str_mv 2024-01-03T14:47:47Z
dc.date.embargoed.fl_str_mv 2024-01-03T14:47:47Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10198/29072
dc.language.none.fl_str_mv eng
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv Besides
Communication tool
Deep learning model
dc.title.fl_str_mv Speaker recognition for door opening systems
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_bdcc
dirty 0
eu_rights_str_mv openAccess
format masterThesis
fulltext.url.fl_str_mv https://bibliotecadigital.ipb.pt/bitstreams/4d200ef7-8b4a-41ca-87ee-371afd682c58/download
id ipb_3f80efc43bf9bf85ebee2fd317bceee8
identifier.url.fl_str_mv http://hdl.handle.net/10198/29072
instacron_str ipb
institution Instituto Politécnico de Bragança
instname_str Instituto Politécnico de Bragança
language eng
network_acronym_str ipb
network_name_str Biblioteca Digital do IPB
oai_identifier_str oai:bibliotecadigital.ipb.pt:10198/29072
organization_str_mv urn:organizationAcronym:ipb
person_str_mv Manfron, Enrico
publishDate 2023
reponame_str Biblioteca Digital do IPB
repository_id_str urn:repositoryAcronym:ipb
service_str_mv urn:repositoryAcronym:ipb
spelling Speaker recognition for door opening systems; Manfron, Enrico; Teixeira, João Paulo; Minetto, Rodrigo; hosting institution: Biblioteca Digital do IPB (e-mail: dspace@ipb.pt); URN: urn:tid:203444582; Handle: http://hdl.handle.net/10198/29072; open access; master thesis; application/pdf; 3430411 bytes
spellingShingle Speaker recognition for door opening systems
Manfron, Enrico
Besides
Communication tool
Deep learning model
status SINGLETON
subject.fl_str_mv Besides
Communication tool
Deep learning model
title Speaker recognition for door opening systems
title_full Speaker recognition for door opening systems
title_fullStr Speaker recognition for door opening systems
title_full_unstemmed Speaker recognition for door opening systems
title_short Speaker recognition for door opening systems
title_sort Speaker recognition for door opening systems
topic Besides
Communication tool
Deep learning model
topic_facet Besides
Communication tool
Deep learning model
url http://hdl.handle.net/10198/29072
visible 1