Publication

Speaker recognition for door opening systems

Bibliographic details
Abstract:	Besides being an important communication tool, the voice can also serve for identification, since it carries an individual signature for each person. Speaker recognition technologies can use this signature as an authentication method for accessing environments. This work explores the development and testing of machine learning and deep learning models, specifically GMM, VGG-M, and ResNet50, for speaker-recognition access control, with the goal of building a system that grants access to CeDRI’s laboratory. The models were evaluated on their performance in recognizing speakers from audio samples, with the Equal Error Rate (EER) as the main measure of their effectiveness. The models were first trained and tested on public datasets with 1251 to 6112 speakers and then fine-tuned on private datasets with 32 speakers from CeDRI’s laboratory. In this study, we compared the performance of the ResNet50, VGG-M, and GMM models for speaker verification. In experiments on our private datasets, the ResNet50 model outperformed the other models, achieving the lowest EER, 0.7%, on the Framed Silence Removed dataset. On the same dataset, the VGG-M model achieved an EER of 5% and the GMM model an EER of 2.13%. Our best model did not reach the current state of the art of 2.87% EER on the VoxCeleb 1 verification dataset. However, our best ResNet50 implementation achieved an EER of 5.96% while being trained on only a tiny fraction of the data normally used. This result indicates that our model is robust and efficient, and that it leaves a significant margin for improvement. This thesis provides insights into the capabilities of these models in a real-world application, with the aim of deploying the system on a platform for practical use in laboratory access authorization.
The results of this study contribute to the field of biometric security by demonstrating the potential of speaker recognition systems in controlled environments.
Main authors:	Manfron, Enrico
Subject:	Besides; Communication tool; Deep learning model
Year:	2023
Country:	Portugal
Document type:	master's dissertation
Access type:	open access
Associated institution:	Instituto Politécnico de Bragança
Language:	English
Source:	Biblioteca Digital do IPB
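The abstract ranks the models by Equal Error Rate. To make concrete what that number measures, here is a minimal sketch (not the thesis's evaluation code) that estimates the EER from two hypothetical lists of verification scores: scores from genuine trials (the claimed speaker is the true speaker) and scores from impostor trials. It assumes that higher scores mean "more likely the same speaker" and that a trial is accepted when its score reaches the threshold. The EER is the error rate at the threshold where the false acceptance rate (FAR) equals the false rejection rate (FRR); the sketch approximates it at the observed score where the two are closest.

```python
from typing import Optional, Sequence, Tuple


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Approximate the Equal Error Rate of a speaker verification system.

    genuine:  scores of trials where the claimed speaker is the true speaker
    impostor: scores of trials where the claimed speaker is someone else
    A trial is accepted when its score >= threshold (assumed convention).
    Returns (eer, threshold) at the candidate threshold where FAR and FRR are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    best: Optional[Tuple[float, float, float]] = None  # (|FAR - FRR|, EER estimate, threshold)
    # FAR and FRR only change at observed scores, so those are the candidate thresholds.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)

    assert best is not None  # loop ran at least once because both lists are non-empty
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75]))  # → (0.25, 0.7)
```

Trying every observed score as a threshold is quadratic in the number of trials, which is fine for small sets like a 32-speaker laboratory dataset; for large evaluation lists, a sort-based ROC computation (for example scikit-learn's `roc_curve`) is the usual choice.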

_version_	1867173246291410944
author	Manfron, Enrico
author_facet	Manfron, Enrico
author_role	author
contributor_name_str_mv	Teixeira, João Paulo; Minetto, Rodrigo; Biblioteca Digital do IPB
country_str	PT
creators_json_txt	[{"Person.name":"Manfron, Enrico"}]
datacite.contributors.contributor.contributorName.fl_str_mv	Teixeira, João Paulo; Minetto, Rodrigo; Biblioteca Digital do IPB
datacite.creators.creator.creatorName.fl_str_mv	Manfron, Enrico
datacite.date.Accepted.fl_str_mv	2023-01-01T00:00:00Z
datacite.date.available.fl_str_mv	2024-01-03T14:47:47Z
datacite.date.embargoed.fl_str_mv	2024-01-03T14:47:47Z
datacite.rights.fl_str_mv	http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv	Besides; Communication tool; Deep learning model
datacite.titles.title.fl_str_mv	Speaker recognition for door opening systems
dc.contributor.none.fl_str_mv	Teixeira, João Paulo; Minetto, Rodrigo; Biblioteca Digital do IPB
dc.creator.none.fl_str_mv	Manfron, Enrico
dc.date.Accepted.fl_str_mv	2023-01-01T00:00:00Z
dc.date.available.fl_str_mv	2024-01-03T14:47:47Z
dc.date.embargoed.fl_str_mv	2024-01-03T14:47:47Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	http://hdl.handle.net/10198/29072
dc.language.none.fl_str_mv	eng
dc.rights.cclincense.fl_str_mv	http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.none.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv	Besides; Communication tool; Deep learning model
dc.title.fl_str_mv	Speaker recognition for door opening systems
dc.type.none.fl_str_mv	http://purl.org/coar/resource_type/c_bdcc
dirty	0
eu_rights_str_mv	openAccess
format	masterThesis
fulltext.url.fl_str_mv	https://bibliotecadigital.ipb.pt/bitstreams/4d200ef7-8b4a-41ca-87ee-371afd682c58/download
id	ipb_3f80efc43bf9bf85ebee2fd317bceee8
identifier.url.fl_str_mv	http://hdl.handle.net/10198/29072
instacron_str	ipb
institution	Instituto Politécnico de Bragança
instname_str	Instituto Politécnico de Bragança
language	eng
network_acronym_str	ipb
network_name_str	Biblioteca Digital do IPB
oai_identifier_str	oai:bibliotecadigital.ipb.pt:10198/29072
organization_str_mv	urn:organizationAcronym:ipb
person_str_mv	Manfron, Enrico
publishDate	2023
reponame_str	Biblioteca Digital do IPB
repository_id_str	urn:repositoryAcronym:ipb
service_str_mv	urn:repositoryAcronym:ipb
spelling	eng; pt_PT; application/pdf; pt_PT; Speaker recognition for door opening systems; Manfron, Enrico; Teixeira, João Paulo; Minetto, Rodrigo; HostingInstitution; Organizational; Biblioteca Digital do IPB; e-mail; mailto:dspace@ipb.pt; dspace@ipb.pt; URN; urn:tid:203444582; 2024-01-03T14:47:47Z; 2023; 2023-01-01T00:00:00Z; Handle; http://hdl.handle.net/10198/29072; http://purl.org/coar/access_right/c_abf2; open access; Besides; Communication tool; Deep learning model; 3430411 bytes; literature; http://purl.org/coar/resource_type/c_bdcc; master thesis; 2023; http://creativecommons.org/licenses/by-nc/4.0/; http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext; https://bibliotecadigital.ipb.pt/bitstreams/4d200ef7-8b4a-41ca-87ee-371afd682c58/download
spellingShingle	Speaker recognition for door opening systems Manfron, Enrico Besides Communication tool Deep learning model
status	SINGLETON
subject.fl_str_mv	Besides; Communication tool; Deep learning model
title	Speaker recognition for door opening systems
title_full	Speaker recognition for door opening systems
title_fullStr	Speaker recognition for door opening systems
title_full_unstemmed	Speaker recognition for door opening systems
title_short	Speaker recognition for door opening systems
title_sort	Speaker recognition for door opening systems
topic	Besides; Communication tool; Deep learning model
topic_facet	Besides; Communication tool; Deep learning model
url	http://hdl.handle.net/10198/29072
visible	1
