Publication

Bridging the gap between real and synthetic traffic sign repositories

Bibliographic details
Abstract: Current traffic sign image repositories for classification purposes suffer from a scarcity of samples, because compiling and labelling images is mainly a manual process. Researchers therefore resort to alternative approaches to deal with this issue, such as increasing the architectural complexity of the model or performing data augmentation. A third approach is the use of synthetic data. This work addresses the data shortage by building a synthetic repository, proposing a pipeline for generating synthetic samples that introduces previously unused image operators. Three use cases for synthetic data are explored: as a standalone training set, merged with real data, and in ensembles. The first option provides results that not only clearly surpass any previous attempt at using synthetic data for traffic sign recognition but also bring the obtained accuracies encouragingly close to those achieved with real images. Merging real and synthetic data into a single dataset further improves those results. Because of the different nature of the datasets involved, ensembling provides a further boost in accuracy. Overall, we obtained results on three different datasets that surpass the previous state of the art: GTSRB (99.85%), BTSC (99.76%), and rMASTIF (99.84%). Finally, cross-testing amongst the three datasets suggests that our synthetic datasets have the potential to provide better generalization ability than real data.
Main author: Silva, Diogo Lopes da
Other authors: Fernandes, António Ramires
Subject: Synthetic training sets; Traffic sign classification repositories; Convolutional neural networks; Engineering and Technology::Electrical, Electronic and Computer Engineering
Year: 2022
Country: Portugal
Document type: conference paper
Access type: open access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho
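The abstract mentions ensembling as the third use case but does not say how the ensemble members are combined. The sketch below assumes soft voting: averaging the per-class probabilities of a model trained on real images and one trained on synthetic images, then taking the most likely class. The function names (average_probabilities, ensemble_predict, accuracy) and the toy probabilities are hypothetical illustrations, not the authors' code or results.

```python
from typing import List, Sequence

Probs = Sequence[Sequence[float]]  # one row of class probabilities per sample


def average_probabilities(prob_sets: Sequence[Probs]) -> List[List[float]]:
    """Soft voting (assumed combination rule): mean of each model's class probabilities."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    for probs in prob_sets:
        if len(probs) != n_samples or any(len(row) != n_classes for row in probs):
            raise ValueError("all models must score the same samples and classes")
    return [
        [sum(probs[i][c] for probs in prob_sets) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def ensemble_predict(prob_sets: Sequence[Probs]) -> List[int]:
    """Predicted class index per sample: argmax of the averaged probabilities."""
    averaged = average_probabilities(prob_sets)
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical softmax outputs for 3 test images and 3 sign classes.
real_model = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
synthetic_model = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
labels = [0, 2, 2]

print(accuracy(ensemble_predict([real_model]), labels))       # 2 of 3 correct
print(accuracy(ensemble_predict([synthetic_model]), labels))  # 2 of 3 correct
print(accuracy(ensemble_predict([real_model, synthetic_model]), labels))  # 1.0
```

In this toy case each model errs on a different image, so averaging lets the more confident model correct the other. This is one way the "different nature" of real and synthetic training data could translate into an ensembling gain.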