Publication
Bridging the gap between real and synthetic traffic sign repositories
| Abstract: | Current traffic sign image repositories for classification suffer from a scarcity of samples because compiling and labelling images is mainly a manual process. Researchers therefore resort to alternative approaches to deal with this issue, such as increasing the architectural complexity of the model or performing data augmentation. A third approach is the use of synthetic data. This work addresses the data shortage by building a synthetic repository, proposing a pipeline for generating synthetic samples that introduces previously unused image operators. Three use cases for synthetic data are explored: as a standalone training set, merged with real data, and ensembling. The first option provides results that not only clearly surpass any previous attempt at using synthetic data for traffic sign recognition but also, encouragingly, bring the obtained accuracies closer to results with real images. Merging real and synthetic data in a single dataset further improves those results. Due to the different nature of the datasets involved, ensembling provides a boost in accuracy. Overall, we obtained results on three different datasets that surpass previous state-of-the-art results: GTSRB (99.85%), BTSC (99.76%), and rMASTIF (99.84%). Finally, cross-testing among the three datasets hints that our synthetic datasets have the potential to provide better generalization ability than real data. |
|---|---|
| Main authors: | Silva, Diogo Lopes da |
| Other authors: | Fernandes, António Ramires |
| Subject: | Synthetic training sets; Traffic sign classification repositories; Convolutional neural networks; Engineering and Technology::Electrical, Electronic and Computer Engineering |
| Year: | 2022 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |