Publication

Deeploy: a neural network computer vision tool (for the NVidia Tegra TX2 Embedded System)

Bibliographic details
Abstract:	Machine Learning (ML) gives a computer system the ability to perform a certain task without being explicitly programmed to do it. Although ML is not a new topic in computer science, these techniques have been gaining popularity due to advances in hardware (especially GPUs). More powerful hardware supports more efficient training and a more responsive end system once deployed. These algorithms have proven particularly effective in image processing and feature detection, namely with deep neural networks. In the context of a vehicle, autonomous or not, perceiving its external and internal environment makes it possible to detect and identify left-behind objects, misuse of the vehicle, or other potentially dangerous situations. The captured data can be used to trigger intelligent vehicle responses. Bosch is currently developing a system with these capabilities and plans to leverage deep learning approaches to implement it. This work aimed to test and evaluate the suitability of a given embedded device for the project. It also determined the best strategy for implementing deep learning solutions on the device. The supplied test bed was an NVidia Software Development Kit (SDK) system for the embedded NVidia Jetson TX2 device with the Parker System-on-Chip (SoC), a heterogeneous computing chip with 2 Denver cores (an NVidia implementation of the ARM-64 architecture), 4 Cortex-A57 cores (also ARM-64), 256 Pascal GPU cores and support for up to 6 video cameras. The SDK includes several software library packages, including libraries for image processing and ML. With the goal of fully exploiting the embedded device's compute capabilities, this work studied several inference frameworks, going as far as implementing an inference engine from scratch (named Deeploy) that produces inferences based on two libraries provided by NVidia: cuDNN and TensorRT.
Deeploy was evaluated against well-known and established frameworks, namely TensorFlow, PyTorch and Darknet, in terms of efficiency, resource management and overall ease of use, maintainability and flexibility. This work also exploited key performance-related features available on the device, such as power modes, half-precision floating-point computation and the shared memory architecture between the GPU cores and the CPU cores.
Main authors:	Fernandes, João Pedro Alves
Subject:	Engineering and Technology::Electrical, Electronic and Computer Engineering
Year:	2018
Country:	Portugal
Document type:	master's dissertation
Access type:	open access
Associated institution:	Universidade do Minho
Language:	English
Source:	RepositóriUM - Universidade do Minho
