Publication

Parallel dot-products for deep learning on FPGA

Bibliographic details
Abstract: Deep neural networks have recently shown great results in a wide range of image applications. The associated deep learning models are computationally very demanding and, therefore, several hardware solutions have been proposed to accelerate their computation. FPGAs have recently shown very good performance for this kind of application and are therefore considered a promising platform for accelerating the execution of deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC), which is used to calculate dot products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation efficiently in order to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units in FPGA for dot-product operations with very high performance/area ratios using a mix of DSP blocks and LUTs. We consider 8-bit fixed-point representations, but the method can be applied to other bit widths. The method allows us to achieve TOPs-level performance, even on low-cost FPGAs.
Main author: Véstias, Mário
Other authors: Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio
Subject: Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular
Year: 2017
Country: Portugal
Document type: conference paper
Access type: restricted access
Associated institution: Instituto Politécnico de Lisboa
Language: English
Source: Repositório Científico do Instituto Politécnico de Lisboa
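
As an illustration of the multiply-accumulate operation that the abstract describes, the following is a minimal software sketch in Python. It is not the paper's hardware design and says nothing about how DSP blocks and LUTs are combined. It assumes signed 8-bit operands and an unbounded integer accumulator; the function names `to_int8`, `macc_dot`, and `parallel_dots` are hypothetical and do not come from the paper.

```python
def to_int8(x):
    """Saturate an integer to the signed 8-bit range [-128, 127] (assumed format)."""
    return max(-128, min(127, int(x)))


def macc_dot(a, b):
    """Dot product of two equal-length vectors via sequential multiply-accumulate."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0  # wide accumulator: each 8-bit x 8-bit product needs up to 16 bits
    for x, y in zip(a, b):
        acc += to_int8(x) * to_int8(y)  # one MACC step
    return acc


def parallel_dots(weights, activations):
    """Independent dot products that share one activation vector.

    In hardware these could run concurrently on separate MACC units;
    here they are simply evaluated one after another.
    """
    return [macc_dot(row, activations) for row in weights]


if __name__ == "__main__":
    print(macc_dot([1, 2, 3], [4, 5, 6]))
    print(parallel_dots([[1, 0], [0, 1]], [5, -7]))
```

Each row passed to `parallel_dots` is independent of the others, which is the property that lets many dot products be computed in parallel when memory bandwidth allows.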
_version_ 1866887350534012928
author Véstias, Mário
author2 Duarte, Rui
De Sousa, Jose
Cláudio de Campos Neto, Horácio
author2_role author
author
author
author_facet Véstias, Mário
Duarte, Rui
De Sousa, Jose
Cláudio de Campos Neto, Horácio
author_role author
contributor_name_str_mv RCIPL
country_str PT
creators_json_txt [{"Person.name":"Véstias, Mário","Person.identifier.orcid":"0000-0001-8556-4507"},{"Person.name":"Duarte, Rui","Person.identifier.orcid":"0000-0002-7060-4745"},{"Person.name":"De Sousa, Jose","Person.identifier.orcid":"0000-0001-7525-7546"},{"Person.name":"Cláudio de Campos Neto, Horácio","Person.identifier.orcid":"0000-0002-3621-8322"}]
datacite.contributors.contributor.contributorName.fl_str_mv RCIPL
datacite.creators.creator.creatorName.fl_str_mv Véstias, Mário
Duarte, Rui
De Sousa, Jose
Cláudio de Campos Neto, Horácio
datacite.date.Accepted.fl_str_mv 2017-10-05T00:00:00Z
datacite.date.available.fl_str_mv 2019-04-01T10:16:29Z
datacite.date.embargoed.fl_str_mv 2019-04-01T10:16:29Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_16ec
datacite.subjects.subject.fl_str_mv Multiply-accumulate
Deep learning
FPGA
Multiplicar-acumular
datacite.titles.title.fl_str_mv Parallel dot-products for deep learning on FPGA
dc.contributor.none.fl_str_mv RCIPL
dc.creator.none.fl_str_mv Véstias, Mário
Duarte, Rui
De Sousa, Jose
Cláudio de Campos Neto, Horácio
dc.date.Accepted.fl_str_mv 2017-10-05T00:00:00Z
dc.date.available.fl_str_mv 2019-04-01T10:16:29Z
dc.date.embargoed.fl_str_mv 2019-04-01T10:16:29Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10400.21/9807
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv Institute of Electrical and Electronics Engineers
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.subject.none.fl_str_mv Multiply-accumulate
Deep learning
FPGA
Multiplicar-acumular
dc.title.fl_str_mv Parallel dot-products for deep learning on FPGA
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_c94f
dirty 0
eu_rights_str_mv restrictedAccess
format conferenceObject
fulltext.url.fl_str_mv https://repositorio.ipl.pt/bitstreams/5d82dc38-c7a3-4bfb-999d-5a35d0d96349/download
id ripl_8f2e442fd09bb7eaa64ec9ebb18daab2
identifier.url.fl_str_mv http://hdl.handle.net/10400.21/9807
instacron_str ipl
institution Instituto Politécnico de Lisboa
instname_str Instituto Politécnico de Lisboa
language eng
network_acronym_str ripl
network_name_str Repositório Científico do Instituto Politécnico de Lisboa
oai_identifier_str oai:repositorio.ipl.pt:10400.21/9807
organization_str_mv urn:organizationAcronym:ipl
person_str_mv Véstias, Mário
Véstias, Mário
https://www.ciencia-id.pt/4717-C2C7-3F2C
4717-C2C7-3F2C
http://orcid.org/0000-0001-8556-4507
0000-0001-8556-4507
Duarte, Rui
Duarte, Rui
https://www.ciencia-id.pt/B91E-770F-19A3
B91E-770F-19A3
http://orcid.org/0000-0002-7060-4745
0000-0002-7060-4745
De Sousa, Jose
De Sousa, Jose
https://www.ciencia-id.pt/BE18-E262-E0EC
BE18-E262-E0EC
http://orcid.org/0000-0001-7525-7546
0000-0001-7525-7546
Cláudio de Campos Neto, Horácio
Cláudio de Campos Neto, Horácio
https://www.ciencia-id.pt/9915-3BDF-5C35
9915-3BDF-5C35
http://orcid.org/0000-0002-3621-8322
0000-0002-3621-8322
publishDate 2017
publisher.none.fl_str_mv Institute of Electrical and Electronics Engineers
reponame_str Repositório Científico do Instituto Politécnico de Lisboa
repository_id_str urn:repositoryAcronym:ripl
service_str_mv urn:repositoryAcronym:ripl
spelling Parallel dot-products for deep learning on FPGA
Publisher: Institute of Electrical and Electronics Engineers
Conference: 27th International Conference on Field Programmable Logic and Applications (FPL), 4-8 Sept. 2017, Ghent, Belgium
DOI (is part of): 10.23919/FPL.2017.8056863
ISBN (is part of): 978-9-0903-0428-1; 978-1-5386-2040-3
ISSN (is part of): 1946-1488
Handle: http://hdl.handle.net/10400.21/9807
Full text (application/pdf, 557510 bytes, restricted access): https://repositorio.ipl.pt/bitstreams/5d82dc38-c7a3-4bfb-999d-5a35d0d96349/download
Resource type: conference object (http://purl.org/coar/resource_type/c_c94f)
Dates: accepted 2017-10-05T00:00:00Z; available 2019-04-01T10:16:29Z
Hosting institution: RCIPL (mailto:rcaap@sp.ipl.pt)
Véstias, Mário: Ciência ID 4717-C2C7-3F2C; ORCID 0000-0001-8556-4507; Researcher ID H-9953-2012; Scopus Author ID 14525867300; DSpace item http://dspace.org/items/a7d22b29-c961-45ac-bc09-cd5e1002f1e8
Duarte, Rui: Ciência ID B91E-770F-19A3; ORCID 0000-0002-7060-4745; Researcher ID I-4402-2015; Scopus Author ID 24823991600; DSpace item http://dspace.org/items/f2b4b9e6-6c89-48c7-bc83-62d2e98a787b
De Sousa, Jose: Ciência ID BE18-E262-E0EC; ORCID 0000-0001-7525-7546; Researcher ID L-6859-2015; Scopus Author ID 7102813024; DSpace item http://dspace.org/items/d98a4d45-2d45-42ec-9f1d-14775723709b
Cláudio de Campos Neto, Horácio: Ciência ID 9915-3BDF-5C35; ORCID 0000-0002-3621-8322; DSpace item http://dspace.org/items/38334d5e-83e8-494c-a9e0-396299376d97
spellingShingle Parallel dot-products for deep learning on FPGA
Véstias, Mário
Multiply-accumulate
Deep learning
FPGA
Multiplicar-acumular
status SINGLETON
subject.fl_str_mv Multiply-accumulate
Deep learning
FPGA
Multiplicar-acumular
title Parallel dot-products for deep learning on FPGA
title_full Parallel dot-products for deep learning on FPGA
title_fullStr Parallel dot-products for deep learning on FPGA
title_full_unstemmed Parallel dot-products for deep learning on FPGA
title_short Parallel dot-products for deep learning on FPGA
title_sort Parallel dot-products for deep learning on FPGA
topic Multiply-accumulate
Deep learning
FPGA
Multiplicar-acumular
topic_facet Multiply-accumulate
Deep learning
FPGA
Multiplicar-acumular
url http://hdl.handle.net/10400.21/9807
visible 1