Publication

Parallel dot-products for deep learning on FPGA

Bibliographic details
Abstract:	Deep neural networks have recently shown great results in a wide range of image applications. The associated deep learning models are computationally very demanding, and several hardware solutions have therefore been proposed to accelerate their computation. FPGAs have recently shown very good performance for this kind of application and are considered a promising platform for accelerating the execution of deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC), which is used to calculate dot-products. Since many dot-products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation efficiently in order to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units in FPGA for dot-product operations with very high performance/area ratios using a mix of DSP blocks and LUTs. We consider 8-bit fixed-point representations, but the method can be applied to other bit widths. The method allows us to achieve TOPs performance, even on low-cost FPGAs.
Main author:	Véstias, Mário
Other authors:	Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio
Subject:	Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular
Year:	2017
Country:	Portugal
Document type:	conference document
Access type:	restricted access
Associated institution:	Instituto Politécnico de Lisboa
Language:	English
Source:	Repositório Científico do Instituto Politécnico de Lisboa
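
The abstract describes dot-products built from multiply-accumulate (MACC) operations on 8-bit fixed-point values. The following is a minimal software sketch of that arithmetic only, not the paper's hardware design (which maps MACC units onto DSP blocks and LUTs). The function names and the choice of 6 fractional bits are assumptions made for illustration.

```python
def quantize_q8(x, frac_bits=6):
    """Map a real value to a signed 8-bit fixed-point integer, saturating.

    frac_bits is an illustrative assumption, not a value from the paper.
    """
    scaled = round(x * (1 << frac_bits))
    return max(-128, min(127, scaled))


def macc_dot(weights, activations):
    """Dot product computed as a chain of multiply-accumulate steps."""
    acc = 0  # stands in for a wide accumulator register
    for w, a in zip(weights, activations):
        acc += w * a  # one MACC operation
    return acc


def accumulator_bits(n_terms):
    """Signed accumulator width sufficient for n_terms int8 x int8 products."""
    # Each product fits in 16 signed bits; summing n terms needs at most
    # ceil(log2(n)) extra bits.
    return 16 + (n_terms - 1).bit_length()


def parallel_dot_products(weight_rows, activations):
    """Independent dot products that share one activation vector.

    Hardware could evaluate these concurrently; here they run one by one.
    """
    return [macc_dot(row, activations) for row in weight_rows]
```

For example, `parallel_dot_products([[1, 0], [0, 1]], [5, -7])` computes two independent dot-products over the same input vector, and `accumulator_bits(256)` gives an accumulator width that cannot overflow for 256-term dot-products of 8-bit operands.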

Publisher:	Institute of Electrical and Electronics Engineers
Conference:	27th International Conference on Field Programmable Logic and Applications (FPL), 4-8 Sept. 2017, Ghent, Belgium
DOI:	10.23919/FPL.2017.8056863
ISBN:	978-9-0903-0428-1; 978-1-5386-2040-3
ISSN:	1946-1488
Handle:	http://hdl.handle.net/10400.21/9807
Full text:	https://repositorio.ipl.pt/bitstreams/5d82dc38-c7a3-4bfb-999d-5a35d0d96349/download
Format:	application/pdf (557510 bytes)
Date accepted:	2017-10-05
Date available:	2019-04-01
Access rights:	restricted access (http://purl.org/coar/access_right/c_16ec)
Resource type:	conference object (http://purl.org/coar/resource_type/c_c94f)
Hosting institution:	RCIPL (rcaap@sp.ipl.pt)
Identifiers, Véstias, Mário:	Ciência ID 4717-C2C7-3F2C; ORCID 0000-0001-8556-4507; Researcher ID H-9953-2012; Scopus Author ID 14525867300
Identifiers, Duarte, Rui:	Ciência ID B91E-770F-19A3; ORCID 0000-0002-7060-4745; Researcher ID I-4402-2015; Scopus Author ID 24823991600
Identifiers, De Sousa, Jose:	Ciência ID BE18-E262-E0EC; ORCID 0000-0001-7525-7546; Researcher ID L-6859-2015; Scopus Author ID 7102813024
Identifiers, Cláudio de Campos Neto, Horácio:	Ciência ID 9915-3BDF-5C35; ORCID 0000-0002-3621-8322
