Publication
Parallel dot-products for deep learning on FPGA
| Abstract: | Deep neural networks have recently shown great results in a wide range of image applications. The associated deep learning models are computationally very demanding, so several hardware solutions have been proposed to accelerate them. FPGAs have recently shown very good performance for this kind of application and are therefore considered a promising platform for accelerating deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC), which is used to calculate dot products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation efficiently in order to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units on FPGA for dot-product operations with very high performance/area ratios, using a mix of DSP blocks and LUTs. We consider 8-bit fixed-point representations, but the method can be applied to other bit widths. The method allows us to achieve TOPS-level performance, even on low-cost FPGAs. |
|---|---|
| Main authors: | Véstias, Mário |
| Other authors: | Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
| Subject: | Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| Year: | 2017 |
| Country: | Portugal |
| Document type: | conference document |
| Access type: | restricted access |
| Associated institution: | Instituto Politécnico de Lisboa |
| Language: | English |
| Source: | Repositório Científico do Instituto Politécnico de Lisboa |
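
The multiply-accumulate operation named in the abstract can be illustrated in software. The sketch below is not the paper's hardware design, which maps MACC units onto DSP blocks and LUTs; it only models the arithmetic of a dot product over signed 8-bit fixed-point operands. The helper names `to_q8`, `macc_dot`, `parallel_dots` and `from_acc`, and the choice of 4 fractional bits, are assumptions made for illustration.

```python
def to_q8(x, frac_bits=4):
    """Quantize a real value to signed 8-bit fixed point (assumed format:
    `frac_bits` fractional bits, saturating at the 8-bit range)."""
    q = round(x * (1 << frac_bits))
    return max(-128, min(127, q))


def macc_dot(a, b):
    """Dot product as a chain of multiply-accumulate steps.
    Each product of two 8-bit values needs up to 16 bits, so the
    accumulator is kept wider than the operands (a Python int here)."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    acc = 0
    for x, w in zip(a, b):
        acc += x * w  # one MACC step
    return acc


def parallel_dots(x, weight_rows):
    """Several independent dot products that share the same input vector.
    In hardware these could run on separate MACC units at the same time;
    here they are simply evaluated one after another."""
    return [macc_dot(x, row) for row in weight_rows]


def from_acc(acc, frac_bits=4):
    """Convert an accumulator back to a real value. A product of two
    operands with `frac_bits` fractional bits has 2 * frac_bits of them."""
    return acc / (1 << (2 * frac_bits))


x = [to_q8(1.5), to_q8(-0.25)]
weights = [[to_q8(1.0), to_q8(0.5)], [to_q8(0.0), to_q8(1.0)]]
print([from_acc(acc) for acc in parallel_dots(x, weights)])
```

Because the dot products in `parallel_dots` do not depend on each other, the number that can be computed per cycle is bounded by how many MACC units fit on the device and by the memory bandwidth available to feed them, which is the trade-off the abstract describes.
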
| _version_ | 1866887350534012928 |
|---|---|
| author | Véstias, Mário |
| author2 | Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
| author2_role | author; author; author |
| author_facet | Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
| author_role | author |
| contributor_name_str_mv | RCIPL |
| country_str | PT |
| creators_json_txt | [{"Person.name":"Véstias, Mário","Person.identifier.orcid":"0000-0001-8556-4507"},{"Person.name":"Duarte, Rui","Person.identifier.orcid":"0000-0002-7060-4745"},{"Person.name":"De Sousa, Jose","Person.identifier.orcid":"0000-0001-7525-7546"},{"Person.name":"Cláudio de Campos Neto, Horácio","Person.identifier.orcid":"0000-0002-3621-8322"}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | RCIPL |
| datacite.creators.creator.creatorName.fl_str_mv | Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
| datacite.date.Accepted.fl_str_mv | 2017-10-05T00:00:00Z |
| datacite.date.available.fl_str_mv | 2019-04-01T10:16:29Z |
| datacite.date.embargoed.fl_str_mv | 2019-04-01T10:16:29Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_16ec |
| datacite.subjects.subject.fl_str_mv | Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| datacite.titles.title.fl_str_mv | Parallel dot-products for deep learning on FPGA |
| dc.contributor.none.fl_str_mv | RCIPL |
| dc.creator.none.fl_str_mv | Véstias, Mário; Duarte, Rui; De Sousa, Jose; Cláudio de Campos Neto, Horácio |
| dc.date.Accepted.fl_str_mv | 2017-10-05T00:00:00Z |
| dc.date.available.fl_str_mv | 2019-04-01T10:16:29Z |
| dc.date.embargoed.fl_str_mv | 2019-04-01T10:16:29Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10400.21/9807 |
| dc.language.none.fl_str_mv | eng |
| dc.publisher.none.fl_str_mv | Institute of Electrical and Electronics Engineers |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_16ec |
| dc.subject.none.fl_str_mv | Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| dc.title.fl_str_mv | Parallel dot-products for deep learning on FPGA |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_c94f |
| description | Deep neural networks have recently shown great results in a wide range of image applications. The associated deep learning models are computationally very demanding, so several hardware solutions have been proposed to accelerate them. FPGAs have recently shown very good performance for this kind of application and are therefore considered a promising platform for accelerating deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC), which is used to calculate dot products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation efficiently in order to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units on FPGA for dot-product operations with very high performance/area ratios, using a mix of DSP blocks and LUTs. We consider 8-bit fixed-point representations, but the method can be applied to other bit widths. The method allows us to achieve TOPS-level performance, even on low-cost FPGAs. |
| dirty | 0 |
| eu_rights_str_mv | restrictedAccess |
| format | conferenceObject |
| fulltext.url.fl_str_mv | https://repositorio.ipl.pt/bitstreams/5d82dc38-c7a3-4bfb-999d-5a35d0d96349/download |
| id | ripl_8f2e442fd09bb7eaa64ec9ebb18daab2 |
| identifier.url.fl_str_mv | http://hdl.handle.net/10400.21/9807 |
| instacron_str | ipl |
| institution | Instituto Politécnico de Lisboa |
| instname_str | Instituto Politécnico de Lisboa |
| language | eng |
| network_acronym_str | ripl |
| network_name_str | Repositório Científico do Instituto Politécnico de Lisboa |
| oai_identifier_str | oai:repositorio.ipl.pt:10400.21/9807 |
| organization_str_mv | urn:organizationAcronym:ipl |
| person_str_mv | Véstias, Mário; Véstias, Mário; https://www.ciencia-id.pt/4717-C2C7-3F2C; 4717-C2C7-3F2C; http://orcid.org/0000-0001-8556-4507; 0000-0001-8556-4507; Duarte, Rui; Duarte, Rui; https://www.ciencia-id.pt/B91E-770F-19A3; B91E-770F-19A3; http://orcid.org/0000-0002-7060-4745; 0000-0002-7060-4745; De Sousa, Jose; De Sousa, Jose; https://www.ciencia-id.pt/BE18-E262-E0EC; BE18-E262-E0EC; http://orcid.org/0000-0001-7525-7546; 0000-0001-7525-7546; Cláudio de Campos Neto, Horácio; Cláudio de Campos Neto, Horácio; https://www.ciencia-id.pt/9915-3BDF-5C35; 9915-3BDF-5C35; http://orcid.org/0000-0002-3621-8322; 0000-0002-3621-8322 |
| publishDate | 2017 |
| publisher.none.fl_str_mv | Institute of Electrical and Electronics Engineers |
| reponame_str | Repositório Científico do Instituto Politécnico de Lisboa |
| repository_id_str | urn:repositoryAcronym:ripl |
| service_str_mv | urn:repositoryAcronym:ripl |
| spelling | eng; Institute of Electrical and Electronics Engineers; pt_PT; Deep neural networks have recently shown great results in a wide range of image applications. The associated deep learning models are computationally very demanding, so several hardware solutions have been proposed to accelerate them. FPGAs have recently shown very good performance for this kind of application and are therefore considered a promising platform for accelerating deep learning algorithms. A common operation in these algorithms is multiply-accumulate (MACC), which is used to calculate dot products. Since many dot products can be calculated in parallel, as long as memory bandwidth is available, it is very important to implement this operation efficiently in order to increase the density of MACC units in an FPGA. In this paper, we propose an implementation of parallel MACC units on FPGA for dot-product operations with very high performance/area ratios, using a mix of DSP blocks and LUTs. We consider 8-bit fixed-point representations, but the method can be applied to other bit widths. The method allows us to achieve TOPS-level performance, even on low-cost FPGAs.; application/pdf; pt_PT; Parallel dot-products for deep learning on FPGA; Personal; Véstias, Mário; DSpace; http://dspace.org/items/a7d22b29-c961-45ac-bc09-cd5e1002f1e8; DSpace; http://dspace.org/items/a7d22b29-c961-45ac-bc09-cd5e1002f1e8; Véstias; Mário; Ciência ID; https://www.ciencia-id.pt; 4717-C2C7-3F2C; ORCID; http://orcid.org; 0000-0001-8556-4507; Researcher ID; https://www.researcherid.com; H-9953-2012; Scopus Author ID; https://www.scopus.com; 14525867300; Personal; Duarte, Rui; DSpace; http://dspace.org/items/f2b4b9e6-6c89-48c7-bc83-62d2e98a787b; DSpace; http://dspace.org/items/f2b4b9e6-6c89-48c7-bc83-62d2e98a787b; Duarte; Rui; Ciência ID; https://www.ciencia-id.pt; B91E-770F-19A3; ORCID; http://orcid.org; 0000-0002-7060-4745; Researcher ID; https://www.researcherid.com; I-4402-2015; Scopus Author ID; https://www.scopus.com; 24823991600; Personal; De Sousa, Jose; DSpace; http://dspace.org/items/d98a4d45-2d45-42ec-9f1d-14775723709b; DSpace; http://dspace.org/items/d98a4d45-2d45-42ec-9f1d-14775723709b; de Sousa; Jose; Ciência ID; https://www.ciencia-id.pt; BE18-E262-E0EC; ORCID; http://orcid.org; 0000-0001-7525-7546; Researcher ID; https://www.researcherid.com; L-6859-2015; Scopus Author ID; https://www.scopus.com; 7102813024; Personal; Cláudio de Campos Neto, Horácio; DSpace; http://dspace.org/items/38334d5e-83e8-494c-a9e0-396299376d97; DSpace; http://dspace.org/items/38334d5e-83e8-494c-a9e0-396299376d97; Cláudio de Campos Neto; Horácio; Ciência ID; https://www.ciencia-id.pt; 9915-3BDF-5C35; ORCID; http://orcid.org; 0000-0002-3621-8322; HostingInstitution; Organizational; RCIPL; e-mail; mailto:rcaap@sp.ipl.pt; rcaap@sp.ipl.pt; ISBN; IsPartOf; 978-9-0903-0428-1; ISBN; IsPartOf; 978-1-5386-2040-3; ISSN; IsPartOf; 1946-1488; DOI; IsPartOf; 10.23919/FPL.2017.8056863; 2019-04-01T10:16:29Z; 2017-10-05; 2017-10-05T00:00:00Z; Handle; http://hdl.handle.net/10400.21/9807; http://purl.org/coar/access_right/c_16ec; restricted access; Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular; 557510 bytes; other research product; http://purl.org/coar/resource_type/c_c94f; conference object; http://purl.org/coar/access_right/c_16ec; application/pdf; fulltext; https://repositorio.ipl.pt/bitstreams/5d82dc38-c7a3-4bfb-999d-5a35d0d96349/download; 27th International Conference on Field Programmable Logic and Applications (FPL); 1; 4; 4-8 Sept. 2017 - Ghent, Belgium |
| spellingShingle | Parallel dot-products for deep learning on FPGA; Véstias, Mário; Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| status | SINGLETON |
| subject.fl_str_mv | Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| title | Parallel dot-products for deep learning on FPGA |
| title_full | Parallel dot-products for deep learning on FPGA |
| title_fullStr | Parallel dot-products for deep learning on FPGA |
| title_full_unstemmed | Parallel dot-products for deep learning on FPGA |
| title_short | Parallel dot-products for deep learning on FPGA |
| title_sort | Parallel dot-products for deep learning on FPGA |
| topic | Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| topic_facet | Multiply-accumulate; Deep learning; FPGA; Multiplicar-acumular |
| url | http://hdl.handle.net/10400.21/9807 |
| visible | 1 |