Publicação
Efficient Implementation Of A Single-Precision Floating-Point Arithmetic Unit on FPGA
| Resumo: | This paper presents a single precision floating point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse squareroot with high-performance and low resource usage. The design uses a piecewise 2nd order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and is capable to calculate any function with a throughput of one operation per cycle. The floatingpoint multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM). |
|---|---|
| Autores principais: | José, Wilson |
| Outros Autores: | Silva, Ana Rita; Neto, Horácio; Véstias, Mário |
| Assunto: | Digital Arithmetic Field Programmable Gate Arrays (FPGA) Program Processors Efficient Implementation Floating-Point Arithmetic Floatingpoint Fused Multiply-Add Relative Performance Resource Usage Single Precision Square-Root |
| Ano: | 2014 |
| País: | Portugal |
| Tipo de documento: | documento de conferência |
| Tipo de acesso: | acesso restrito |
| Instituição associada: | Instituto Politécnico de Lisboa |
| Idioma: | inglês |
| Origem: | Repositório Científico do Instituto Politécnico de Lisboa |
| Resumo: | This paper presents a single precision floating point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse squareroot with high-performance and low resource usage. The design uses a piecewise 2nd order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and is capable to calculate any function with a throughput of one operation per cycle. The floatingpoint multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM). |
|---|