Publication

Efficient Implementation Of A Single-Precision Floating-Point Arithmetic Unit on FPGA

Bibliographic details
Abstract:	This paper presents a single-precision floating-point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square-root and inverse square-root, with high performance and low resource usage. The design uses a piecewise second-order polynomial approximation to implement reciprocal, square-root and inverse square-root. The unit can be configured with any number of operations and is capable of calculating any function with a throughput of one operation per cycle. The floating-point multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM).
Main authors:	José, Wilson
Other authors:	Silva, Ana Rita; Neto, Horácio; Véstias, Mário
Subject:	Digital Arithmetic; Field Programmable Gate Arrays (FPGA); Program Processors; Efficient Implementation; Floating-Point Arithmetic; Floating-Point; Fused Multiply-Add; Relative Performance; Resource Usage; Single Precision; Square-Root
Year:	2014
Country:	Portugal
Document type:	conference paper
Access type:	restricted access
Associated institution:	Instituto Politécnico de Lisboa
Language:	English
Source:	Repositório Científico do Instituto Politécnico de Lisboa
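
As an illustration of the piecewise second-order polynomial approximation mentioned in the abstract, the following is a minimal software sketch for the reciprocal only. It is not the authors' hardware design: the segment count (`SEGMENT_BITS`), the use of Taylor coefficients around each segment midpoint, and the names `build_table` and `approx_reciprocal` are all assumptions made for this example.

```python
import math

SEGMENT_BITS = 7                  # assumed table index width, not taken from the paper
NUM_SEGMENTS = 1 << SEGMENT_BITS  # number of segments covering the mantissa range [1, 2)


def build_table():
    """Return per-segment coefficients (c0, c1, c2, mid) for 1/x on [1, 2).

    The coefficients are the second-order Taylor expansion of 1/x around the
    segment midpoint (an assumption; a real design would likely use
    minimax-optimised, fixed-point coefficients).
    """
    table = []
    for i in range(NUM_SEGMENTS):
        mid = 1.0 + (i + 0.5) / NUM_SEGMENTS
        table.append((1.0 / mid, -1.0 / mid**2, 1.0 / mid**3, mid))
    return table


RECIP_TABLE = build_table()


def approx_reciprocal(x):
    """Approximate 1/x for finite x > 0 with a piecewise quadratic."""
    if not math.isfinite(x) or x <= 0.0:
        raise ValueError("this sketch only handles finite positive inputs")
    mant, exp = math.frexp(x)        # x = mant * 2**exp, mant in [0.5, 1)
    mant, exp = mant * 2.0, exp - 1  # renormalise so mant is in [1, 2)
    idx = int((mant - 1.0) * NUM_SEGMENTS)  # upper mantissa bits select the segment
    c0, c1, c2, mid = RECIP_TABLE[idx]
    d = mant - mid
    poly = c0 + d * (c1 + d * c2)    # Horner form: two multiply-add steps
    return math.ldexp(poly, -exp)    # 1/x = (1/mant) * 2**(-exp)
```

The Horner form reduces the polynomial evaluation to two multiply-add steps, which suggests why a shared multiplier can serve both the polynomial approximation and the fused multiply-add operation. With the assumed 128 segments, the truncation error of this sketch stays below about 6e-8 relative to the exact reciprocal.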
