Publication

Efficient Implementation Of A Single-Precision Floating-Point Arithmetic Unit on FPGA

Bibliographic details
Abstract: This paper presents a single-precision floating-point arithmetic unit that supports multiplication, addition, fused multiply-add, reciprocal, square root, and inverse square root with high performance and low resource usage. The design uses a piecewise second-order polynomial approximation to implement reciprocal, square root, and inverse square root. The unit can be configured with any number of the operations and can compute any of them with a throughput of one operation per cycle. The unit's floating-point multiplier is also used to implement the polynomial approximation and the fused multiply-add operation. We compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach has a high relative performance/area efficiency. © 2014 Technical University of Munich (TUM).
Main author: José, Wilson
Other authors: Silva, Ana Rita; Neto, Horácio; Véstias, Mário
Subject: Digital Arithmetic; Field Programmable Gate Arrays (FPGA); Program Processors; Efficient Implementation; Floating-Point Arithmetic; Floating-Point; Fused Multiply-Add; Relative Performance; Resource Usage; Single Precision; Square-Root
Year: 2014
Country: Portugal
Document type: conference paper
Access type: restricted access
Associated institution: Instituto Politécnico de Lisboa
Language: English
Source: Repositório Científico do Instituto Politécnico de Lisboa
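
The abstract mentions a piecewise second-order polynomial approximation evaluated with the unit's multiplier. The following Python sketch illustrates that general idea for the reciprocal only; it is not the paper's design. The segment count (`SEGMENTS = 64`), the use of Taylor coefficients about each segment midpoint, and the function names `build_table` and `reciprocal` are all assumptions made for illustration, and the arithmetic runs in double precision rather than on a fixed-point hardware datapath.

```python
import math

SEGMENTS = 64  # hypothetical table size; the paper's value is not stated here


def build_table(segments=SEGMENTS):
    """Quadratic Taylor coefficients of 1/m about each segment midpoint, m in [1, 2)."""
    width = 1.0 / segments
    table = []
    for i in range(segments):
        c = 1.0 + (i + 0.5) * width  # midpoint of segment i
        # 1/m ~= 1/c - (m - c)/c**2 + (m - c)**2/c**3
        table.append((c, 1.0 / c, -1.0 / c**2, 1.0 / c**3))
    return table


TABLE = build_table()


def reciprocal(x):
    """Approximate 1/x for a finite, nonzero, normal x (illustrative sketch only)."""
    if x == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1          # renormalize so that m is in [1, 2)
    i = int((m - 1.0) * SEGMENTS)  # leading mantissa bits select the segment
    c, a0, a1, a2 = TABLE[i]
    d = m - c
    r = (a2 * d + a1) * d + a0     # Horner form: two multiply-add steps
    return math.copysign(math.ldexp(r, -e), x)  # 1/x = (1/m) * 2**(-e)
```

With these assumed parameters the relative error is bounded by roughly (1/129)^3, about 4.7e-7, which is short of full single precision (2^-24 is about 6e-8). Reaching that would need more segments or better-fitted coefficients, such as a minimax fit. Square root and inverse square root can follow the same table-plus-quadratic pattern with different coefficients and extra handling for odd exponents.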