Publicação

Heterogeneous fault tolerance architecture based on Arm and RISC-V processors

Ver documento

Detalhes bibliográficos
Resumo:Safety-critical systems deployed in harsh environments rely on fault tolerance and redundancy techniques to keep them operating even in the presence of faults. Although there are effective techniques to mitigate one side faults, they are not enough to protect the system against simultaneously multi side faults. These kinds of faults trigger the same error in faulty redundant components, which makes resulting errors invisible and undetectable for fault tolerant mechanisms. To overcome this problem, design diversity is applied in fault tolerant system to mitigate the Common-Mode Failure (CMF) and build a more robust and reliable system. Despite several fault tolerance architectures based on FPGA are available in the literature, to the best of our knowledge, none of them aims both hardening of heterogeneous processors and applying design diversity at processor level. To address this lack of solutions in the current state of the art, this dissertation proposes a novel heterogeneous fault tolerance architecture, Lock-V, which enables design diversity at processors architecture level. It deals with CMF, as well as both error detection and recovery fault tolerance techniques to mitigate errors triggered by external environment interactions, e.g., radiation. To eliminate the CMF, Lock-V explores an implementation based on different processing units: a hard-core Arm Cortex-A9 and a soft-core RISC-V-based processors, to leverage design diversity through ISA heterogeneity. To implement fault tolerance, Lock-V proposes a hybrid DCLS solution where the error detection is done by hardware, resorting to a FPGA accelerator, while error recovery is performed by software using rollback technique. After the deployment of Lock-V on a Zynq-7000 SoC, over 45000 faults were injected. The results taken from such injection shows that when an application runs on the Lock-V architecture, besides its protection against the CMF due to processors design diversity, it is also protected against 97% of the triggered errors. Nevertheless implement Lock-V came up with some tradeoffs. It used 79% of the LUT and 34% of the FF available on the Zedboard FPGA platform. Regarding the software part, implementing Lock-V leads to an 8% increase in memory footprint and also an increase in the execution overhead around 12%, mainly in the worst case scenario as tested in the absence of errors. Knowing that all the redundancy has its cost, Lock-V proved to be able to grant a system with design diversity and fault tolerance capabilities.
Autores principais:Rodrigues, Cristiano António Azevedo
Assunto:Design diversity Fault tolerance Lockstep Redundancy Diversidade de desenho Redundância Tolerância a falhas Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
Ano:2019
País:Portugal
Tipo de documento:dissertação de mestrado
Tipo de acesso:acesso aberto
Instituição associada:Universidade do Minho
Idioma:inglês
Origem:RepositóriUM - Universidade do Minho
Descrição
Resumo:Safety-critical systems deployed in harsh environments rely on fault tolerance and redundancy techniques to keep them operating even in the presence of faults. Although there are effective techniques to mitigate one side faults, they are not enough to protect the system against simultaneously multi side faults. These kinds of faults trigger the same error in faulty redundant components, which makes resulting errors invisible and undetectable for fault tolerant mechanisms. To overcome this problem, design diversity is applied in fault tolerant system to mitigate the Common-Mode Failure (CMF) and build a more robust and reliable system. Despite several fault tolerance architectures based on FPGA are available in the literature, to the best of our knowledge, none of them aims both hardening of heterogeneous processors and applying design diversity at processor level. To address this lack of solutions in the current state of the art, this dissertation proposes a novel heterogeneous fault tolerance architecture, Lock-V, which enables design diversity at processors architecture level. It deals with CMF, as well as both error detection and recovery fault tolerance techniques to mitigate errors triggered by external environment interactions, e.g., radiation. To eliminate the CMF, Lock-V explores an implementation based on different processing units: a hard-core Arm Cortex-A9 and a soft-core RISC-V-based processors, to leverage design diversity through ISA heterogeneity. To implement fault tolerance, Lock-V proposes a hybrid DCLS solution where the error detection is done by hardware, resorting to a FPGA accelerator, while error recovery is performed by software using rollback technique. After the deployment of Lock-V on a Zynq-7000 SoC, over 45000 faults were injected. The results taken from such injection shows that when an application runs on the Lock-V architecture, besides its protection against the CMF due to processors design diversity, it is also protected against 97% of the triggered errors. Nevertheless implement Lock-V came up with some tradeoffs. It used 79% of the LUT and 34% of the FF available on the Zedboard FPGA platform. Regarding the software part, implementing Lock-V leads to an 8% increase in memory footprint and also an increase in the execution overhead around 12%, mainly in the worst case scenario as tested in the absence of errors. Knowing that all the redundancy has its cost, Lock-V proved to be able to grant a system with design diversity and fault tolerance capabilities.