Publication

Fault Revealing Test Oracles, Are We There Yet? Evaluating The Effectiveness Of Automatically Generated Test Oracles On Manually-Written And Automatically Generated Unit Tests

Bibliographic details
Abstract: Automated test suite generation tools have been used in real development scenarios and have proven able to detect real faults. These tools, however, do not know the expected behavior of the system, so they generate tests that execute the faulty behavior but fail to identify the fault because of poor test oracles. To solve this problem, researchers have developed several approaches to automatically generate test oracles that resemble manually-written ones. However, questions remain regarding the use of these tools in real development scenarios. In particular, how effective are automatically generated test oracles at revealing real faults? How long do these tools require to generate an oracle? To answer these questions, we applied a recent and promising test oracle generation approach (T5) to all fault-revealing test cases in the DEFECTS4J collection and investigated how effective the generated test oracles are at detecting real faults, as well as the time the tool requires to generate them. Our results show that: (1) out-of-the-box, oracles generated by T5 do not compile; (2) after a simple procedure, out of the 1696 test oracles, only 466 compile and 58 of them manage to correctly identify the fault; (3) when considering the 835 bugs in DEFECTS4J, T5 was able to detect 27, i.e., 3.23% of the bugs. Moreover, T5 required, on average, 401.3 seconds to generate a test oracle. The approaches and datasets presented in this thesis bring automated test oracle generation one step closer to being used in real software, by providing insight into current problems of several tools and by introducing a way to evaluate automated test oracle generation tools under development with respect to their effectiveness at detecting real software faults.
Main authors: Bento, Daniel Correia
Subject: Software testing; Unit tests; Test oracle; Empirical study; Automatic oracle generation; Master's theses - 2023
Year: 2023
Country: Portugal
Document type: master's thesis
Access type: open access
Associated institution: Universidade de Lisboa
Language: English
Source: Repositório da Universidade de Lisboa