Publication

Mining Software Model Repositories

Bibliographic details
Abstract: Modelling languages in software development are crucial for capturing requirements and representing software designs, architectures, and implementations. This dissertation focuses on UML class diagrams, a modelling language widely adopted in object-oriented software development. The quality of UML class diagram models can significantly impact the quality of the system they represent. Defects present in these models can hinder stakeholder understanding, introduce unnecessary complexity, and propagate to the developed system, leading to increased costs. Therefore, understanding the most common defects present in these diagrams is crucial. Further, with the growth of publicly available repositories, a wealth of valuable information, including UML class diagrams, is accessible. This presents an opportunity to study a large number of models extracted from these repositories. In this dissertation, we present an automated evaluation tool to assess a dataset consisting of 103,103 UML class diagrams to identify the defects present in these diagrams. The creation of this dataset involved the development of a web scraping tool designed to extract UML class diagrams from public repository projects. The principles of the Physics of Notations proposed by Moody and the principles of diagram size and diagram flaws proposed by Störrle are incorporated into the automated evaluation tool to identify defects. This allowed us to analyse how UML class diagrams available in public repositories are built "in the wild", and to detect which are the most frequent violations of the modelling principles proposed by Moody and Störrle.
Main authors: Lacão, Guilherme Ferreira
Subject: UML Class Diagram; Model Quality Factors; Physics of Notations; Mining Repositories; Web Scraping; Modelling
Year: 2023
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL