Publication
Mining Software Model Repositories
| Abstract: | Modelling languages in software development are crucial for capturing requirements and representing software designs, architectures, and implementations. This dissertation focuses on UML class diagrams, a modelling language widely adopted in object-oriented software development. The quality of UML class diagram models can significantly impact the quality of the system they represent. Defects present in these models can hinder stakeholder understanding, introduce unnecessary complexity, and propagate to the developed system, leading to increased costs. Therefore, understanding the most common defects present in these diagrams is crucial. Further, with the growth of publicly available repositories, a wealth of valuable information, including UML class diagrams, is accessible. This presents an opportunity to study a large number of models extracted from these repositories. In this dissertation, we present an automated evaluation tool to assess a dataset consisting of 103,103 UML class diagrams to identify the defects present in these diagrams. The creation of this dataset involved the development of a web scraping tool designed to extract UML class diagrams from public repository projects. The principles of the Physics of Notations proposed by Moody and the principles of diagram size and diagram flaws proposed by Störrle are incorporated into the automated evaluation tool to identify defects. This allowed us to analyse how UML class diagrams available in public repositories are built "in the wild", and to detect which are the most frequent violations of the modelling principles proposed by Moody and Störrle. |
|---|---|
| Main authors: | Lacão, Guilherme Ferreira |
| Subject: | UML Class Diagram; Model Quality Factors; Physics of Notations; Mining Repositories; Web Scraping; Modelling |
| Year: | 2023 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |