Publication

Mining github software repositories to look for programming language cocktails

Bibliographic details
Abstract: In light of specific development needs, it is common to apply several technologies concurrently to build complex applications. Given that lowering risks, costs, and other negative factors while improving their positive counterparts is paramount to a better development environment, it becomes relevant to find out which technologies work best for each intended purpose in a project. To reach these findings, it is necessary to analyse the technologies applied in these projects and how they interconnect and relate to each other. The theory behind Programming Cocktails (the set of programming technologies, called Ingredients, that are used to develop complex systems) can support this analysis. However, because of the sheer amount of data required to construct and analyse these Cocktails, obtaining them manually is unsustainable. The desire to accelerate this process creates the need for a tool that automates the data collection and its conversion into a format suitable for analysis. As such, the project proposed in this paper revolves around the development of a web-scraping application that can generate Cocktail Identity Cards (CICs) from source code repositories hosted on GitHub. These CICs contain the Ingredients (programming languages, libraries, and frameworks) used in the corresponding GitHub repository and follow the ontology previously established in a larger research project to model each Programming Cocktail. This paper presents a survey of current Source Version Control Systems (SVCSs) and web-scraping technologies, an overview of Programming Cocktails and their current foundations, and the design of a tool that can automate the gathering of CICs from GitHub repositories.
Main author: Loureiro, João
Other authors: Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel
Subject: Software repository mining; Source version control; GitHub scraping; Programming cocktails
Year: 2025
Country: Portugal
Document type: conference paper
Access type: open access
Associated institution: Instituto Politécnico de Bragança
Language: English
Source: Biblioteca Digital do IPB
_version_ 1867173417081372672
author Loureiro, João
author2 Costa Neto, Alvaro
Pereira, Maria João
Henriques, Pedro Rangel
author2_role author
author
author
author_facet Loureiro, João
Costa Neto, Alvaro
Pereira, Maria João
Henriques, Pedro Rangel
author_role author
contributor_name_str_mv Biblioteca Digital do IPB
country_str PT
creators_json_txt [{\"Person.name\":\"Loureiro, João\"},{\"Person.name\":\"Costa Neto, Alvaro\"},{\"Person.name\":\"Pereira, Maria João\",\"Person.identifier.orcid\":\"0000-0001-6323-0071\"},{\"Person.name\":\"Henriques, Pedro Rangel\"}]
datacite.contributors.contributor.contributorName.fl_str_mv Biblioteca Digital do IPB
datacite.creators.creator.creatorName.fl_str_mv Loureiro, João
Costa Neto, Alvaro
Pereira, Maria João
Henriques, Pedro Rangel
datacite.date.Accepted.fl_str_mv 2025-01-01T00:00:00Z
datacite.date.available.fl_str_mv 2026-03-18T15:04:12Z
datacite.date.embargoed.fl_str_mv 2026-03-18T15:04:12Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Software repository mining
Source version control
GitHub scraping
Programming cocktails
datacite.titles.title.fl_str_mv Mining github software repositories to look for programming language cocktails
dc.contributor.none.fl_str_mv Biblioteca Digital do IPB
dc.creator.none.fl_str_mv Loureiro, João
Costa Neto, Alvaro
Pereira, Maria João
Henriques, Pedro Rangel
dc.date.Accepted.fl_str_mv 2025-01-01T00:00:00Z
dc.date.available.fl_str_mv 2026-03-18T15:04:12Z
dc.date.embargoed.fl_str_mv 2026-03-18T15:04:12Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10198/36135
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by-sa/4.0/
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv Software repository mining
Source version control
GitHub scraping
Programming cocktails
dc.title.fl_str_mv Mining github software repositories to look for programming language cocktails
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_5794
dirty 0
eu_rights_str_mv openAccess
format conferencePaper
fulltext.url.fl_str_mv https://bibliotecadigital.ipb.pt/bitstreams/b6fae6fa-c288-4b3e-b36d-579d0a3a5cfc/download
funding.funder.alternateName_str_mv FCT
FCT
funding.funder.identifier_str_mv http://doi.org/10.13039/501100001871
http://doi.org/10.13039/501100001871
funding.funder.name_str_mv Fundação para a Ciência e a Tecnologia
Fundação para a Ciência e a Tecnologia
funding.identifier_str_mv UIDP/05757/2020
LA/P/0007/2020
funding.name_str_mv 6817 - DCRRNI ID
6817 - DCRRNI ID
funding_str_mv UIDP/05757/2020
info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT
LA/P/0007/2020
info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT
id ipb_2d8d2d974bf6ebc748e3f5246d303fbe
identifier.url.fl_str_mv http://hdl.handle.net/10198/36135
instacron_str ipb
institution Instituto Politécnico de Bragança
instname_str Instituto Politécnico de Bragança
language eng
network_acronym_str ipb
network_name_str Biblioteca Digital do IPB
oai_identifier_str oai:bibliotecadigital.ipb.pt:10198/36135
organization_str_mv urn:organizationAcronym:ipb
person_str_mv Loureiro, João
Costa Neto, Alvaro
Pereira, Maria João
https://www.ciencia-id.pt/C912-4A49-A3B3
C912-4A49-A3B3
http://orcid.org/0000-0001-6323-0071
0000-0001-6323-0071
Henriques, Pedro Rangel
publishDate 2025
publisher.none.fl_str_mv Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
reponame_str Biblioteca Digital do IPB
repository_id_str urn:repositoryAcronym:ipb
service_str_mv urn:repositoryAcronym:ipb
spelling Mining github software repositories to look for programming language cocktails
Language: eng
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Format: application/pdf (886376 bytes)
Loureiro, João
Costa Neto, Alvaro
Pereira, Maria João (Ciência ID: C912-4A49-A3B3; ORCID: 0000-0001-6323-0071; Researcher ID: G-5999-2011; Scopus Author ID: 13907870300; DSpace: http://dspace.org/items/a20ccfa6-4e84-4c25-ab0d-8d6ba196ffc2)
Henriques, Pedro Rangel
Hosting institution: Biblioteca Digital do IPB (e-mail: dspace@ipb.pt)
Is part of: ISBN 978-395977387-4; ISSN 2190-6807; DOI 10.4230/2025.13
Dates: 2026-03-18T15:04:12Z; 2025; 2025-01-01T00:00:00Z
Handle: http://hdl.handle.net/10198/36135
Access: open access (http://purl.org/coar/access_right/c_abf2)
License: http://creativecommons.org/licenses/by-sa/4.0/
Subjects: Software repository mining; Source version control; GitHub scraping; Programming cocktails
Funding: Fundação para a Ciência e a Tecnologia (Crossref Funder ID: http://doi.org/10.13039/501100001871); Research Centre in Digitalization and Intelligent Robotics; UIDP/05757/2020; 6817 - DCRRNI ID; info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT
Funding: Fundação para a Ciência e a Tecnologia (Crossref Funder ID: http://doi.org/10.13039/501100001871); Associate Laboratory for Sustainability and Technology in Mountain Regions; LA/P/0007/2020; 6817 - DCRRNI ID; info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT
Resource type: other research product; conference paper (http://purl.org/coar/resource_type/c_5794)
Full text: https://bibliotecadigital.ipb.pt/bitstreams/b6fae6fa-c288-4b3e-b36d-579d0a3a5cfc/download
Conference: 14th Symposium on Languages, Applications and Technologies, SLATE 2025; 13513116; Faro, Portugal
spellingShingle Mining github software repositories to look for programming language cocktails
Loureiro, João
Software repository mining
Source version control
GitHub scraping
Programming cocktails
status SINGLETON
subject.fl_str_mv Software repository mining
Source version control
GitHub scraping
Programming cocktails
title Mining github software repositories to look for programming language cocktails
title_full Mining github software repositories to look for programming language cocktails
title_fullStr Mining github software repositories to look for programming language cocktails
title_full_unstemmed Mining github software repositories to look for programming language cocktails
title_short Mining github software repositories to look for programming language cocktails
title_sort Mining github software repositories to look for programming language cocktails
topic Software repository mining
Source version control
GitHub scraping
Programming cocktails
topic_facet Software repository mining
Source version control
GitHub scraping
Programming cocktails
url http://hdl.handle.net/10198/36135
visible 1
