Publication
Mining github software repositories to look for programming language cocktails
| Abstract: | In light of specific development needs, it is common to concurrently apply different technologies to build complex applications. Given that lowering risks, costs, and other negative factors, while improving their positive counterparts, is paramount to a better development environment, it becomes relevant to find out which technologies work best for each intended purpose in a project. To reach these findings, it is necessary to analyse and study the technologies applied in these projects and how they interconnect and relate to each other. The theory behind Programming Cocktails (meaning the set of programming technologies - Ingredients - that are used to develop complex systems) can support these analyses. However, due to the sheer amount of data that is required to construct and analyse these Cocktails, it becomes unsustainable to obtain them manually. From the desire to accelerate this process comes the need for a tool that automates the data collection and its conversion into an appropriate format for analysis. As such, the project proposed in this paper revolves around the development of a web-scraping application that can generate Cocktail Identity Cards (CICs) from source code repositories hosted on GitHub. These CICs contain the Ingredients (programming languages, libraries and frameworks) used in the corresponding GitHub repository and follow the ontology previously established in a larger research project to model each Programming Cocktail. This paper presents a survey of current Source Version Control Systems (SVCSs) and web-scraping technologies, an overview of Programming Cocktails and their current foundations, and the design of a tool that can automate the gathering of CICs from GitHub repositories. |
|---|---|
| Main author: | Loureiro, João |
| Other authors: | Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel |
| Subject: | Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Bragança |
| Language: | English |
| Source: | Biblioteca Digital do IPB |
| _version_ | 1867173417081372672 |
|---|---|
| author | Loureiro, João |
| author2 | Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel |
| author2_role | author; author; author |
| author_facet | Loureiro, João; Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel |
| author_role | author |
| contributor_name_str_mv | Biblioteca Digital do IPB |
| country_str | PT |
| creators_json_txt | [{"Person.name":"Loureiro, João"},{"Person.name":"Costa Neto, Alvaro"},{"Person.name":"Pereira, Maria João","Person.identifier.orcid":"0000-0001-6323-0071"},{"Person.name":"Henriques, Pedro Rangel"}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | Biblioteca Digital do IPB |
| datacite.creators.creator.creatorName.fl_str_mv | Loureiro, João; Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel |
| datacite.date.Accepted.fl_str_mv | 2025-01-01T00:00:00Z |
| datacite.date.available.fl_str_mv | 2026-03-18T15:04:12Z |
| datacite.date.embargoed.fl_str_mv | 2026-03-18T15:04:12Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| datacite.titles.title.fl_str_mv | Mining github software repositories to look for programming language cocktails |
| dc.contributor.none.fl_str_mv | Biblioteca Digital do IPB |
| dc.creator.none.fl_str_mv | Loureiro, João; Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel |
| dc.date.Accepted.fl_str_mv | 2025-01-01T00:00:00Z |
| dc.date.available.fl_str_mv | 2026-03-18T15:04:12Z |
| dc.date.embargoed.fl_str_mv | 2026-03-18T15:04:12Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10198/36135 |
| dc.language.none.fl_str_mv | eng |
| dc.publisher.none.fl_str_mv | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
| dc.rights.cclincense.fl_str_mv | http://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.subject.none.fl_str_mv | Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| dc.title.fl_str_mv | Mining github software repositories to look for programming language cocktails |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_5794 |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | conferencePaper |
| fulltext.url.fl_str_mv | https://bibliotecadigital.ipb.pt/bitstreams/b6fae6fa-c288-4b3e-b36d-579d0a3a5cfc/download |
| funding.funder.alternateName_str_mv | FCT; FCT |
| funding.funder.identifier_str_mv | http://doi.org/10.13039/501100001871; http://doi.org/10.13039/501100001871 |
| funding.funder.name_str_mv | Fundação para a Ciência e a Tecnologia; Fundação para a Ciência e a Tecnologia |
| funding.identifier_str_mv | UIDP/05757/2020; LA/P/0007/2020 |
| funding.name_str_mv | 6817 - DCRRNI ID; 6817 - DCRRNI ID |
| funding_str_mv | UIDP/05757/2020; info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT; LA/P/0007/2020; info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT |
| id | ipb_2d8d2d974bf6ebc748e3f5246d303fbe |
| identifier.url.fl_str_mv | http://hdl.handle.net/10198/36135 |
| instacron_str | ipb |
| institution | Instituto Politécnico de Bragança |
| instname_str | Instituto Politécnico de Bragança |
| language | eng |
| network_acronym_str | ipb |
| network_name_str | Biblioteca Digital do IPB |
| oai_identifier_str | oai:bibliotecadigital.ipb.pt:10198/36135 |
| organization_str_mv | urn:organizationAcronym:ipb |
| person_str_mv | Loureiro, João; Costa Neto, Alvaro; Pereira, Maria João; Pereira, Maria João; https://www.ciencia-id.pt/C912-4A49-A3B3; C912-4A49-A3B3; http://orcid.org/0000-0001-6323-0071; 0000-0001-6323-0071; Henriques, Pedro Rangel |
| publishDate | 2025 |
| publisher.none.fl_str_mv | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing |
| reponame_str | Biblioteca Digital do IPB |
| repository_id_str | urn:repositoryAcronym:ipb |
| service_str_mv | urn:repositoryAcronym:ipb |
| spelling | eng; Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing; eng; application/pdf; eng; Mining github software repositories to look for programming language cocktails; Loureiro, João; Costa Neto, Alvaro; Personal; Pereira, Maria João; DSpace; http://dspace.org/items/a20ccfa6-4e84-4c25-ab0d-8d6ba196ffc2; DSpace; http://dspace.org/items/a20ccfa6-4e84-4c25-ab0d-8d6ba196ffc2; Pereira; Maria João; Ciência ID; https://www.ciencia-id.pt; C912-4A49-A3B3; ORCID; http://orcid.org; 0000-0001-6323-0071; Researcher ID; https://www.researcherid.com; G-5999-2011; Scopus Author ID; https://www.scopus.com; 13907870300; Henriques, Pedro Rangel; HostingInstitution; Organizational; Biblioteca Digital do IPB; e-mail; mailto:dspace@ipb.pt; dspace@ipb.pt; ISBN; IsPartOf; 978-395977387-4; ISSN; IsPartOf; 2190-6807; DOI; IsPartOf; 10.4230/2025.13; 2026-03-18T15:04:12Z; 2025; 2025-01-01T00:00:00Z; Handle; http://hdl.handle.net/10198/36135; http://purl.org/coar/access_right/c_abf2; open access; Software repository mining; Source version control; GitHub scraping; Programming cocktails; 886376 bytes; Fundação para a Ciência e a Tecnologia; Research Centre in Digitalization and Intelligent Robotics; info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT; UIDP/05757/2020; 6817 - DCRRNI ID; Crossref Funder ID; http://doi.org/10.13039/501100001871; Fundação para a Ciência e a Tecnologia; Associate Laboratory for Sustainability and Technology in Mountain Regions; info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT; LA/P/0007/2020; 6817 - DCRRNI ID; Crossref Funder ID; http://doi.org/10.13039/501100001871; other research product; http://purl.org/coar/resource_type/c_5794; conference paper; 2025; http://creativecommons.org/licenses/by-sa/4.0/; http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext; https://bibliotecadigital.ipb.pt/bitstreams/b6fae6fa-c288-4b3e-b36d-579d0a3a5cfc/download; 14th Symposium on Languages, Applications and Technologies, SLATE 2025; 13513116; Faro, Portugal |
| spellingShingle | Mining github software repositories to look for programming language cocktails; Loureiro, João; Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| status | SINGLETON |
| subject.fl_str_mv | Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| title | Mining github software repositories to look for programming language cocktails |
| title_full | Mining github software repositories to look for programming language cocktails |
| title_fullStr | Mining github software repositories to look for programming language cocktails |
| title_full_unstemmed | Mining github software repositories to look for programming language cocktails |
| title_short | Mining github software repositories to look for programming language cocktails |
| title_sort | Mining github software repositories to look for programming language cocktails |
| topic | Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| topic_facet | Software repository mining; Source version control; GitHub scraping; Programming cocktails |
| url | http://hdl.handle.net/10198/36135 |
| visible | 1 |