Publication
From source code identifiers to natural language terms
| Abstract: | Program comprehension techniques often explore program identifiers to infer knowledge about programs. The relevance of source code identifiers as a source of information about programs is already established in the literature, as is their direct impact on future comprehension tasks. Most programming languages enforce some constraints on identifier strings (e.g., white spaces or commas are not allowed). Also, programmers often use word combinations and abbreviations to devise strings that represent single or multiple domain concepts in order to increase programming linguistic efficiency (conveying more semantics while writing less). These strings do not always use explicit marks to distinguish the terms used (e.g., CamelCase or underscores), so techniques often referred to as hard splitting are not enough. This paper introduces Lingua::IdSplitter, a dictionary-based algorithm for splitting and expanding strings that compose multi-term identifiers. It explores the use of general programming and abbreviation dictionaries, but also a custom dictionary automatically generated from software natural language content, prone to include application domain terms and specific abbreviations. This approach was applied to two software packages, written in C, achieving an f-measure of around 90% for correctly splitting and expanding identifiers. A comparison with current state-of-the-art approaches is also presented. |
|---|---|
| Main authors: | Carvalho, Nuno Ramos |
| Other authors: | Almeida, José João; Henriques, Pedro Rangel; Pereira, Maria João |
| Subject: | Program comprehension; Natural language processing; Identifier splitting |
| Year: | 2015 |
| Country: | Portugal |
| Document type: | article |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Bragança |
| Language: | English |
| Source: | Biblioteca Digital do IPB |
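
The distinction the abstract draws between hard splitting and dictionary-based splitting can be illustrated with a minimal sketch. It is not the Lingua::IdSplitter implementation: the function names (`hard_split`, `soft_split`, `split_and_expand`), the toy dictionaries, and the "fewest terms wins" rule are all assumptions made for illustration only.

```python
import re

# Hypothetical toy dictionaries; the dictionaries used in the paper are not reproduced here.
WORDS = {"time", "sort", "list", "file", "name"}
ABBREVIATIONS = {"str": "string", "len": "length", "buf": "buffer"}


def hard_split(identifier):
    """Split on explicit marks only: underscores and lower-to-upper case changes."""
    parts = re.split(r"_+|(?<=[a-z0-9])(?=[A-Z])", identifier)
    return [p.lower() for p in parts if p]


def soft_split(token):
    """Split a mark-free token into dictionary terms, preferring fewer terms.

    Returns a list of terms, or None when the token cannot be fully covered.
    """
    vocabulary = WORDS | set(ABBREVIATIONS)
    best = [None] * (len(token) + 1)  # best[i] holds the best split of token[:i]
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(token)]


def split_and_expand(identifier):
    """Hard split first, then dictionary split each token and expand abbreviations."""
    terms = []
    for token in hard_split(identifier):
        pieces = soft_split(token) or [token]  # unknown tokens are kept whole
        terms.extend(ABBREVIATIONS.get(p, p) for p in pieces)
    return terms
```

With these toy dictionaries, `hard_split("timesort")` leaves the identifier whole because it carries no explicit mark, while `split_and_expand("timesort")` returns `["time", "sort"]` and `split_and_expand("strlen")` returns `["string", "length"]`.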
Related records
article From source code identifiers to natural language terms
by: Carvalho, Nuno Alexandre Ramos
Published: (2015)
article Probabilistic synSet based concept location
by: Carvalho, Nuno Ramos
Published: (2012)
article Probabilistic SynSet based concept location
by: Carvalho, Nuno Ramos
Published: (2012)
article Conclave: ontology-driven measurement of semantic relatedness between source code elements and problem domain concepts
by: Carvalho, Nuno Ramos
Published: (2014)
article Conclave: writing programs to understand programs
by: Carvalho, Nuno Ramos
Published: (2014)
school Code Reviews for Visual Programming Languages
by: Ragusa, Giuliano Giorgio
Published: (2018)
article Comparing general-purpose and domain-specific languages: an empirical study
by: Kosar, Tomaz
Published: (2010)
article A language processing tool for program comprehension
by: Berón, Mario
Published: (2006)
article Applying program comprehension techniques to Karel robot programs
by: Oliveira, Nuno
Published: (2009)
article Static and dynamic strategies to understand C programs by code annotation
by: Berón, Mario
Published: (2007)
groups Towards an evolutionary-based approach for natural language processing
by: Manzoni, Luca
Published: (2020)
article Code inspection approaches for program visualization
by: Cruz, Daniela
Published: (2009)
school Generation of business rules code from natural language
by: Gonçalves, Nuno Miguel Sousa
Published: (2025)
groups Second Workshop on Digital Humanities and Natural Language Processing
by: Trojahn, Cassia
Published: (2022)
article Dicionário-aberto: a source of resources for the portuguese language processing
by: Simões, Alberto
Published: (2012)
article GUI code tracing through direct program interaction
by: Santos, A.
Published: (2014)
school Explaining software faults in source code
by: Ribeiro, Francisco José Torres
Published: (2024)
groups CEUR Proceedings of the PROPOR Workshop on Digital Humanities and Natural Language Processing
by: Vieira, Renata
Published: (2021)
article Conclave: Writing programs to understand programs
by: Carvalho, Nuno Alexandre Ramos
Published: (2014)
groups Jask: Generation of questions about learners’ code in Java
by: Santos, A. L.
Published: (2022)
book DEBACER: a method for slicing moderated debates
by: Ferraz, Thomas Palmeira
Published: (2021)
article A Strange Metapaper On Computing Natural Language
by: Portela, Manuel
Published: (2018)
article Characterization and identification of programming languages
by: Alves, Júlio
Published: (2023)
groups PTCRIS_OrgID: portuguese organisation identifiers authoritative system
by: Amante, Maria João
Published: (2017)
article Strategies for program inspection and visualization
by: Cruz, Daniela
Published: (2008)
article How to interconnect operational and behavioral views of web applications
by: Fonseca, Ruben
Published: (2008)
article Using natural language processing for phishing detection
by: Jonker, Richard A.A.
Published: (2021)
article Color identifying system for color blind people
by: Neiva, Miguel
Published: (2009)
article The Categorization of Occupation in Identified Skeletal Collections: A Source of Bias
by: Cardoso, Francisca Alves
Published: (2012)
article The Relationship between Metacomprehension and Reading Comprehension in Spanish as a Second Language
by: Míguez-Álvarez, Carla
Published: (2021)
school Benchmarking Large Language Models for Code Generation
by: Nogueira, Rodrigo Pato de Carvalho
Published: (2025)
book Natural language processing and cloud computing in disease prevention and management
by: Ferreira, Ricardo
Published: (2023)
article ALMA versus DDD
by: Cruz, Daniela
Published: (2008)
article Language comprehenders are sensitive to multiple states of semantically similar objects
by: Horchak, O. V.
Published: (2024)
article Real people or mere numbers? The influence of kill-save ratios and identifiability on moral judgements
by: Costa-Lopes, Rui
Published: (2021)
groups Natural Language Processing applied to Food Data: a smart food description mapping system
by: Tomé, Sidney
Published: (2019)
article Is complex visual information implicated during language comprehension? The case of cast shadows
by: Horchak, O. V.
Published: (2020)
school Rich Large-Scale Portuguese Language Models from Large Portuguese Corpora
by: Lopes, Ricardo Valverde
Published: (2023)
article Investigating object orientation effects across 18 languages
by: Chen, S.-C.
Published: (2025)
school Identifying Deception in Online Reviews: Application of Machine Learning, Deep Learning and Natural Language Processing
by: Roy, Bhupendra
Published: (2020)