Publication

Idea Engineering: Design and Implementation of a Decision Support System for Generating Research Topics

Bibliographic details
Abstract: Selecting a research topic is one of the most challenging stages of a master's thesis, traditionally requiring an extensive manual literature review to identify gaps in knowledge. The exponential growth of academic output makes this manual process increasingly impractical, creating a need for automated systems that can synthesize the prospective knowledge contained in large volumes of scientific documents. This research developed a topic recommendation system that combines Text Mining techniques with Generative Artificial Intelligence to automatically extract future research proposals and transform them into academically structured suggestions. The methodology is a pipeline that automatically collects the ‘Future Work’ sections of documents retrieved from the OpenAlex and Unpaywall databases whose abstracts include the filtered topic (e.g., ‘Text Mining in Higher Education’), applies rigorous pre-processing, and then uses the Latent Dirichlet Allocation algorithm to identify latent topics, optimized by the Coherence Score metric. It ends with linguistic synthesis via the Gemini API, controlled by Prompt Engineering. The analysis of 242 DOIs yielded 22 final documents, from which 8 distinct latent topics were identified, with a coherence of 0.4729 and a 99.58% reduction in vocabulary. The system generated 3 linguistically fluent and academically appropriate proposals for each of the 8 topics, demonstrating the feasibility of integrating unsupervised pattern discovery with advanced linguistic synthesis. It also validated the applicability of hybrid Latent Dirichlet Allocation-Large Language Model architectures in academic guidance, offering a scalable approach to automating knowledge discovery processes.
Main authors: Rodrigues, Carolina Ochoa Gomes
Subject: Text Mining; Topic Modelling; Latent Dirichlet Allocation; Large Language Models; Natural Language Generation; Thesis Topic Recommendations
Year: 2026
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
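
The collection step described in the abstract (isolating the ‘Future Work’ section of each document before topic modelling) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the heading patterns, the function names `extract_future_work` and `vocabulary_reduction_pct`, and the formula used for the vocabulary reduction are all assumptions made for illustration, and the code expects plain text that has already been extracted from the document.

```python
import re
from typing import Optional

# Assumed heading variants; the record does not list the patterns actually used.
FUTURE_WORK_HEADING = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?future[ \t]+(?:work|research)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Assumed closing headings that mark where the section ends.
CLOSING_HEADING = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?"
    r"(?:references|bibliography|acknowledge?ments|appendix)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_future_work(text: str) -> Optional[str]:
    """Return the body of the first 'Future Work' section, or None if absent."""
    start = FUTURE_WORK_HEADING.search(text)
    if start is None:
        return None
    rest = text[start.end():]
    end = CLOSING_HEADING.search(rest)
    body = rest[: end.start()] if end else rest
    # Collapse line breaks and repeated spaces into single spaces.
    cleaned = " ".join(body.split())
    return cleaned or None


def vocabulary_reduction_pct(raw_vocab_size: int, final_vocab_size: int) -> float:
    """Percentage of distinct terms removed by pre-processing (assumed definition)."""
    if raw_vocab_size <= 0:
        raise ValueError("raw_vocab_size must be positive")
    return 100.0 * (raw_vocab_size - final_vocab_size) / raw_vocab_size
```

Documents for which `extract_future_work` returns `None` would simply be dropped from the corpus, which is one plausible reason a set of 242 DOIs could shrink to 22 final documents; the record itself does not state the exclusion criteria.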