Publication
Idea Engineering: Design and Implementation of a Decision Support System for Generating Research Topics
| Abstract: | Selecting a research topic is one of the most challenging stages of a master's thesis and has traditionally required extensive manual literature review to identify knowledge gaps. The exponential growth of academic output makes this process impractical and creates a clear need for automated systems that can synthesize the prospective knowledge contained in large volumes of scientific documents. This research developed a topic recommendation system that combines Text Mining techniques with Generative Artificial Intelligence to automatically extract future research proposals and transform them into academically structured suggestions. The pipeline automatically collects the ‘Future Work’ sections of documents retrieved from the OpenAlex and Unpaywall databases whose abstracts include the filtered topic (e.g., ‘Text Mining in Higher Education’), applies rigorous pre-processing, uses the Latent Dirichlet Allocation algorithm to identify latent topics, with the model optimized by the Coherence Score metric, and ends with linguistic synthesis via the Gemini API, controlled through Prompt Engineering. The analysis of 242 DOIs yielded 22 final documents, from which 8 distinct latent topics were identified with a coherence of 0.4729 and a 99.58% reduction in vocabulary. The system generated 3 linguistically fluent and academically appropriate proposals for each of the 8 topics, demonstrating the feasibility of integrating unsupervised pattern discovery with advanced linguistic synthesis. It also validated the applicability of hybrid Latent Dirichlet Allocation-Large Language Model architectures in academic guidance, offering a scalable approach to automating knowledge discovery processes. |
|---|---|
| Main authors: | Rodrigues, Carolina Ochoa Gomes |
| Subject: | Text Mining; Topic Modelling; Latent Dirichlet Allocation; Large Language Models; Natural Language Generation; Thesis Topic Recommendations |
| Year: | 2026 |
| Country: | Portugal |
| Document type: | master's thesis |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
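
The abstract describes collecting ‘Future Work’ sections and reports a vocabulary reduction after pre-processing. The sketch below illustrates what those two steps could look like on plain text. It is not the thesis implementation: the heading patterns, the stop-word list, the helper names (`extract_future_work`, `tokenize`, `vocabulary_reduction`) and the toy document are all assumptions made for illustration, and the reduction is assumed to be the relative drop in unique tokens.

```python
import re

# Assumed heading spellings; real papers label this section in many other ways.
FUTURE_WORK_HEADING = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?future[ \t]+(?:work|research)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)
# Assumption: the next numbered heading (e.g. "6 References") closes the section.
NEXT_HEADING = re.compile(r"^[ \t]*\d+(?:\.\d+)*\.?[ \t]+[A-Z][^\n]*$", re.MULTILINE)

# Tiny illustrative stop-word list, not the one used in the thesis.
STOPWORDS = {"a", "and", "in", "of", "on", "should", "the", "to"}


def extract_future_work(text):
    """Return the body of the first 'Future Work' section, or None if absent."""
    heading = FUTURE_WORK_HEADING.search(text)
    if heading is None:
        return None
    body = text[heading.end():]
    following = NEXT_HEADING.search(body)
    if following is not None:
        body = body[:following.start()]
    return body.strip()


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary_reduction(raw_tokens, kept_tokens):
    """Percentage drop in unique tokens between two token lists."""
    before = len(set(raw_tokens))
    after = len(set(kept_tokens))
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


# Toy document invented for this example.
paper = """1 Introduction
We study text mining in higher education.
5 Future Work
Future studies should test the models on larger student cohorts.
6 References
[1] A placeholder reference.
"""

section = extract_future_work(paper)
raw = tokenize(section)
kept = [token for token in raw if token not in STOPWORDS]
print(section)
print(f"vocabulary reduction: {vocabulary_reduction(raw, kept):.2f}%")
```

In a full pipeline of the kind the abstract outlines, the filtered tokens would then feed a topic model, and the resulting topics would be passed to a language model for synthesis. Those stages depend on external libraries and services and are left out of this sketch.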