Publication
BIOMEDICAL DOCUMENT RETRIEVAL FOR DATABASE CURATION
| Abstract: | This dissertation explores state-of-the-art deep learning models for document retrieval in biomedical research, using as a case study the Exposome-Explorer database, which contains manually curated entries on biomarkers of exposure to environmental risk factors for various diseases. Previous work has employed simple machine learning algorithms to reduce expert workload by improving the accuracy and efficiency of document retrieval. In this dissertation, traditional document retrieval methods such as BM25 are evaluated alongside transformer models such as MonoBERT, DistilBERT, and PubMedBERT to assess their suitability for the task. Results show that PubMedBERT, pre-trained on biomedical text, offers the best performance in retrieving relevant documents, with BM25 contributing significantly to initial dataset refinement. However, challenges persist, including variability in the curated data and in precision and recall, particularly for smaller datasets with fewer training examples, such as pollutant biomarkers. This research represents a step forward in automating and refining the curation of biomedical databases, with the aim of faster and more reliable results. Future work will involve applying the trained models to the latest version of the Exposome-Explorer database and enhancing BM25 with RM3 query expansion for improved document ranking. Additional optimization of the models will be explored to address performance variability and improve overall retrieval accuracy across different biomarker datasets. |
|---|---|
| Main authors: | Ramos, Diogo Luís Embaixador |
| Subject: | DEEP LEARNING; DOCUMENT RETRIEVAL; DATABASE CURATION; BIOMEDICAL LITERATURE; INFORMATION RETRIEVAL |
| Year: | 2024 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
Related records
article Biomedical text mining applied to document retrieval and semantic indexing
by: Lourenço, Anália
Published: (2009)
article BioDR: semantic indexing networks for biomedical document retrieval
by: Lourenço, Anália
Published: (2010)
school Neural information retrieval for biomedical question-answering
by: Almeida, Tiago Alexandre Melo
Published: (2019)
school Dense and hybrid models for information retrieval
by: Frias, José André Lopes
Published: (2022)
article Development of an information retrieval tool for biomedical patents
by: Alves, T.
Published: (2018)
article @Note: a workbench for biomedical text mining
by: Lourenço, Anália
Published: (2009)
school Lifelog and information retrieval from daily digital data
by: Ribeiro, Ricardo Ferreira
Published: (2024)
groups Document retrieval for question answering : a quantitative evaluation of text preprocessing
by: Carvalho, Gracinda
Published: (2007)
school Retrieval-Augmented Generation for Biomedical Protocols: Optimising Knowledge Retrieval to Support Healthcare Technicians in Anatomical Pathology Workflows
by: Pires, Diogo Filipe Moreira
Published: (2025)
article LEAFDATA: a literature-curated database for Arabidopsis leaf development
by: Szakonyi, Dóra
Published: (2016)
book Proceedings of the 9th International Workshop on Information Retrieval on Current Research Information Systems
by: Tenreiro De Magalhaes, Sérgio
Published: (2006)
article Towards a segment-based temporal information retrieval model
by: Craveiro, Olga
Published: (2010)
school Optimizing document reranking in a retrieval-augmented generation pipeline for Portuguese legal research
by: Wollny, Carolyn Svea
Published: (2025)
article TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval
by: Li, Qiqi
Published: (2023)
groups Pollen identification by its2 metabarcoding: curation of the sequences retrieved from genbank to build a reference database
by: Quaresma, Andreia
Published: (2022)
image Pollen identification by its2 metabarcoding: curation of the sequences retrieved from genbank to build a reference database
by: Quaresma, Andreia
Published: (2022)
article A methodology to create ontology-based information retrieval systems
by: Saias, José
Published: (2012)
article Semantic enrichment of a web legal information retrieval system
by: Saias, José
Published: (2012)
groups Using Geographic Signatures as Query and Document Scopes in Geographic IR
by: Cardoso, Nuno
Published: (2012)
article Development of text mining tools for information retrieval from patents
by: Alves, T.
Published: (2017)
school Biomedical information extraction for matching patients to clinical trials
by: Araújo, Gonçalo Carmo de
Published: (2018)
school Enhancing breast cancer diagnosis : a mammogram retrieval system and ground truth application
by: Roriz, Cátia Inês Melo
Published: (2024)
book A Question-Answering System for Legal Information Retrieval
by: Quaresma, Paulo
Published: (2012)
book Verification of Uncurated Protein Annotations
by: Rebholz-Schuhmann, Dietrich
Published: (2009)
article Satisfying Information Needs on the Web: a Survey of Web Information Retrieval
by: Escudeiro, Nuno Filipe
Published: (2008)
school Enhancing the Efficiency of Diffusion Models: A Retrieval-Based Approach
by: Kutsenko, Anton
Published: (2025)
article A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition
by: Louro, Pedro
Published: (2024)
article A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition
by: Louro, Pedro Lima
Published: (2024)
assignment NewsSearch: An Architecture for Information Retrieval of Online News
by: Maria, Nuno
Published: (1999)
article Lifelog retrieval from daily digital data: narrative review
by: Ribeiro, Ricardo
Published: (2022)
groups A systematic literature review on LLM-based information retrieval: The issue of contents classification
by: Cosme, D.
Published: (2024)
school Attribute Selection for Unsupervised and Language Independent Classification of Documents
by: Fazenda, Gonçalo Abrantes
Published: (2022)
article A question-answering system for Portuguese juridical documents
by: Quaresma, Paulo
Published: (2012)
article Mammogram retrieval system: Aggregating image classifiers for enhanced breast cancer diagnosis
by: Roriz, Cátia
Published: (2024)
school Context-based retrieval in software development
by: Antunes, Bruno Emanuel Machado
Published: (2013)
article Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome
by: Pérez-Pérez, Martín
Published: (2022)
groups MERGE App: A Prototype Software for Multi-User Emotion-Aware Music Management
by: Louro, Pedro
Published: (2024)
groups MERGE App: A Prototype Software for Multi-User Emotion-Aware Music Management
by: Louro, Pedro Lima
Published: (2024)
assignment A Language Modeling Approach for the Classification of Audio Music
by: Marques, Gonçalo
Published: (2009)
school Discovery and retrieval of Geographic data using Google
by: Abargues Casanova, Carlos
Published: (2009)