Author(s):
Bandeira, Lucas ; Consoli, Bernardo ; Vieira, Renata ; Bordini, Rafael
Date: 2023
Persistent ID: http://hdl.handle.net/10174/35773
Origin: Repositório Científico da Universidade de Évora
Subject(s): Semantic similarity; Electronic Health Records
Description
As information from electronic patient records becomes increasingly important in the development of machine learning models, there is a growing need for a holistic understanding of those records. In particular, clinical notes need to be abridged so that important information is used in the training process without the repetition commonly found in such notes. This paper presents the pre-processing of clinical notes from the BRATECA dataset, a Brazilian tertiary care data collection. The pre-processing aims to remove repeated information resulting from the interaction between healthcare providers and patients, based on the semantic similarity values assigned between sentences in the clinical notes.
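The idea of dropping sentences that repeat earlier ones can be illustrated with a minimal sketch. This is not the paper's method: the record does not say how semantic similarity is computed, so the sketch assumes a simple bag-of-words cosine similarity as a stand-in, and the function names and the 0.8 threshold are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts; a crude stand-in for a semantic representation."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_repeated_sentences(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to any sentence kept so far.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    kept_vectors = []
    for sentence in sentences:
        vector = bag_of_words(sentence)
        if any(cosine_similarity(vector, seen) >= threshold for seen in kept_vectors):
            continue  # near-duplicate of an earlier sentence, so skip it
        kept.append(sentence)
        kept_vectors.append(vector)
    return kept


notes = [
    "Patient reports chest pain since yesterday.",
    "Patient reports chest pain since yesterday.",
    "Patient reports chest pain since yesterday morning.",
    "Blood pressure measured at 120/80.",
]
print(drop_repeated_sentences(notes))
```

In this toy input, the exact repeat and the near-repeat of the first sentence are dropped, while the unrelated blood pressure sentence is kept. A real pipeline would presumably replace the bag-of-words step with a proper semantic similarity model suited to Portuguese clinical text.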
CEECIND/01997/2017