The definition of rigorous and well-structured annotation schemes is a key element in the advancement of Natural Language Processing (NLP). This paper aims to compare the performance of a general-purpose annotation scheme (Text2Story, based on the ISO 24617-1 standard) with that of a domain-specific scheme (i2b2) in the context of clinical narrative annotation, and to assess the feasibility of harmoni...
High-quality annotation is essential for the effective predictions of machine learning models. When annotations are dense, achieving accurate human labeling can be challenging, since the most widely used annotation tools present an overloaded visualization of labels. Thus, we present Vitra (Visualizer of temporal relation annotations), a tool designed for viewing annotations made in corpora, specifically focusing...
We present PolyNarrative, a new multilingual dataset of news articles, annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, promoting a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and ...
We present an annotation scheme designed to capture information related to the maintenance or change in the price of some goods (fuels, water, and vehicles) in news articles in Portuguese. The methodology we used involved adapting an existing annotation scheme, the Text2Story scheme (Silvano et al., 2021; Leal et al., 2022), which is based on different parts of ISO 24617 to capture the essential information f...
The relationship of a patient with a hospital from admission to discharge is often kept in a series of textual documents that describe the patient's journey. These documents are important to analyze the different steps of the clinical process and to make aggregated studies of the paths of patients in the hospital. In this paper, we explore the potential of Large Language Models (LLMs) to generate realistic and c...
Manual text annotation is a complex and time-consuming task. However, recent advancements demonstrate that such a task can be accelerated with automated pre-annotation. In this paper, we present a methodology to improve the efficiency of manual text annotation by leveraging LLMs for text pre-annotation. For this purpose, we train a BERT model for a token classification task and integrate it into the INCEpTION a...
We introduce SemEval-2025 Task 10 on Multilingual Characterization and Extraction of Narratives from Online News, which focuses on the identification and analysis of narratives in online news media. The task is structured into three subtasks: (1) Entity Framing, to identify the roles that relevant entities play within narratives, (2) Narrative Classification, to assign documents fine-grained narratives accordin...
The development of a robust annotation scheme and corresponding guidelines is crucial for producing annotated datasets that advance both linguistic and computational research. This paper presents a case study that outlines a methodology for designing an annotation scheme and its guidelines, specifically aimed at representing morphosyntactic and semantic information regarding temporal features, as well a...