The modal functions performed by some discourse markers have been the subject of analysis for several languages. This paper analyzes the different values conveyed by the Portuguese discourse marker claro, and takes a contrastive perspective with French, based on monolingual and parallel, written and spoken, corpora, in order to identify discourse markers that are functional equivalents of claro in different con...
This paper presents the PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, which is part of the European research infrastructure CLARIN ERIC as its Portuguese national node, and belongs to the Portuguese National Roadmap of Research Infrastructures of Strategic Relevance. PORTULAN CLARIN includes a helpdesk, a repository, where resources, such as corpora, lexicons and proces...
The automatic diagnosis and analysis of the production of foreign language learners can help overcome the linguistic barriers that hinder the integration of migrant populations. The richness and complexity of the phenomena observed in this context, and the multiplicity of objectives served by automatic analysis tools, show that manual annotation of data is unavoidable and highlight the importance of producing ...
Question-answer pairs are typically associated with spoken discourse and with directive speech acts, although they are also found in written texts. We analyse contexts extracted from the CRPC-DB, a written subcorpus annotated with discourse relations in the PDTB style. We focus on the nature of the question and of the answer in interactional contexts, but also in contexts where a single locutor poses the question a...
This paper describes the Portuguese coreference corpus Corref-PT, annotated semi-automatically using the coreference annotation tool CORP, and manually revised with the editing tool CorrefVisual. It includes a total of 182 texts, mostly news (corpus CSTNews, corpus LE-PAROLE, FAPESP magazine) but also articles from Wikipedia. The result is a corpus that includes a total of 3898 reference chains. We present th...
This work presents a comparative study between two different approaches to build an automatic classification system for Modality values in the Portuguese language. One approach uses a single multi-class classifier with the full dataset that includes eleven modal verbs; the other builds different classifiers, one for each verb. The performance is measured using precision, recall and F1. Due to the unbalanced n...
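Precision, recall and F1 are standard measures; the following is a minimal Python sketch, assuming gold and predicted Modality labels are available as parallel lists, that computes them per class together with an unweighted (macro) average of F1, one common way to report results when classes are unbalanced. The label names in the usage example are hypothetical placeholders, not the tag set used in the study, and the macro average is an illustration, not necessarily the aggregation the authors chose.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    scores = {}
    for label in set(gold) | set(pred):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["epistemic", "epistemic", "deontic", "participant"]
pred = ["epistemic", "deontic", "deontic", "epistemic"]
scores = per_class_prf(gold, pred)
print(scores)
print(macro_f1(scores))
```

Because the macro average gives each class equal weight, a rare class that is never predicted correctly (here the hypothetical `participant`) pulls the score down, whereas accuracy alone would hide it.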
We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured in a two-level system: first, a fully manual token-based and coarse-grained annotation is applied and produces a rough clas...