Publication
Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
| Field | Value |
|---|---|
| Abstract: | This dissertation proposes a set of procedures for the computational processing of Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech Tagging, Nominal Featurization and Nominal Lemmatization. These are some of the initial steps that produce linguistic information, such as POS categories or lemmas, which is important to most subsequent processing (e.g. syntactic and semantic analysis). I follow a shallow processing approach, where linguistic information is associated with text based on local information (i.e. using the word itself or perhaps a limited window of context containing just a few words). I begin by identifying and describing the key problems raised by each task, with special focus on the problems that are specific to Portuguese. After an overview of existing approaches and tools, I describe the solutions I adopted for the issues raised previously. I then report on my implementation of these solutions, which are found either to yield state-of-the-art performance or, in some cases, to advance the state of the art. The major result of this dissertation is thus threefold: a description of the problems found in the NLP of Portuguese; a set of algorithms and the corresponding tools to tackle those problems; and their evaluation results. |
| Main authors: | Silva, João |
| Subject: | Natural language processing; Shallow processing; Sentence segmentation; Tokenization; Morphosyntactic annotation; Morphological analysis; Lemmatization |
| Year: | 2007 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | restricted access |
| Associated institution: | Universidade de Lisboa |
| Language: | Portuguese |
| Source: | Repositório da Universidade de Lisboa |
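The abstract describes shallow processing as associating linguistic information with text using only local evidence (the word itself or a small window of neighbouring words). As a rough illustration of what that can mean for three of the tasks listed, here is a minimal Python sketch. It is not the procedure developed in the dissertation: the contraction table, suffix rules, exception list and the function names `segment`, `tokenize` and `lemmatize_nominal` are hypothetical and deliberately tiny, and the lemmatizer only maps plural forms to the singular.

```python
import re

# Hypothetical, deliberately tiny resources; real systems use far larger lists.
# Ambiguous forms such as "nos" (em + os, or the clitic pronoun "nos") are
# left out on purpose, because resolving them needs context.
CONTRACTIONS = {
    "do": ("de", "o"), "da": ("de", "a"), "dos": ("de", "os"), "das": ("de", "as"),
    "no": ("em", "o"), "na": ("em", "a"), "nas": ("em", "as"),
    "ao": ("a", "o"), "aos": ("a", "os"),
}

# Hypothetical plural-to-singular suffix rules, most specific first.
PLURAL_RULES = [
    ("ões", "ão"),   # leões   -> leão
    ("ães", "ão"),   # cães    -> cão
    ("ãos", "ão"),   # mãos    -> mão
    ("ais", "al"),   # animais -> animal
    ("ores", "or"),  # flores  -> flor
    ("zes", "z"),    # vozes   -> voz
    ("s", ""),       # gatos   -> gato
]

# Forms the rules would get wrong (e.g. "árvores" -> "árvor"); checked first.
EXCEPTIONS = {"pais": "pai", "país": "país", "lápis": "lápis", "árvores": "árvore"}

SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÀ-Ý])")
TOKEN = re.compile(r"\w+|[^\w\s]")


def segment(text):
    """Split after '.', '!' or '?' when followed by whitespace and an uppercase letter.

    Abbreviations such as "Sr." would be split wrongly; a fuller segmenter
    needs an abbreviation list.
    """
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Separate words from punctuation and expand preposition+article contractions."""
    tokens = []
    for tok in TOKEN.findall(sentence):
        parts = CONTRACTIONS.get(tok.lower())
        if parts is None:
            tokens.append(tok)
            continue
        prep, article = parts
        if tok[0].isupper():          # keep sentence-initial capitalization
            prep = prep.capitalize()
        tokens.extend([prep, article])
    return tokens


def lemmatize_nominal(word):
    """Map a plural noun/adjective form to the singular, looking only at the word.

    Assumes a POS tagger has already marked the word as nominal; gender
    normalization (e.g. feminine -> masculine) is not attempted.
    """
    w = word.lower()
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    for suffix, replacement in PLURAL_RULES:
        if w.endswith(suffix) and len(w) > len(suffix):
            return w[: -len(suffix)] + replacement
    return w
```

For instance, `tokenize("Os gatos do vizinho dormem no jardim.")` yields `['Os', 'gatos', 'de', 'o', 'vizinho', 'dormem', 'em', 'o', 'jardim', '.']`, and `lemmatize_nominal("leões")` returns `'leão'`. The exception list is consulted before the rules because suffix rules overgeneralize: without it, `pais` ("parents") would be mapped to `pal` by the `-ais` rule.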
Related records
Shallow processing of portuguese: from sentence chunking to nominal lemmatization
by: Silva, João Ricardo Martins Ferreira da
Published: (2007)
Tokenization of Portuguese: resolving the hard cases
by: Branco, António Horta
Published: (2003)
Developing reliability metrics and validation tools for datasets with deep linguistic information
by: Castro, Sérgio Ricardo de
Published: (2011)
Words matter: Judges’ value judgments in sentence pronouncements remarks.
by: Castro Rodrigues, Andreia de
Published: (2023)
Susceptibility assessment of shallow slides failure and run-out
by: Melo, Raquel
Published: (2019)
Identifying and characterizing concepts in unstructured texts using automatic annotation
by: Fraga, Tiago
Published: (2022)
Legislative politics and sentencing policymaking : statutory Severity in the U.S. and western europe
by: Mendes, Silvia M.
Published: (2006)
A finite volume scheme for the shallow-water system with the polynomial reconstruction
by: Clain, Stéphane
Published: (2012)
Control and minimization of crud in a shallow layer gravity settler
by: Ribeiro, M. M. M.
Published: (2008)
Prison sentences: last resort or the default sanction?
by: Castro-Rodrigues, Andreia
Published: (2019)
Sentence repetition task for European Portuguese: results from a study with monolingual and Portuguese-German bilingual children
by: Correia, Liliana
Published: (2024)
Assessment of physical vulnerability and potential losses of buildings due to shallow slides
by: Silva, M.
Published: (2014)
Landslide susceptibility assessment at the basin scale for rainfall -and earthquake- triggered shallow slides
by: Gordo, Cristina
Published: (2019)
Subject-Verb inversion in declarative-exclamative sentences in the Portuguese language
by: Valadas, Rita
Published: (2014)
Dicionário-aberto: a source of resources for the portuguese language processing
by: Simões, Alberto
Published: (2012)
A well-balanced scheme for the shallow-water equations with topography or Manning friction
by: Michel-Dansac, V.
Published: (2017)
Cognitive load eliminates the effect of perceptual information on judgments of learning with sentences
by: Luna, Karlos
Published: (2019)
It Is the time for Portuguese texts!
by: Craveiro, Olga
Published: (2012)
Technical note: assessing predictive capacity and conditional independence of landslide predisposing factors for shallow landslide susceptibility models
by: Pereira, Susana
Published: (2012)
Error annotation in the COPLE2 corpus
by: del Río, Iria
Published: (2018)
Modelling the rainfall threshold for shallow landslides considering the landslide predisposing factors in Portugal
by: Villaça, Caio
Published: (2024)
HPSG 2005 - The 12th International Conference on Head-Driven Phrase Structure Grammar: Conference Notes
by: Branco, António Horta
Published: (2005)
Second-order finite volume mood method for the shallow water with dry/wet interface
by: Figueiredo, Jorge Manuel
Published: (2015)
A MOOD-MUSCL hybrid formulation for the non-conservative shallow-water system
by: Figueiredo, Jorge
Published: (2021)
O financiamento através de cryptoassets : token sales : aspetos societários
by: Basílio, Tiago Azevedo
Published: (2019)
PE2LGP: tradutor de português europeu para língua gestual portuguesa em glosas
by: Gonçalves, Matilde
Published: (2021)
Combining data-driven models to assess susceptibility of shallow slides failure and run-out
by: Melo, Raquel
Published: (2019)
Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale
by: Oliveira, Sérgio
Published: (2017)
Shallow water flow around an elongated bridge pier
by: Lima, M. M. C. L.
Published: (2014)
The role of intonation and visual cues in the perception of sentence types: Evidence from European Portuguese varieties
by: Cruz, Marisa
Published: (2017)
A deep learning classifier for sentence classification in biomedical and computer science abstracts
by: Goncalves, Sergio
Published: (2020)
Perceptions of individuals serving community orders regarding crime and sentences
by: Andrade, Joana Raquel Mendes
Published: (2019)
Psycholinguistics is definitely tied up to prosody
by: Lourenço-Gomes, Maria do Carmo
Published: (2016)
Road network exposure to deep-seated and shallow slides at the basin-scale (Grande da Pipa River Basin, Portugal)
by: Branco, Igor
Published: (2023)
On the solution of the slope beach problem in the context of shallow-water code benchmarking: Why non-linearization of the initial waveforms is essential
by: Figueiredo, Jorge
Published: (2020)
A molecular and multivariate approach to the microbial community of a commercial shallow raceway marine recirculation system operating with a moving bed biofilter
by: Matos, Ana
Published: (2011)
A deep learning approach for sentence classification of scientific abstracts
by: Goncalves, Sergio
Published: (2018)
Physically-based modelling of shallow slides susceptibility at the basin scale using proxy soil thickness and geotechnical data
by: Melo, Raquel
Published: (2025)
Deep linguistic processing of portuguese noun phrases
by: Costa, Francisco Nuno Quintiliano Mendonça Carapeto
Published: (2007)