Publication

Semantic Similarity Match for Data Quality

Bibliographic details
Abstract:	Data quality is a critical aspect of applications that support business operations. Often entities are represented more than once in data repositories. Since duplicate records do not share a common key, they are hard to detect. Duplicate detection over text is usually performed using lexical approaches, which do not capture text sense. The difficulties increase when the duplicate detection must be performed using the text sense. This work presents a semantic similarity approach, based on a text sense matching mechanism, that performs the detection of text units which are similar in sense. The goal of the proposed semantic similarity approach is therefore to perform the duplicate detection task in a data quality process.
Main author:	Martins, Fernando
Other authors:	Falcão, André; Couto, Francisco M.
Subject:	semantic similarity; data cleaning; data quality; WordNet; similarity match
Year:	2007
Country:	Portugal
Document type:	report
Access type:	open access
Associated institution:	Universidade de Lisboa
Language:	Portuguese
Source:	Repositório da Universidade de Lisboa
