Document details

Efficient clustering of web-derived data sets

Author(s): Luís Sarmento ; Alexander Kehlenbeck ; Eugénio Oliveira ; Lyle Ungar

Date: 2009

Origin: Repositório Aberto da Universidade do Porto

Subject(s): Informática, Ciências da computação e da informação; Informatics, Computer and information sciences


Description

Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.

Document Type: Book
Language: English
