Author(s):
Gomes, Daniel ; Santos, André L. ; Silva, Mário J.
Date: 2004
Persistent ID: http://hdl.handle.net/10451/14204
Origin: Repositório da Universidade de Lisboa
Subject(s): Webstore; web collections; storage management; performance evaluation; incremental crawler; tumba
Description
This technical report details the design, implementation, and experimental results of Webstore, a manager for web data. Webstore addresses the requirements of warehousing applications that need to incrementally store and maintain contents gathered from the web. Duplicated contents are prevalent in web warehouses. Webstore provides an efficient duplicate-elimination mechanism based on analysis of the contents themselves, without requiring any additional metadata. It also supports unlimited growth of storage capacity and offers distinct operation semantics that can be adapted to various usage contexts. Our experiments showed that Webstore outperforms NFS by 68% in read operations and by 50% in write operations.
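The idea of eliminating duplicates from the contents alone can be illustrated with a minimal sketch. This is not Webstore's implementation: the abstract does not say how contents are compared, so the SHA-256 fingerprint, the in-memory dictionaries, and the names ContentStore, put, get, and stored_copies are all assumptions made for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory store that keeps one copy per distinct content.

    Hypothetical illustration only; not the Webstore API.
    """

    def __init__(self):
        self._blobs = {}  # fingerprint -> content bytes
        self._refs = {}   # fingerprint -> number of times the content was stored

    def put(self, content: bytes) -> str:
        # Assumption: a SHA-256 digest of the bytes serves as the content key,
        # so no metadata (URL, timestamp, headers) is needed to spot duplicates.
        key = hashlib.sha256(content).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = content  # first time this content is seen
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        # Raises KeyError if the key is unknown.
        return self._blobs[key]

    def stored_copies(self) -> int:
        # Number of distinct contents physically kept.
        return len(self._blobs)
```

Under these assumptions, storing the same page twice returns the same key and keeps a single physical copy, while a different page gets its own key. A real system would also have to decide how to handle fingerprint collisions and where the data lives on disk, which this sketch leaves out.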