Publication

Clickstream data warehousing for web crawlers profiling

Bibliographic details
Abstract: Web sites routinely monitor visitor traffic as a useful measure of their overall success. However, simple summaries such as the total number of visits per month provide little insight into individual site patterns, especially in a changing environment like the Web. This paper describes an approach to usage profiling based on clickstream data collected from the sites of several Web servers and stored in a specialized clickstream data warehouse. We aim to provide valuable insights about common users, while also preventing unauthorised access to contents and any form of overload that might degrade site performance. Common crawler detection heuristics help to classify sessions, enabling the construction of site-specific profile training sets. Classification algorithms are then used to build predictive models that can evaluate unseen sessions, namely their nature and potential hazard to the site, while they are still ongoing.
Main author: Lourenço, Anália
Other authors: Belo, Orlando
Subject: Data warehousing; Clickstream data; Web housing; Web usage mining; Web crawler profiling
Year: 2011
Country: Portugal
Document type: conference paper
Access type: restricted access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho