Publication

Information search in large-scale peer-to-peer systems

Bibliographic details
Abstract:	Only recently has the research community fully adopted the peer-to-peer concept and started exploring this design to build very large, scalable systems. Peer-to-peer systems are a particular type of distributed system that has no centralized entities. The system is built on multiple hosts that are autonomous (and symmetric) with respect to each other. Without any central entity, the system has no single point of failure and will continue to work despite the failure of any of its hosts. The lack of a central entity also implies that the system must adapt to host failures autonomously in order to maintain availability. The ability to withstand host failures without external intervention, while remaining available in the meantime, is an interesting property of peer-to-peer systems. This thesis began shortly after the first scalable systems were developed. With the promise of high scalability and availability, the quantity of information shared within such a large system would be huge. However, no system offered an efficient way to search the information shared between hosts. The purpose of this thesis was therefore set as the development of peer-to-peer systems that can search for information efficiently across a very large number of hosts. Index structures are important auxiliary structures for speeding up search, and implementing an index on peer-to-peer systems soon became a widely adopted approach. However, index structures have a negative effect on load balancing: using an index imposes an unbalanced load distribution on storage and network resources. Since peer-to-peer systems assume that resource load is evenly distributed across peers, this imbalance limits the scalability of index implementations for very large systems. The contributions of this thesis tackle the issue of efficient load balancing on a (structured) peer-to-peer system when using a distributed index.
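The imbalance described above can be illustrated with a minimal sketch, assuming a Chord-style identifier ring in which each keyword's posting list is stored on the successor peer of the keyword's SHA-1 hash, and assuming a Zipf-like distribution of posting-list sizes. The ring size, the naming scheme (`peer-p`, `term-r`), the Zipf exponent and all function names are hypothetical and are not taken from the thesis.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # hypothetical identifier-space size (Chord-style ring)


def ring_id(name: str) -> int:
    """Hash a name onto the 2**RING_BITS identifier ring with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(nodes: list[int], key_id: int) -> int:
    """Return the first peer whose ID is >= key_id, wrapping around the ring.

    `nodes` must be sorted in ascending order.
    """
    i = bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]


def storage_load(num_peers: int, num_terms: int, zipf_s: float = 1.0) -> dict[int, int]:
    """Place one posting list per term on its successor peer; sum postings per peer."""
    nodes = sorted(ring_id(f"peer-{p}") for p in range(num_peers))
    load = {n: 0 for n in nodes}
    for rank in range(1, num_terms + 1):
        # Assumed Zipf-like skew: the rank-r term occurs in about 1e6 / r**s documents.
        postings = int(1_000_000 / rank ** zipf_s)
        load[successor(nodes, ring_id(f"term-{rank}"))] += postings
    return load


if __name__ == "__main__":
    load = storage_load(num_peers=64, num_terms=10_000)
    mean = sum(load.values()) / len(load)
    print(f"max/mean storage load: {max(load.values()) / mean:.1f}")
```

Hashing spreads the *keys* evenly over the ring, but not the *amount of data* behind each key: the peer that owns the most frequent term alone stores several times the average load, which is the kind of storage hot-spot that limits the scalability of a naive distributed index.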
The distributed index was applied to two distinct data sets: a textual corpus with keyword-based search and geographical data with range queries. The contributions range from simple balancing and filtering techniques that prevent local peak loads (hot-spots) to more elaborate distributed algorithms that significantly improve the balancing of storage and network loads across the entire system, without any centralized entity. The evaluations show that simply adapting index structures on top of structured peer-to-peer systems is not, by itself, sufficient to create efficient systems. They also show that these techniques improve load balancing in the system and, consequently, make it possible to build efficient large-scale indexing systems.
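As a generic illustration of a simple hot-spot countermeasure (not the specific algorithm developed in the thesis), an oversized posting list can be split into `k` near-equal chunks stored under derived keys, so that the chunks hash to different positions on the ring. The sketch below assumes a Chord-style SHA-1 ring; the key format `term#j`, the chunk count `k` and every helper name are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # hypothetical identifier-space size


def ring_id(name: str) -> int:
    """Hash a name onto the 2**RING_BITS identifier ring with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(nodes: list[int], key_id: int) -> int:
    """First peer with ID >= key_id, wrapping around; `nodes` must be sorted."""
    i = bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]


def split_keys(term: str, postings: int, k: int) -> dict[str, int]:
    """Split one posting list into k near-equal chunks under term#0 .. term#(k-1)."""
    base, extra = divmod(postings, k)
    return {f"{term}#{j}": base + (1 if j < extra else 0) for j in range(k)}


def place(nodes: list[int], items: dict[str, int]) -> dict[int, int]:
    """Store each (key, size) item on the successor of hash(key); return load per peer."""
    load = {n: 0 for n in nodes}
    for key, size in items.items():
        load[successor(nodes, ring_id(key))] += size
    return load


if __name__ == "__main__":
    nodes = sorted(ring_id(f"peer-{p}") for p in range(64))
    before = place(nodes, {"term-1": 1_000_000})
    after = place(nodes, split_keys("term-1", 1_000_000, 8))
    print("hottest peer before:", max(before.values()))
    print("hottest peer after: ", max(after.values()))
```

The trade-off is on the query side: a lookup for a split term must contact up to `k` peers and merge the chunks, so this kind of splitting is normally reserved for the few terms whose posting lists are large enough to create hot-spots.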
Main authors:	Lopes, Nuno Alberto Ferreira
Year:	2009
Country:	Portugal
Document type:	doctoral thesis
Access type:	restricted access
Associated institution:	Universidade do Minho
Language:	English
Source:	RepositóriUM - Universidade do Minho
