Scalable peer-to-peer Web search using highly discriminative keys

Luu, Toan Vinh

doi:10.5075/epfl-thesis-3945

Luu, Toan Vinh

2007

Download

Formats

Format
BibTeX
MARC
MARCXML
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

Standard general-purpose Web retrieval relies on centralized search engines that do not realistically scale when applied to the exponentially growing number of documents available on the Web. By taking advantage of the resource sharing principle, Peer-to-Peer (P2P) techniques represent a promising architectural alternative for building decentralized Web search engines offering true Web-size scalability, provided that enough peers are available. However, in all such P2P approaches proposed so far, excessive network bandwidth consumption during retrieval, caused by the necessary transmission of possibly very long posting lists, was identified as the major bottleneck for implementing truly scalable P2P full-text Web retrieval. The main objective of the present research is thus to find a decentralized indexing/retrieval strategy that fully exploits the distributed computation possibilities provided by P2P networks, but keeps the required network bandwidth consumption scalable, while guaranteeing an acceptable retrieval quality. To address this problem we introduce a novel indexing/retrieval model based on Highly Discriminative Keys (HDKs), which correspond to carefully selected indexing terms and term sets associated with posting lists truncated to the top-k most relevant documents with respect to the associated key. Using HDKs for indexing thus increases the number of indexing features but, at the same time, strictly limits the size of the associated posting lists. When combined with an adequate retrieval model, this leads to strongly reduced network traffic. More precisely, our experimental results show that HDK-based indexing and retrieval lead to storage and bandwidth requirements that remain manageable with respect to the growth of document collection while preserving a fully acceptable retrieval quality.

Details

Title Scalable peer-to-peer Web search using highly discriminative keys

Author(s) Luu, Toan Vinh

Advisor(s)

Rajman, Martin

Pagination 124

Date 2007

Publisher Lausanne, EPFL

Keywords

Distributed Information Retrieval; Highly Discriminative Keys; P2P System; Web Search Engine; moteurs de recherche; pair-à-pair; clés hautement discriminatives; système décentralisé

Language English

DOI https://doi.org/10.5075/epfl-thesis-3945

Other identifier(s) urn: urn:nbn:ch:bel-epfl-thesis3945-1

Laboratories LIA

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LIA - Artificial Intelligence Laboratory
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2007-09-12

Files

Abstract

Details

Actions