poster

Scheduling queries across replicas

Authors:
Ana Freire, University of A Coruña, A Coruña, Spain
Craig Macdonald, University of Glasgow, Glasgow, United Kingdom
Nicola Tonellotto, National Research Council of Italy, Pisa, Italy
Iadh Ounis, University of Glasgow, Glasgow, United Kingdom
Fidel Cacheda, University of A Coruña, A Coruña, Spain

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
Pages 1139–1140
https://doi.org/10.1145/2348283.2348508

Published: 12 August 2012

ABSTRACT

For increased efficiency, an information retrieval system can split its index into multiple shards, and then replicate these shards across many query servers. For each new query, an appropriate replica for each shard must be selected, such that the query is answered as quickly as possible. Typically, the replica with the lowest number of queued queries is selected. However, not every query takes the same time to execute, particularly if a dynamic pruning strategy is applied by each query server. Hence, a replica's queue length is an inaccurate indicator of its workload, and relying on it can result in inefficient usage of the replicas. In this work, we propose that improved replica selection can be obtained by using query efficiency prediction to measure the expected workload of a replica. Experiments are conducted using 2.2k queries, over various numbers of shards and replicas for the large GOV2 collection. Our results show that query waiting and completion times can be markedly reduced, indicating that accurate response time predictions can improve scheduling accuracy and attesting to the benefit of the proposed scheduling algorithm.
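The contrast between the two selection policies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Replica` class, the function names, and the example timings are hypothetical, and the predicted execution times are assumed to come from some query efficiency predictor that is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Replica:
    """One replica of a shard (hypothetical structure, for illustration only)."""
    name: str
    # Predicted execution times, in seconds, of the queries already queued.
    queued_predictions: List[float] = field(default_factory=list)

    def queue_length(self) -> int:
        return len(self.queued_predictions)

    def predicted_workload(self) -> float:
        # Expected time until the replica has drained its current queue.
        return sum(self.queued_predictions)


def select_by_queue_length(replicas: List[Replica]) -> Replica:
    """Baseline policy: pick the replica with the fewest queued queries."""
    return min(replicas, key=lambda r: r.queue_length())


def select_by_predicted_workload(replicas: List[Replica]) -> Replica:
    """Prediction-based policy: pick the replica with the least predicted work."""
    return min(replicas, key=lambda r: r.predicted_workload())


def schedule(replicas: List[Replica], predicted_time: float,
             select: Callable[[List[Replica]], Replica]) -> Replica:
    """Enqueue a query (represented by its predicted time) on the selected replica."""
    chosen = select(replicas)
    chosen.queued_predictions.append(predicted_time)
    return chosen


if __name__ == "__main__":
    # Replica A holds one expensive query; replica B holds two cheap ones.
    a = Replica("A", [0.9])
    b = Replica("B", [0.1, 0.1])
    print(select_by_queue_length([a, b]).name)        # prints "A"
    print(select_by_predicted_workload([a, b]).name)  # prints "B"
```

In this made-up example the queue-length policy sends the new query to replica A, where it waits behind a single slow query, while the prediction-based policy sends it to replica B, which is expected to become idle sooner despite its longer queue.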

Index Terms

Scheduling queries across replicas
1. Information systems
  1. Information retrieval

Published in
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN:9781450314725
DOI:10.1145/2348283
General Chair: William Hersh (Oregon Health & Science University, USA)
Program Chairs: Jamie Callan (Carnegie Mellon University, USA), Yoelle Maarek (Yahoo! Research, Israel), Mark Sanderson (Royal Melbourne Institute of Technology, Australia)
Copyright © 2012 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
query scheduling
simulation
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%