Query Planning in the Presence of Overlapping Sources

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Abstract

Navigational queries on Web-accessible life science sources pose unique query optimization challenges. The objects in these sources are interconnected to objects in other sources, forming a large and complex graph, and the sources overlap in the objects they contain. Answering a query requires the traversal of multiple alternate paths through these sources. Each path can be associated with a benefit, namely the cardinality of the target object set (TOS) of objects reached in the result, and with an evaluation cost of reaching the TOS.
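To make the role of overlap concrete, the following is a minimal sketch, not taken from the paper. The path names, costs, and object identifiers are hypothetical, and the assumption that the cost of a set of paths is the sum of the individual path costs is ours. It shows that the benefit of a set of paths is the size of the union of their target object sets, so overlapping paths contribute less than the sum of their individual cardinalities.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical data: path name -> (evaluation cost, objects reached in the TOS).
Path = Tuple[float, FrozenSet[str]]
PATHS: Dict[str, Path] = {
    "p1": (4.0, frozenset({"a", "b", "c"})),
    "p2": (3.0, frozenset({"b", "c", "d"})),  # overlaps with p1 on b and c
    "p3": (1.0, frozenset({"e"})),
}


def benefit(selected: Iterable[str], paths: Dict[str, Path] = PATHS) -> int:
    """Number of distinct target objects reached by the selected paths."""
    covered = set()
    for name in selected:
        covered |= paths[name][1]
    return len(covered)


def cost(selected: Iterable[str], paths: Dict[str, Path] = PATHS) -> float:
    """Total evaluation cost, assuming path costs simply add up."""
    return sum(paths[name][0] for name in selected)
```

With this toy data, selecting p1 and p2 yields a benefit of 4 distinct objects rather than 6, at a cost of 7.0.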

We present dual problems in selecting the best set of paths. The first problem is to select a set of paths that satisfies a constraint on the evaluation cost while maximizing the benefit (the number of distinct objects in the TOS). The dual problem is to select a set of paths that satisfies a threshold on the TOS benefit with minimal evaluation cost. The two problems can be mapped to the budgeted maximum coverage problem and the maximal set cover with a threshold, respectively. To solve these problems, we explore several solutions, including greedy heuristics, a randomized search, and a traditional IP/LP formulation with bounds. We perform experiments on a real-world graph of life sciences objects from NCBI and report on the computational overhead of our solutions and their performance compared to the optimal solution.
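As an illustration of the two dual problems, here is a minimal sketch of a plain cost-effectiveness greedy rule: repeatedly pick the path with the most newly covered objects per unit of cost. This is a generic textbook-style heuristic under our own assumptions (strictly positive, additive path costs; hypothetical path names and data) and is not necessarily any of the heuristics evaluated in the paper.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Path = Tuple[float, FrozenSet[str]]  # (evaluation cost > 0, objects in the TOS)

# Hypothetical example data.
EXAMPLE: Dict[str, Path] = {
    "p1": (4.0, frozenset({"a", "b", "c"})),
    "p2": (3.0, frozenset({"b", "c", "d"})),
    "p3": (1.0, frozenset({"e"})),
}


def _pick(remaining: Dict[str, Path], covered: Set[str],
          max_cost: float) -> Optional[str]:
    """Return the affordable path with the best new-objects-per-cost ratio."""
    best, best_ratio = None, 0.0
    for name, (c, objs) in remaining.items():
        gain = len(objs - covered)  # only objects not yet covered count
        if c <= max_cost and gain > 0 and gain / c > best_ratio:
            best, best_ratio = name, gain / c
    return best


def greedy_budgeted(paths: Dict[str, Path],
                    budget: float) -> Tuple[List[str], Set[str], float]:
    """Problem 1: maximize distinct objects subject to a cost budget."""
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(paths)
    while True:
        name = _pick(remaining, covered, budget - spent)
        if name is None:  # nothing affordable adds new objects
            return chosen, covered, spent
        c, objs = remaining.pop(name)
        chosen.append(name)
        covered |= objs
        spent += c


def greedy_threshold(paths: Dict[str, Path], threshold: int
                     ) -> Optional[Tuple[List[str], Set[str], float]]:
    """Problem 2: reach at least `threshold` distinct objects at low cost.

    Returns None when the threshold cannot be reached with the given paths.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(paths)
    while len(covered) < threshold:
        name = _pick(remaining, covered, math.inf)
        if name is None:
            return None
        c, objs = remaining.pop(name)
        chosen.append(name)
        covered |= objs
        spent += c
    return chosen, covered, spent
```

On the toy data, `greedy_budgeted(EXAMPLE, 5.0)` selects p2 and then p3 (4 distinct objects at cost 4.0), while `greedy_threshold(EXAMPLE, 5)` needs all three paths at cost 8.0. A greedy rule of this kind is cheap to compute but can miss the optimum, which is why comparing such heuristics against an exact formulation is informative.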

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bleiholder, J., Khuller, S., Naumann, F., Raschid, L., Wu, Y. (2006). Query Planning in the Presence of Overlapping Sources. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_48

  • DOI: https://doi.org/10.1007/11687238_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
