Experiences with IR TOP N Optimization in a Main Memory DBMS: Applying ‘the Database Approach’ in New Domains

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2097)

Abstract

Data abstraction and query processing techniques are usually studied in the domain of administrative applications. We present a case study in the non-standard domain of (multimedia) information retrieval, intended mainly as a feasibility study in favor of the ‘database approach’ to data management.

Top-N queries form a natural query class when dealing with content retrieval. In the IR field, a lot of research has been done on processing top-N queries efficiently. Unfortunately, these results cannot directly be ported to the database environment, because their tuple-oriented nature would seriously limit the freedom of the query optimizer to select appropriate query plans.

By horizontally fragmenting our database containing document statistics, we are able to combine some of the best of the IR and database optimization principles, providing good retrieval quality as well as database ‘goodies’ like flexibility, scalability, efficiency, and generality. The key issues we address in this paper are the effects of our fragmentation approach on the speed and quality of the answers and the opportunities for scalability, supported by experimental results.
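The abstract does not state the fragmentation criterion or the ranking formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the document statistics are split horizontally by document frequency (rare terms in one fragment, frequent terms in the other) and that documents are ranked with a simple tf·idf sum. The names `POSTINGS`, `fragment_by_doc_frequency`, `accumulate`, and `top_n`, and the toy data, are all hypothetical.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical toy statistics: term -> {doc_id: term frequency}.
POSTINGS = {
    "database": {1: 3, 2: 1, 3: 2, 4: 1},
    "retrieval": {1: 1, 3: 4},
    "zipf": {2: 2},
}
N_DOCS = 4


def fragment_by_doc_frequency(postings, max_df):
    """Split the statistics horizontally (assumed criterion: document frequency)."""
    rare = {t: p for t, p in postings.items() if len(p) <= max_df}
    frequent = {t: p for t, p in postings.items() if len(p) > max_df}
    return rare, frequent


def accumulate(fragment, query_terms, n_docs, scores=None):
    """Add tf*idf contributions of one fragment to the per-document scores."""
    scores = defaultdict(float) if scores is None else scores
    for term in query_terms:
        plist = fragment.get(term)
        if not plist:
            continue
        idf = math.log(1 + n_docs / len(plist))
        for doc, tf in plist.items():
            scores[doc] += tf * idf
    return scores


def top_n(scores, n):
    """Return the ids of the n best documents; ties go to the lower doc id."""
    best = heapq.nlargest(n, scores.items(), key=lambda kv: (kv[1], -kv[0]))
    return [doc for doc, _ in best]


query = ["database", "retrieval", "zipf"]
rare, frequent = fragment_by_doc_frequency(POSTINGS, max_df=2)

# Fast answer: only the small fragment with rare (high-idf) terms is scanned.
scores = accumulate(rare, query, N_DOCS)
fast_ids = top_n(scores, 2)

# Exact answer: continue with the large fragment on the same accumulators.
scores = accumulate(frequent, query, N_DOCS, scores)
exact_ids = top_n(scores, 2)
```

In this sketch each fragment is processed as a whole set rather than tuple by tuple, so a query optimizer would remain free to decide whether to evaluate the second, larger fragment at all. Comparing `fast_ids` with `exact_ids` is one simple way to observe the trade-off between speed and answer quality that the abstract mentions.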

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blok, H.E., de Vries, A.P., Blanken, H.M., Apers, P.M.G. (2001). Experiences with IR TOP N Optimization in a Main Memory DBMS: Applying ‘the Database Approach’ in New Domains. In: Read, B. (eds) Advances in Databases. BNCOD 2001. Lecture Notes in Computer Science, vol 2097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45754-2_9

  • DOI: https://doi.org/10.1007/3-540-45754-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42265-5

  • Online ISBN: 978-3-540-45754-1

  • eBook Packages: Springer Book Archive
