Abstract
Data abstraction and query processing techniques are usually studied in the domain of administrative applications. We present a case study in the non-standard domain of (multimedia) information retrieval, intended mainly as a feasibility study in favor of the ‘database approach’ to data management.
Top-N queries form a natural query class when dealing with content retrieval. In the IR field, much research has been done on processing top-N queries efficiently. Unfortunately, these results cannot be ported directly to the database environment, because their tuple-oriented nature would seriously limit the freedom of the query optimizer to select appropriate query plans.
By horizontally fragmenting our database of document statistics, we are able to combine some of the best IR and database optimization principles, providing good retrieval quality as well as database ‘goodies’ such as flexibility, scalability, efficiency, and generality. The key issues we address in this paper are the effects of our fragmentation approach on the speed and quality of the answers and the opportunities it offers for scalability; both are supported by experimental results.
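As a rough illustration of the idea of top-N retrieval over a horizontally fragmented table of document statistics, the following minimal Python sketch may help. It is not the implementation described in the paper: the abstract does not state the fragmentation criterion, so the split by document frequency, the `cutoff` parameter, the additive scoring, and all function names are assumptions made only for this example.

```python
import heapq
from collections import defaultdict


def fragment_by_doc_frequency(postings, cutoff):
    """Split term statistics horizontally into two fragments.

    Assumption for illustration: terms occurring in at most `cutoff`
    documents go to the 'rare' fragment, all others to 'frequent'.
    """
    rare, frequent = {}, {}
    for term, plist in postings.items():
        target = rare if len(plist) <= cutoff else frequent
        target[term] = plist
    return rare, frequent


def top_n(query_terms, fragments, n):
    """Sum per-term weights over the given fragments and return the n best documents."""
    scores = defaultdict(float)
    for fragment in fragments:
        for term in query_terms:
            for doc_id, weight in fragment.get(term, ()):
                scores[doc_id] += weight
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


# Toy statistics: term -> list of (document id, term weight); values are made up.
postings = {
    "database": [(1, 0.2), (2, 0.1), (3, 0.3), (4, 0.1)],
    "retrieval": [(1, 0.5), (3, 0.3)],
    "fragment": [(2, 0.9)],
}
query = ["database", "retrieval", "fragment"]

rare, frequent = fragment_by_doc_frequency(postings, cutoff=2)
exact = top_n(query, [rare, frequent], n=2)   # uses all fragments
approx = top_n(query, [rare], n=2)            # skips the large 'frequent' fragment

print([doc for doc, _ in exact])
print([doc for doc, _ in approx])
```

In this toy data set, evaluating only the small fragment happens to return the same two documents as the full evaluation. In general, skipping fragments trades answer quality for speed, which is the kind of trade-off the abstract says is examined experimentally.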
© 2001 Springer-Verlag Berlin Heidelberg
Blok, H.E., de Vries, A.P., Blanken, H.M., Apers, P.M.G. (2001). Experiences with IR TOP N Optimization in a Main Memory DBMS: Applying ‘the Database Approach’ in New Domains. In: Read, B. (eds) Advances in Databases. BNCOD 2001. Lecture Notes in Computer Science, vol 2097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45754-2_9
DOI: https://doi.org/10.1007/3-540-45754-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42265-5
Online ISBN: 978-3-540-45754-1
eBook Packages: Springer Book Archive