Keyword Search on Large-Scale Structured, Semi-Structured, and Unstructured Data

Handbook of Data Intensive Computing

Abstract

In the information era, huge amounts of structured, semi-structured, and unstructured data accumulate every day. A central challenge is to retrieve meaningful information from these large collections in a way that satisfies users’ information needs. Keyword search looks for matching objects that contain one or more keywords specified by a user. It gives millions of users a simple but relatively powerful way to search large-scale data. Driven by the demand for managing and processing such data in emerging applications, keyword search has become an important technique, and many efficient and effective methods have been developed over the past decade. In this chapter, we survey several representative techniques from the literature and the characteristics that make them useful in different application scenarios.
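To make the definition concrete, here is a minimal sketch of keyword search over a tiny collection of unstructured text, assuming whitespace tokenization and an in-memory inverted index. The function names, the `match_all` flag, and the sample documents are illustrative assumptions and are not taken from the chapter; the techniques it surveys are considerably more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index, keywords, match_all=False):
    """Return ids of documents containing any keyword, or every keyword if match_all."""
    # index.get avoids creating empty entries for unseen keywords.
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    if match_all:
        return set.intersection(*postings)
    return set.union(*postings)


# Hypothetical sample collection, for illustration only.
docs = {
    1: "keyword search over relational databases",
    2: "ranking answers in graph databases",
    3: "query suggestion from search logs",
}
index = build_inverted_index(docs)

any_match = keyword_search(index, ["search", "databases"])
all_match = keyword_search(index, ["search", "databases"], match_all=True)
```

Here `any_match` follows the "one or more keywords" reading from the abstract, while `all_match` shows the stricter conjunctive variant that many systems also offer.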

Notes

  1. The dimension Product is a primary key in Table 29.1a, and a foreign key in Table 29.1b.

  2. http://www.keyworddiscovery.com/keyword-stats.html

Author information

Corresponding author

Correspondence to Bin Zhou.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Zhou, B. (2011). Keyword Search on Large-Scale Structured, Semi-Structured, and Unstructured Data. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_29

  • DOI: https://doi.org/10.1007/978-1-4614-1415-5_29

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1414-8

  • Online ISBN: 978-1-4614-1415-5

  • eBook Packages: Computer Science (R0)
