Abstract
In today’s information era, huge amounts of structured, semi-structured, and unstructured data accumulate every day. A universal challenge is to retrieve interesting and meaningful information from these large data collections in a way that captures users’ information needs. Keyword search looks for matching objects that contain one or more keywords specified by a user. It gives millions of users a simple but relatively powerful way to search large-scale data. Because many emerging applications must manage and process large collections of structured, semi-structured, and unstructured data, keyword search has become an important technique. In the past decade, many efficient and effective keyword search techniques have been developed. In this chapter, we survey several representative techniques from the literature. These techniques have desirable characteristics that make them useful in different application scenarios.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zhou, B. (2011). Keyword Search on Large-Scale Structured, Semi-Structured, and Unstructured Data. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_29
DOI: https://doi.org/10.1007/978-1-4614-1415-5_29
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1414-8
Online ISBN: 978-1-4614-1415-5
eBook Packages: Computer Science, Computer Science (R0)