
Combining Local Scoring and Global Aggregation to Rank Entities for Deep Web Queries

  • Regular Paper
Journal of Computer Science and Technology

Abstract

With the rapid growth of Web databases, it is necessary to automatically extract and integrate the large-scale data available in the Deep Web. However, current Web search engines rank results at the page level, which is becoming inadequate for entity-oriented vertical search. In this paper, we present LG-ERM, an entity-level ranking mechanism for Deep Web queries based on local scoring and global aggregation. Unlike traditional approaches, LG-ERM considers additional rank-influencing factors, including the uncertainty of entity extraction, the style information of entities, the importance of Web sources, and the relationships among entities. By combining local scoring with global aggregation, LG-ERM can return query results that are more accurate and better meet users' needs. Experiments demonstrate the feasibility and effectiveness of the key techniques of LG-ERM.
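The abstract does not give LG-ERM's actual formulas. Purely as an illustration of the general two-stage idea, the Python sketch below gives each extracted entity occurrence a local score built from its extraction confidence, a style weight and the importance of the source it came from, then aggregates an entity's global score by summing its local scores across sources. The multiplicative local score, the summation used for aggregation, the `Occurrence` schema and every function name are assumptions made for this sketch, not the authors' method; entity relationships are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Occurrence:
    """One extracted mention of an entity on one Web source (hypothetical schema)."""
    entity: str
    source: str
    extraction_confidence: float  # assumed in [0, 1]: how certain the extractor is
    style_weight: float           # assumed in [0, 1]: e.g. higher for emphasized text


def local_score(occ: Occurrence, source_importance: dict[str, float]) -> float:
    """Local stage: score one occurrence in isolation.

    Assumption: the three factors are multiplied; unknown sources get importance 0.
    """
    importance = source_importance.get(occ.source, 0.0)
    return occ.extraction_confidence * occ.style_weight * importance


def rank_entities(
    occurrences: list[Occurrence], source_importance: dict[str, float]
) -> list[tuple[str, float]]:
    """Global stage: sum local scores per entity across all sources, sort descending."""
    totals: dict[str, float] = defaultdict(float)
    for occ in occurrences:
        totals[occ.entity] += local_score(occ, source_importance)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sources and entities, for demonstration only.
    importance = {"siteA": 1.0, "siteB": 0.5}
    occs = [
        Occurrence("entity_x", "siteA", 0.9, 1.0),
        Occurrence("entity_x", "siteB", 0.8, 0.5),
        Occurrence("entity_y", "siteA", 0.6, 1.0),
    ]
    for entity, score in rank_entities(occs, importance):
        print(f"{entity}: {score:.2f}")
```

In the demo, `entity_x` outranks `entity_y` because it collects evidence from two sources (0.9 + 0.2) while `entity_y` has a single, less confident occurrence (0.6). Summation rewards entities mentioned by many sources; a bounded alternative such as a noisy-OR would cap that effect, and which behaviour is preferable depends on the data.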



Author information

Corresponding author

Correspondence to Yue Kou.

Additional information

Supported by the National Natural Science Foundation of China under Grant No. 60673139 and the National High Technology Development and Research 863 Program of China under Grant No. 2008AA01Z146.

Electronic Supplementary Material

The article has electronic supplementary material (PDF, 88.4 KB).


About this article

Cite this article

Kou, Y., Shen, DR., Yu, G. et al. Combining Local Scoring and Global Aggregation to Rank Entities for Deep Web Queries. J. Comput. Sci. Technol. 24, 626–637 (2009). https://doi.org/10.1007/s11390-009-9263-y


  • DOI: https://doi.org/10.1007/s11390-009-9263-y
