Abstract
Despite decades of effort, intelligent object search remains elusive. Neither search engine nor semantic web technologies alone have managed to provide usable systems for simple questions such as “find me a flat with a garden and more than two bedrooms near a supermarket.”
We introduce deqa, a conceptual framework that achieves this elusive goal by combining state-of-the-art semantic technologies with effective data extraction. To that end, we apply deqa to the UK real estate domain and show that it can answer a significant percentage of such questions correctly. deqa achieves this by mapping natural language questions to SPARQL patterns. These patterns are then evaluated on an RDF database of current real estate offers. The offers are obtained by running OXPath, a state-of-the-art data extraction system, on the websites of the major agencies in the Oxford area, and are linked through Limes to background knowledge such as the location of supermarkets.
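To make the idea of mapping a question to a SPARQL pattern concrete, the following is a minimal sketch of how the example question above might be turned into a query once its constraints have been recognised. It is not the deqa implementation: the `ex:` vocabulary, the `FlatQuestion` class, and the `to_sparql` function are all hypothetical names introduced here for illustration, and the natural language parsing step is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: this prefix and the property names used below
# are illustrative assumptions, not the schema used by deqa.
PREFIX = "PREFIX ex: <http://example.org/realestate#>"


@dataclass
class FlatQuestion:
    """Structured constraints assumed to be extracted from a question."""
    more_than_bedrooms: Optional[int] = None  # "more than two bedrooms" -> 2
    needs_garden: bool = False
    near_supermarket: bool = False


def to_sparql(q: FlatQuestion) -> str:
    """Assemble a SPARQL SELECT query from the constraints in q."""
    patterns = ["?flat a ex:Flat ."]
    filters = []
    if q.needs_garden:
        patterns.append("?flat ex:hasFeature ex:Garden .")
    if q.more_than_bedrooms is not None:
        patterns.append("?flat ex:bedrooms ?bedrooms .")
        filters.append(f"FILTER(?bedrooms > {q.more_than_bedrooms})")
    if q.near_supermarket:
        # Assumes a linking step has already materialised ex:near triples.
        patterns.append("?flat ex:near ?shop .")
        patterns.append("?shop a ex:Supermarket .")
    body = "\n  ".join(patterns + filters)
    return f"{PREFIX}\nSELECT DISTINCT ?flat WHERE {{\n  {body}\n}}"


# "find me a flat with a garden and more than two bedrooms near a supermarket"
query = to_sparql(
    FlatQuestion(more_than_bedrooms=2, needs_garden=True, near_supermarket=True)
)
print(query)
```

The resulting string could then be sent to any SPARQL endpoint holding the extracted offers. Each constraint becomes an independent triple pattern or filter, so unrecognised parts of a question simply leave the query less restrictive.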
The research leading to these results has received funding under the European Commission’s Seventh Framework Programme (FP7/2007–2013) from ERC grant agreement DIADEM, no. 246858, IP grant agreement LOD2, no. 257943 and Eurostars E!4604 SCMS.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lehmann, J. et al. (2012). deqa: Deep Web Extraction for Question Answering. In: Cudré-Mauroux, P., et al. The Semantic Web – ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7650. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35173-0_9
DOI: https://doi.org/10.1007/978-3-642-35173-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35172-3
Online ISBN: 978-3-642-35173-0
eBook Packages: Computer Science, Computer Science (R0)