Editorial Notes
Computationally Replicable. The experimental results of this paper were replicated by a SIGMOD Review Committee and were found to support the central results reported in the paper.
ABSTRACT
Users are tapping into massive, heterogeneous entity graphs for many applications. It is challenging to select entity graphs for a particular need, given abundant datasets from many sources and the oftentimes scarce information for them. We propose methods to produce preview tables for compact presentation of important entity types and relationships in entity graphs. The preview tables assist users in attaining a quick and rough preview of the data. They can be shown in a limited display space for a user to browse and explore, before she decides to spend time and resources to fetch and investigate the complete dataset. We formulate several optimization problems that look for previews with the highest scores according to intuitive goodness measures, under various constraints on preview size and distance between preview tables. The optimization problem under distance constraint is NP-hard. We design a dynamic-programming algorithm and an Apriori-style algorithm for finding optimal previews. Results from experiments, comparison with related work and user studies demonstrated the scoring measures' accuracy and the discovery algorithms' efficiency.
Index Terms
- Generating Preview Tables for Entity Graphs