Abstract
In this paper, we study the following problem. Given a database and a set of queries, we want to find a set of views that can compute the answers to the queries, such that the amount of space, in bytes, required to store the viewset is minimum on the given database. (We also handle problem instances where the input has a set of database instances, as described by an oracle that returns the sizes of view relations for given view definitions.) This problem is important for applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. Further, for queries without self-joins we describe a very compact search space of views, which contains all views in at least one optimal viewset. We present techniques for finding a minimum-size viewset for a single query without self-joins by using the shape of the query and its constraints, and validate the approach by extensive experiments.
Similar content being viewed by others
References
Bernstein, P.A., Goodman, N., Wong, E., Reeve, C.L., Rothnie, Jr., J.B.: Query processing in a system for distributed databases (SDD-1). ACM Trans. Database Syst. (TODS) 6(4), 602–625 (1981)
Ceri, S., Pelagatti, G.: Distributed Databases: Principles and Systems. McGraw-Hill Book Company, New York (1984)
Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Prentice-Hall, Englewood Cliffs, NJ (1999)
Lenzerini, M.: Data integration: A theoretical perspective. PODS pp. 233–246 (2002)
Hacıgümüş, H., Iyer, B., Mehrotra, S.: Providing database as a service. ICDE (2002)
Wiederhold, G.: Mediators in the architecture of future information systems. IEEE Comput. 25(3), 38–49 (1992)
Haas, L.M., Kossmann, D., Wimmers, E.L., Yang, J.: Optimizing queries across diverse data sources. In: Proc. VLDB pp. 276–285 (1997)
Hacıgümüş, H., Iyer, B., Li, C., Mehrotra, S.: Executing SQL over encrypted data in the database-service-provider model. SIGMOD (2002)
Gupta, H.: Selection of views to materialize in a data warehouse. ICDT (1997)
Theodoratos, D., Ligoudistianos, S., Sellis, T.: Designing the global data warehouse with spj views. CAiSE (1999)
Theodoratos, D., Sellis, T.: Data warehouse configuration. In: Proc. VLDB (1997)
Yang, J., Karlapalem, K., Li, Q.: Algorithms for materialized view design in data warehousing environment. In: Proc. VLDB (1997)
Baralis, E., Paraboschi, S., Teniente, E.: Materialized view selection in a multidimensional database. In: Proc. VLDB (1997)
Gupta, H., Harinarayan, V., Rajaraman, A., Ullman, J.: Index selection in olap. ICDE (1997)
Harinarayan, V., Rajaraman, A., Ullman, J.: Implementing data cubes efficiently. SIGMOD (1996)
Bello, R., Dias, K., Downing, A., Feenan, J., Finnerty, J., Norcott, W., Sun, H., Witkowski, A., Ziauddin, M.: Materialized views in Oracle. In: Proc. VLDB pp. 659–664 (1998)
Goldstein, J., Larson, P.-A.: Optimizing queries using materialized views: A practical, scalable solution. SIGMOD 331–342 (2001)
Zaharioudakis, M., Cochrane, R., Lapis, G., Pirahesh, H., Urata, M.: Answering complex SQL queries using automatic summary tables. SIGMOD pp. 105–116 (2000)
Agrawal, S., Chaudhuri, S., Narasayya, V.: Automated selection of materialized views and indexes in Microsoft SQL Server. Proc. VLDB 496–505 (2000)
Chirkova, R.: The view-selection problem has an exponential-time lower bound for conjunctive queries and views. In: PODS pp. 159–168 (2002)
Chirkova, R., Genesereth, M.R.: Linearly bounded reformulations of conjunctive databases. DOOD (2000)
Chirkova, R., Halevy, A.Y., Suciu, D.: A formal perspective on the view selection problem. VLDB J. 11(3), pp. 216–237. (2002)
Afrati, F., Li, C., Ullman, J.D.: Generating efficient plans using views. pp. 319–330, SIGMOD (2001)
Levy, A., Mendelzon, A.O., Sagiv, Y., Srivastava, D.: Answering queries using views. pp. 95–104, PODS (1995)
Halevy, A.: Answering queries using views: A survey. VLDB J. (2001)
Calvanese, D., De Giacomo, G., Lenzerini, M.: Answering queries using views over description logics knowledge bases. PODS pp. 386–391 (2000)
Calvanese, D., De Giacomo, G., Lenzerini, M., Vardi, M.Y.: Answering regular path queries using views. ICDE pp. 389–398 (2000)
Li, C., Bawa, M., Ullman, J.D.: Minimizing view sets without losing query-answering power. ICDT pp. 99–113 (2001)
Chen, Z., Seshadri, P.: An algebraic compression framework for query results. ICDE pp. 177–188 (2000)
TPC-H: http://www.tpc.org/tpch/
Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. STOC pp. 77–90 (1977)
Sagiv, Y., Yannakakis, M.: Equivalences among relational expressions with the union and difference operators. J. ACM 27(4), 633–655 (1980)
Pottinger, R., Levy, A.: A scalable algorithm for answering queries using views. Proc. VLDB (2000)
Yannakakis, M.: Algorithms for acyclic database schemes. Proc. VLDB, IEEE Computer Society Press 82–94 (1981)
Gupta, A., Sagiv, Y., Ullman, J.D., Widom, J.: Constraint checking with partial information. PODS pp. 45–55 (1994)
Klug, A.: On conjunctive queries containing inequalities. J. ACM 35(1), 146–160 (1988)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. SIGMOD pp. 23–34 (1979)
Graham, M.: On the universal relation. Technical report, University of Toronto, Canada (1979)
Barsalou, T., Keller, A.M., Siambela, N., Wiederhold, G.: Updating relational databases through object-based views. In: Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, pp. 248–257 (1991)
Ioannidis, Y.E., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proceedings of the SIGMOD Conference, pp. 233–244 (1995)
Gibbons, P.B., Matias, Y.: New sampling-based summary statistics for improving approximate query answers. In: Proceedings of the SIGMOD Conference, pp. 331–342 (1998)
Chaudhuri, S.: An overview of query optimization in relational systems. PODS pp. 34–43 (1998)
Haas, P.J., Naughton, J.F., Seshadri, S., Stokes, L.: Sampling-based estimation of the number of distinct values of an attribute. In: Proc. VLDB pp. 311–322 (1995)
Author information
Authors and Affiliations
Corresponding author
Additional information
Part of this article was published elsewhere [Chirkova, R., Li, C.: Materializing views with minimal size to answer queries. PODS (2003)]. In addition to the prior materials, this article contains new theoretical results, as well as new results on how to efficiently implement the proposed techniques (Sects. 5 and 5.4)
Rights and permissions
About this article
Cite this article
Chirkova, R., Li, C. & Li, J. Answering queries using materialized views with minimum size. The VLDB Journal 15, 191–210 (2006). https://doi.org/10.1007/s00778-005-0162-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-005-0162-8