Abstract
Exploiting large volumes of XML (eXtensible Markup Language) data under limited storage space requires dedicated, reliable techniques for compressing the data and querying it. This work studies these two processes and combines them through a mediator, so that compressed XML data can be queried without first being decompressed. We propose a new technique to compress, re-index and query XML data that builds on improved versions of the XMill and B+Tree algorithms. We show the reliability and speed-up of the proposed querying system in terms of response time and answer exactness.
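The abstract does not detail the authors' algorithms. As a rough illustration of the general idea only, the Python sketch below separates structure from data in the spirit of XMill and dictionary-encodes the values of each path container. It then uses a sorted list searched with `bisect` as a stand-in for a B+Tree. With these pieces in place, an equality query can be answered on encoded values without decompressing any stream. The function names (`build_store`, `query_equal`), the dictionary encoding and the sorted-list index are assumptions made for this example. This is not the method proposed in the paper.

```python
import bisect
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict


def build_store(xml_text):
    """Hypothetical sketch (not the paper's method): XMill-style separation of
    structure and data, an order-preserving dictionary per path container, and
    a sorted (path, code, position) list standing in for a B+Tree index.
    Mixed content and tail text are ignored for simplicity."""
    root = ET.fromstring(xml_text)
    structure = []                    # start-tag names; "/" marks an end tag
    containers = defaultdict(list)    # root-to-element path -> text values

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        structure.append(elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)
        for child in elem:
            walk(child, path)
        structure.append("/")

    walk(root, "")

    # The structure stream is compressed as one block; zlib stands in for
    # the back-end compressor that XMill applies to each container.
    compressed_structure = zlib.compress(" ".join(structure).encode("utf-8"))

    dictionaries = {}   # path -> sorted distinct values; code = list index
    index = []          # (path, code, position) tuples kept in sorted order
    for path, values in containers.items():
        distinct = sorted(set(values))
        dictionaries[path] = distinct
        for pos, value in enumerate(values):
            code = bisect.bisect_left(distinct, value)
            index.append((path, code, pos))
    index.sort()
    return {"structure": compressed_structure,
            "dictionaries": dictionaries,
            "index": index}


def query_equal(store, path, value):
    """Return the container positions whose value at `path` equals `value`,
    using only the dictionary codes and the index (nothing is decompressed)."""
    distinct = store["dictionaries"].get(path, [])
    code = bisect.bisect_left(distinct, value)
    if code == len(distinct) or distinct[code] != value:
        return []                     # value never occurs under this path
    index = store["index"]
    lo = bisect.bisect_left(index, (path, code, -1))
    hi = bisect.bisect_left(index, (path, code + 1, -1))
    return [pos for _, _, pos in index[lo:hi]]


doc = """<library>
  <book><title>XML</title><year>2013</year></book>
  <book><title>Data</title><year>2012</year></book>
  <book><title>Trees</title><year>2013</year></book>
</library>"""
store = build_store(doc)
print(query_equal(store, "/library/book/year", "2013"))  # [0, 2]
```

An order-preserving dictionary is used here so that range predicates could also be evaluated directly on codes. A practical system would replace the in-memory sorted list with a disk-based B+Tree and would compress each container with a back-end compressor, as XMill does.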
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arfaoui, O., Sassi-Hidri, M. (2013). Querying Compressed XML Data. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_42
DOI: https://doi.org/10.1007/978-3-642-40319-4_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science (R0)