Research on Query Supporting XML Data Compression

In recent years, XML has become an emerging standard for information exchange on the World Wide Web, and an enormous amount of data on the Internet is expected to be encoded in XML in the near future because of its extensibility and cross-platform nature. However, XML documents in their textual form are rather verbose: the textual and repetitive nature of XML tags and of several XML types consumes disk space and hinders query processing. How to efficiently compress XML data and evaluate XPath queries over the compressed data is therefore a fundamental problem.

In this thesis, methods of XML data compression with query support are studied intensively, including XML data models, schema formalisms and decomposition, similarity analysis of XML documents, frequent subtree mining, tree-grammar-based compression, and pushing queries down to compressed data based on signature automata.

The main research work and specific contributions of this thesis cover the following aspects:

The research history and state of the art of XML are summarized, and XML data management technologies are analyzed. The disadvantages of existing XML data compression methods are analyzed in detail, and the development directions and goals of research on XML data compression are given.

A concept of the XK-NF normal form for XML documents, based on DTD path expressions, is proposed. The advantage of the definition is that it can represent the normal form with key constraints using three forms of functional dependency. A decomposition algorithm for XML schemas is proposed to reduce data redundancy based on the formalization rules, a step that is not addressed by other XML compressors.

A method of compressing XML data based on tree grammars is put forward. Because redundant data appears not only within a single XML document but also across different documents, an XML compression method based on tree grammars is proposed.
To compress XML data with query support, raw XML documents are first clustered using k-means. Next, within each cluster, a frequent sub-structure mining algorithm, similar to the FP-growth method, generates the compression dictionary. Finally, following the idea of tree grammars, subtrees are substituted by binding variables and frequent sub-structures.

A method of querying compressed data is studied. A significant portion of this thesis is devoted to querying compressed XML data, building on an analysis of the indexing schemes and query methods used in other XML compressors. Queries are evaluated efficiently using a signature index and signature automata, without fully decompressing the data.

A method of compressing access control rules with query support is given. To cope with duplicated access control rules, a rule pruning method based on the DAC model is proposed for XML data access control, which compresses the access control rules effectively. Furthermore, a query algorithm is presented for the compressed access control map.
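
The dictionary-substitution step of the tree-grammar method described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes a single document, skips the k-means clustering, and treats only exactly repeated subtrees as "frequent", where the thesis uses an FP-growth-like miner. The function names (`canonical`, `build_rules`, `compress`, `expand`), the `ref` placeholder element, and the sample document are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from collections import Counter


def canonical(elem):
    """Serialize a subtree to a string key (tag, sorted attributes, text, children).

    Tail text is ignored; this is a simplification for the sketch.
    """
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    children = "".join(canonical(c) for c in elem)
    return f"<{elem.tag}{attrs}>{text}{children}</{elem.tag}>"


def build_rules(root, min_count=2):
    """Collect non-leaf subtrees that occur at least min_count times.

    Returns (names, bodies): names maps a subtree key to a rule name,
    bodies maps a rule name to one original copy of that subtree.
    """
    counts = Counter()
    first_seen = {}
    for elem in root.iter():
        if len(elem) == 0:
            continue  # leaves are too small to be worth a rule
        key = canonical(elem)
        counts[key] += 1
        first_seen.setdefault(key, elem)
    names, bodies = {}, {}
    for key, n in counts.items():
        if n >= min_count:
            name = f"R{len(names)}"
            names[key] = name
            bodies[name] = first_seen[key]
    return names, bodies


def compress(elem, names):
    """Copy elem, replacing each repeated subtree by <ref rule="..."/>."""
    key = canonical(elem)
    if key in names:
        return ET.Element("ref", {"rule": names[key]})
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        out.append(compress(child, names))
    return out


def expand(elem, bodies):
    """Inverse of compress: substitute each <ref> by its rule body."""
    if elem.tag == "ref" and "rule" in elem.attrib:
        return copy.deepcopy(bodies[elem.get("rule")])
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        out.append(expand(child, bodies))
    return out


# Hypothetical sample document: the <author> subtree occurs twice.
doc = ET.fromstring(
    "<lib>"
    "<book><author><first>A</first><last>B</last></author><title>X</title></book>"
    "<book><author><first>A</first><last>B</last></author><title>Y</title></book>"
    "</lib>"
)
names, bodies = build_rules(doc)
small = compress(doc, names)
restored = expand(small, bodies)
```

Here the repeated `<author>` subtree becomes a single rule, and both occurrences in the compressed tree are replaced by references to it. A query-supporting compressor would additionally keep an index over the rules so that queries can skip references whose rule bodies cannot match; that part is not sketched here.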



© 2004-2009 Latest-Science-Articles.com - All Rights Reserved Worldwide.