Denormalized representation of XML Schemas

Authors: Todd A. Fast and Samaresh Panda
Last updated: 05-02-2006


Terms used in this document

  • AID/AXI: Abstract instance document, or abstract XML instance. A representation of the superset of all concrete instance documents fitting a certain information model. A schema is one type of AID, but there are other languages for describing AIDs (e.g. Relax NG, Examplotron). In this design, we focus on an AXI description that primarily specifies the content model of concrete XML instances.
  • AXIOM: Abstract XML Instance Object Model. The object model for representing an AID.
  • CID/CXI: Concrete instance document, or concrete XML instance. A regular XML instance document that conforms to an AID/AXI.
  • ABE: Author by example.


Introduction

This document describes the requirements and design of the Abstract XML Instance Object Model (AXIOM).

The XML Schema language, which offers facilities for describing the structure and constraining the contents of XML 1.0 documents, is a very complex language.
  
To write a schema, you must spend a lot of time learning the language itself. There are tools on the market to help you write schemas, but they still require you to be familiar with the XML Schema language. Writing an XML Schema is a time-consuming task.
  
An XML Schema dictates what your XML document may look like; in other words, it describes the structure of the instance document. The "denormalized representation" is an effort to hide the complexities of XML Schema and present an XML document-like view of the schema. Instead of starting from an XML Schema, the denormalized representation lets you focus on constructing the structure of your XML document, which is ultimately your end goal.

The denormalized representation allows:
  • constructing the structure of your XML document and generating an XML Schema for it.
  • writing XML Schemas with little or no prior knowledge of the XML Schema language.
  • visualizing the possible XML document(s) for a given XML Schema.
  • code completion in an XML document that conforms to an XML Schema.

Use Cases

Here we'll discuss various use cases where AXIOM could be used as the underlying object model.

Use case 1: Author by Example (ABE)
Author-by-example (ABE) allows users to build an AXI interactively, which the tool can then reverse engineer into an XML schema (or some other form). Users with little or no prior knowledge of XML Schema who want to create a schema should ideally be able to specify the structure of their XML instance document in some way. See the picture below.

[Figure: XML Instance view]

The tool should be able to generate a schema for the specified structure. See the picture below.

[Figure: Generated Source]

Use case 2: Visualization of XML Schema
AXIOM can be used to render an XML Schema in a form that is easy to read and understand. One possibility is to render it so that it looks like an XML instance document for that schema. See the picture below.

[Figure: Instance view of Purchase Order schema]


Use case 3: Code-completion for XML documents
AXIOM can help while a user edits an XML document that conforms to an XML schema, by offering the options available in a particular context. Since AXIOM represents the denormalized form of a schema, it has all the information about the structure of the XML document for that schema, and that information may be used for code completion. See the picture below.

[Figure: Code completion based on schema]
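
As a minimal sketch of how a denormalized model could drive code completion, the Python snippet below looks up the child elements allowed at a given position. The nested-dictionary MODEL, the element names in it, and the suggest function are hypothetical and only stand in for a real AXIOM.

```python
# Hypothetical denormalized model: each element name maps to its allowed children.
MODEL = {
    "purchaseOrder": {
        "shipTo": {"name": {}, "street": {}, "city": {}, "state": {}, "zip": {}},
        "billTo": {"name": {}, "street": {}, "city": {}, "state": {}, "zip": {}},
        "comment": {},
    }
}


def suggest(model, path, prefix=""):
    """List child element names allowed at `path` that start with `prefix`."""
    node = model
    for name in path:
        node = node[name]  # descend one element per path step
    return [name for name in node if name.startswith(prefix)]


# Options directly under <purchaseOrder>, then those under <shipTo> starting with "s".
top_level = suggest(MODEL, ["purchaseOrder"])
in_ship_to = suggest(MODEL, ["purchaseOrder", "shipTo"], "s")
print(top_level)
print(in_ship_to)
```

Because the model is already flattened, the lookup needs no type resolution at completion time; it only follows the path of enclosing elements.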


Use case 4: Transformation
Tools for transformations between XML instance documents need to present an AXIOM effectively so that the user can map information to and from the instance document.


What is AXIOM?

AXIOM is an object model for a given XML Schema: a denormalized representation of that schema. To create an AXIOM from a schema, the schema must be denormalized. The denormalization process iterates through a schema model to "flatten" the schema into a form that describes an abstract content model. That model is free of types and other forms of substitution and reuse (which are artifacts of normalization), but it retains all other information that specifies the content model (for example, minOccurs/maxOccurs). That is, a denormalized AXIOM represents, in verbose form, the allowable elements, attributes, and possibly compositors of each element at every point in an XML document. For each element, the AXIOM furnishes lists of all(*) possible attributes and all(*) possible child elements (* "all" meaning all those defined in the related XML schema).

Example
Consider a schema that looks like this:

address.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://xml.netbeans.org/examples/targetNS"
            xmlns:tns="http://xml.netbeans.org/examples/targetNS"
            elementFormDefault="qualified">   
    <xsd:element name="address">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="name" type="xsd:string"/>
                <xsd:element name="address" type="xsd:string"/>
                <xsd:element name="city" type="xsd:string"/>
                <xsd:element name="state" type="xsd:string"/>
                <xsd:element name="zip" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>


The denormalized form of the above schema may look like this:
[Figure: denormalized form of address.xsd]
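
As a text-only approximation of such a view, the following Python sketch walks address.xsd with the standard xml.etree.ElementTree module and builds an indented outline of elements and compositors. The outline function and its output layout are assumptions made for illustration, not the rendering used by the tool.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

ADDRESS_XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://xml.netbeans.org/examples/targetNS"
            xmlns:tns="http://xml.netbeans.org/examples/targetNS"
            elementFormDefault="qualified">
    <xsd:element name="address">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="name" type="xsd:string"/>
                <xsd:element name="address" type="xsd:string"/>
                <xsd:element name="city" type="xsd:string"/>
                <xsd:element name="state" type="xsd:string"/>
                <xsd:element name="zip" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""


def outline(node, depth=0):
    """Return indented lines for the elements and compositors found under node."""
    lines = []
    for child in node:
        kind = child.tag.replace(XSD, "")
        if kind == "element":
            lines.append("  " * depth + "<" + child.get("name") + ">")
            lines += outline(child, depth + 1)
        elif kind in ("sequence", "choice", "all"):
            lines.append("  " * depth + "(" + kind + ")")
            lines += outline(child, depth + 1)
        else:
            # Other components (here: complexType) do not appear in the view.
            lines += outline(child, depth)
    return lines


view = outline(ET.fromstring(ADDRESS_XSD))
print("\n".join(view))
```

For this schema the outline lists the address element, its sequence compositor, and the five child elements in declaration order; the anonymous complexType leaves no trace in the view.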

Conceptually, this denormalization occurs as a single precalculation over the entire XML schema. In practice, however, it may make sense to denormalize each element only as it is traversed by a client. Both approaches are equivalent because the resulting AXIOM is static with respect to an unchanging schema: if the schema doesn't change, neither does the AXIOM. If the schema changes, its AXIOM must change in response. Because the AXIOM is a denormalized form of the schema, we consider such changes to be effectively refactorings, just like those we would perform on instance documents conforming to a schema.

There is only ever one possible AXIOM per XML schema. It does not matter whether the XML schema used to create an AXIOM is contained in one schema file or several. For our purposes, we only care about logical schemas that consist of a set of types and instances of those types.
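
The on-demand variant can be sketched in Python as follows, assuming a hypothetical LazyElement wrapper that expands an element's children only on first access and caches the result. The sample schema, the class, and the child_elements helper are illustrative; compositors, attributes, and named types are deliberately ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from functools import cached_property

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema with one level of nesting.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="order">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="shipTo">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="city" type="xsd:string"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="comment" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""


def child_elements(node):
    """Find the nearest nested element declarations, looking through other components."""
    found = []
    for child in node:
        if child.tag == XSD + "element":
            found.append(child)
        else:
            found += child_elements(child)
    return found


class LazyElement:
    def __init__(self, schema_node):
        self.name = schema_node.get("name")
        self.expansions = 0  # counts how often this element was denormalized
        self._schema_node = schema_node

    @cached_property
    def children(self):
        # Runs once per element; later accesses reuse the cached list.
        self.expansions += 1
        return [LazyElement(n) for n in child_elements(self._schema_node)]


order = LazyElement(ET.fromstring(SCHEMA).find(XSD + "element"))
first = [c.name for c in order.children]
second = [c.name for c in order.children]
print(first, order.expansions)
```

Since the schema is unchanged between the two accesses, both return the same children, and elements that no client visits are never expanded.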


AXIOM Components

To keep things simple, we have come up with three primary components.

Element
An Element represents an element of the XML instance document. In the snippet below, shipTo is an element that has an attribute, country, and five child elements: name, street, city, state, and zip.

  <shipTo country="US">
      <name>Todd A. Fast</name>
      <street>16 Network Circle</street>
      <city>Menlo Park</city>
      <state>CA</state>
      <zip>94025</zip>
  </shipTo>

Attribute
An Attribute represents an attribute of the XML instance document. In the above snippet, country is an attribute of the shipTo element.

Compositors
A Compositor is one of Sequence, Choice, or All. Compositors are required to group the elements and attributes together.
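
One way to picture the three components is as plain data classes. The Python sketch below is an assumption about their shape, not the actual AXIOM API: the field names are invented, attributes are attached directly to their element, and the shipTo children are assumed to be grouped by a sequence.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Attribute:
    name: str


@dataclass
class Element:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    children: List[Union["Element", "Compositor"]] = field(default_factory=list)


@dataclass
class Compositor:
    kind: str  # "sequence", "choice" or "all"
    children: List[Union[Element, "Compositor"]] = field(default_factory=list)


# The shipTo snippet expressed with these components.
ship_to = Element(
    name="shipTo",
    attributes=[Attribute("country")],
    children=[
        Compositor(
            "sequence",
            [Element(n) for n in ("name", "street", "city", "state", "zip")],
        )
    ],
)

child_names = [e.name for e in ship_to.children[0].children]
print(child_names)
```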



Algorithm

The idea behind creating an AXIOM from a schema is simple. At its heart lies a recursive procedure that creates child AXIOM components and adds them to a parent AXIOM component. Pseudocode for the algorithm may look like this:

populateChildren(AXIComponent parent, SchemaComponent schemaComponent) {

        for each child of schemaComponent {

            if (child is an Attribute) {
                create an AXIOM <attribute> and add it to the parent
            }
            else if (child is an Element) {
                create an AXIOM <element> and add it to the parent
                get the type definition for the element
                call populateChildren(<element>, type)
            }
            else if (child is a Compositor) {
                create an AXIOM <compositor> and add it to the parent
                call populateChildren(<compositor>, child)
            }
            else {
                // any other schema component adds nothing itself;
                // its children are attached to the current parent
                call populateChildren(parent, child)
            }

        }

}

The above procedure is invoked for each global element of the schema and recurses from there. The final outcome is a list of top-level AXIOM <element>s, one for each global element in the schema.
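
The procedure can be tried out with a small Python sketch built on the standard xml.etree.ElementTree module. Everything here is an illustrative assumption rather than the project's implementation: the Node class, the sample schema, and the simplified type lookup, which only resolves unprefixed names of global complex types and does not guard against recursive type definitions.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: a global element that refers to a named type.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:complexType name="USAddress">
        <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
            <xsd:element name="zip" type="xsd:string"/>
        </xsd:sequence>
        <xsd:attribute name="country" type="xsd:string"/>
    </xsd:complexType>
</xsd:schema>
"""


class Node:
    """Generic AXI component; kind is 'element', 'attribute' or a compositor name."""

    def __init__(self, kind, name=None):
        self.kind, self.name, self.children = kind, name, []


def populate_children(parent, schema_component, types):
    for child in schema_component:
        kind = child.tag.replace(XSD, "")
        if kind == "attribute":
            parent.children.append(Node("attribute", child.get("name")))
        elif kind == "element":
            element = Node("element", child.get("name"))
            parent.children.append(element)
            # Use the named type if it is known, else the inline definition.
            type_def = types.get(child.get("type"), child)
            populate_children(element, type_def, types)
        elif kind in ("sequence", "choice", "all"):
            compositor = Node(kind)
            parent.children.append(compositor)
            populate_children(compositor, child, types)
        else:
            # Any other schema component is transparent: keep the same parent.
            populate_children(parent, child, types)


schema = ET.fromstring(SCHEMA)
types = {t.get("name"): t for t in schema.findall(XSD + "complexType")}
top = Node("schema")
# Start from the global elements only; global types are reached through references.
populate_children(top, schema.findall(XSD + "element"), types)
roots = top.children
print([r.name for r in roots])
```

In the result, the USAddress type no longer appears anywhere: its sequence and its country attribute hang directly off the shipTo element, which is the flattening the procedure is meant to achieve.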

