DTD to W3C XML Schema
Ayub Khan, 11 January 2006
Purpose
Screens
Features
Implementation
Risks/Issues
Resources
Purpose
- To provide a tool for developers to convert from an existing DTD to W3C XML Schema
Screens
TODO
Features
Users of this tool have the following features available.
- Allows conversion of complex, modularized XML DTDs and DTDs with namespaces to W3C XML Schemas
- Converts a Document Type Definition (DTD) into an XML Schema (REC-xmlschema-1-20010502).
- It can also map DTD entities onto XML Schema constructs (simpleType, attributeGroup, group).
- Support for XML 1.0 DTD (input document)
- Generated W3C XML Schema file
- One single wizard panel (for converting a DTD to W3C XML Schema) provides
- an option to specify target namespace
- a location for the schema file that gets generated
- has logic to validate the location
- some of these errors may be DTD parse-time errors, which the user can fix by correcting the DTD file
- conversion-time errors should be very rare.
- <xs:annotation>
-- (from file: file:///C:/tmp/hello.dtd) --
</xs:annotation>
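The target-namespace option and the source-file annotation listed above could be wired into the generated schema roughly as follows. This is a minimal sketch, not the tool's implementation: the function name `new_schema`, the use of `xs:documentation` inside the annotation, and the namespace `http://example.org/hello` in the usage note are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def new_schema(source_uri, target_namespace=None):
    """Create an empty xs:schema root for a converted DTD (hypothetical helper)."""
    schema = ET.Element(f"{{{XS}}}schema")
    if target_namespace:
        # The wizard's target namespace option ends up on the schema root.
        schema.set("targetNamespace", target_namespace)
    # Record where the schema came from; xs:documentation is an assumption here.
    annotation = ET.SubElement(schema, f"{{{XS}}}annotation")
    documentation = ET.SubElement(annotation, f"{{{XS}}}documentation")
    documentation.text = f"-- (from file: {source_uri}) --"
    return schema
```

Calling `new_schema("file:///C:/tmp/hello.dtd", "http://example.org/hello")` returns a root element to which the mapped element and attribute declarations can then be appended.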
Implementation
This tool has the following components:
- A DTD Parser
- DTD2XSD Mapper
DTD Parser:
A DTD parser should parse a DTD and represent its content in an easily accessible form. There are 3 types of parsers:
1. Java based (content accessible via some API)
- MATRA: (on sourceforge - http://matra.sourceforge.net)
- It gives me a model after parsing the DTD. There appears to be an issue: A|B and A,B are treated as the same in the generated DTD model.
- It provides a tokenizer from which I can get all the DTD constructs in the DTD file.
- DTDParser is licensed under either an Apache-style license or the Lesser GPL (LGPL) license. (http://www.wutka.com/dtdparserlicense.html)
These are
- perl (See http://www.w3.org/2000/04/schema_hack)
- C/C++ based.
- Dtd2Xs: non-Java based, and its license is very restrictive, so it is not useful (http://www.syntext.com/downloads/index.htm#Dtd2Xs)
3. XML representation (content represented in an XML format)
This is our approach. We want to use it for various reasons; for example, one could apply an XSL stylesheet to this XML representation to produce an XML Schema.
4. Other types of parsers
- JavaCC
- Antlr
DTD Syntax | XML rep. | Example |
---|---|---|
<!ELEMENT element-name category> or <!ELEMENT element-name (element-content)> |
<element name="A"> |
|
<!ATTLIST element-name attribute-name attribute-type default-value> |
<attribute element="A">
where token-type -> ID, IDREF, IDREFS, ENTITY,
|
<attribute element="A"> |
General Entities <!ENTITY Name Definition> |
<entity name="Name"
type="general" > <definition> abc </definition> </entity> where abc is one of the following
</external-id> </choice> where xyz is one of the following
|
Example:
Internal entities: <!ENTITY Pub-Status "This is a pre-release of the specification."> is represented as <entity name="Pub-Status" type="general" > <definition> <entity-value value="This is a pre-release of the specification."/> </definition> </entity>
External entities: <!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml"> is represented as <entity name="open-hatch" type="general" > <definition> <external-id> <system sysid="http://www.textuality.com/boilerplate/OpenHatch.xml"/> </external-id> </definition> </entity>
<!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml"> is represented as <entity name="open-hatch" type="general" > <definition> <external-id> <public pid="-//Textuality//TEXT Standard open-hatch boilerplate//EN" sid="http://www.textuality.com/boilerplate/OpenHatch.xml"/> </external-id> </definition> </entity>
<!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif > is represented as <entity name="hatch-pic" type="general" > <definition> <external-id ndata="gif"> <system sysid="../grafix/OpenHatch.gif"/> </external-id> </definition> </entity> |
Parsed Entities <!ENTITY % Name Definition> |
<entity name="Name"
type="parsed" > <definition> abc </definition> </entity> where abc is one of the following
</external-id> </choice> where xyz is one of the following
|
<!ENTITY % YN '"Yes"' > will be represented as <entity name="YN" type="parsed" > <definition> <entity-value value="'Yes'"/> </definition> </entity> |
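To make the entity rows of the table concrete, here is a minimal sketch of turning an internal entity declaration into the XML representation described above. It is an illustration only: the regular expression and the function name `entity_to_xml` are hypothetical, external entities (SYSTEM, PUBLIC, NDATA) are deliberately not handled, and a real implementation would sit on top of a proper DTD parser rather than a regex.

```python
import re
import xml.etree.ElementTree as ET

# Matches internal entity declarations only, for example
#   <!ENTITY Pub-Status "text">   (type="general")
#   <!ENTITY % YN '"Yes"' >       (type="parsed", in this document's terms)
ENTITY_DECL = re.compile(
    r"""<!ENTITY\s+(?P<pct>%\s+)?(?P<name>[\w.:-]+)\s+
        (?P<quote>["'])(?P<value>.*?)(?P=quote)\s*>""",
    re.VERBOSE | re.DOTALL,
)


def entity_to_xml(decl):
    """Build the <entity> element for one internal entity declaration."""
    m = ENTITY_DECL.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"unsupported entity declaration: {decl!r}")
    entity = ET.Element("entity", {
        "name": m.group("name"),
        # A leading '%' marks the declaration as "parsed" in the table above.
        "type": "parsed" if m.group("pct") else "general",
    })
    definition = ET.SubElement(entity, "definition")
    ET.SubElement(definition, "entity-value", {"value": m.group("value")})
    return entity


pub_status = entity_to_xml(
    '<!ENTITY Pub-Status "This is a pre-release of the specification.">'
)
print(ET.tostring(pub_status, encoding="unicode"))
```

Note that this sketch stores the entity literal verbatim, so the `YN` example keeps its inner double quotes instead of rewriting them as single quotes the way the table row does.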
DTD2XSD Mapper:
The mapper will have to provide mappings for elements, attributes, entities and so on. See below.
Note: the mapping examples in the tables below show the complete map from DTD constructs to XML Schema, and apply whether we use the XML representation (see above) or another DTD parser.
- Elements
Syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
DTD | XML Schema |
---|---|
<!ELEMENT A (X,Y) > |
<element name="A"> |
<!ELEMENT A (X|Y) > |
<element name="A"> |
<!ELEMENT A (X|(Y,Z)) > |
<element name="A"> |
<!ELEMENT A (X?,Y+,Z*) > |
<element name="A"> |
<!ELEMENT A EMPTY > |
<element name="A">
<xs:complexType> <xs:attribute name="a" type="<simpletype>"/> </xs:complexType> ... </xs:element> |
<!ELEMENT A (#PCDATA) > |
This means A is an element with only character data, so the schema map |
<!ELEMENT A ANY > |
<element name="A"> |
<!ELEMENT A (X) > |
<element name="A"> |
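The element rows above can be illustrated with a small recursive-descent sketch that maps a DTD content model onto XML Schema particles. It assumes the conventional mapping: `,` becomes `xs:sequence`, `|` becomes `xs:choice`, and `?`, `*`, `+` become `minOccurs`/`maxOccurs`. Child elements are emitted as `ref` particles on the assumption that every DTD element becomes a global schema element. The function names are hypothetical, and `EMPTY`, `ANY` and `#PCDATA` are out of scope for this sketch.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

TOKEN = re.compile(r"\s*([(),|?*+]|[\w.:-]+)")
NAME = re.compile(r"[\w.:-]+")
# DTD occurrence indicator -> (minOccurs, maxOccurs); "1" is the XSD default.
OCCURS = {"?": ("0", "1"), "*": ("0", "unbounded"), "+": ("1", "unbounded")}


def tokenize(model):
    """Split a content model such as '(X?,Y+,Z*)' into tokens."""
    model, pos, tokens = model.strip(), 0, []
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {model!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_particle(tokens, parent):
    """Consume one name or parenthesised group, plus its occurrence indicator."""
    if not tokens:
        raise ValueError("unexpected end of content model")
    tok = tokens.pop(0)
    if tok == "(":
        # Start as a sequence; switch to a choice if '|' turns out to be the separator.
        node = ET.SubElement(parent, f"{{{XS}}}sequence")
        parse_particle(tokens, node)
        sep = None
        while tokens and tokens[0] in (",", "|"):
            current = tokens.pop(0)
            if sep is not None and current != sep:
                raise ValueError("',' and '|' cannot be mixed in one group")
            sep = current
            parse_particle(tokens, node)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        if sep == "|":
            node.tag = f"{{{XS}}}choice"
    elif NAME.fullmatch(tok):
        node = ET.SubElement(parent, f"{{{XS}}}element", {"ref": tok})
    else:
        raise ValueError(f"unexpected token {tok!r}")
    if tokens and tokens[0] in OCCURS:
        low, high = OCCURS[tokens.pop(0)]
        if low != "1":
            node.set("minOccurs", low)
        if high != "1":
            node.set("maxOccurs", high)
    return node


def element_decl_to_xsd(name, model):
    """Map <!ELEMENT name model> to an xs:element with an anonymous complexType."""
    element = ET.Element(f"{{{XS}}}element", {"name": name})
    complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
    tokens = tokenize(model)
    parse_particle(tokens, complex_type)
    if tokens:
        raise ValueError(f"trailing tokens: {tokens}")
    return element


print(ET.tostring(element_decl_to_xsd("A", "(X|(Y,Z))"), encoding="unicode"))
```

Keeping `,` and `|` as distinct group types here is also what avoids the A|B versus A,B confusion noted for the MATRA model earlier.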
- Attributes
<!ATTLIST element-name attribute-name attribute-type default-value>
DTD | XML Schema |
---|---|
<!ATTLIST A |
<element name="A"> |
<!ATTLIST A |
<element name="A"> |
<!ATTLIST A |
<element name="A"> |
<!ATTLIST A |
<element name="A"> |
<!ATTLIST A |
<element name="A"> |
<!ATTLIST A |
similar to CDATA (See above) |
<!ATTLIST A |
,, |
<!ATTLIST A |
,, |
<!ATTLIST A |
,, |
<!ATTLIST A |
,, |
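For the attribute rows, a sketch of one plausible mapping is shown below. It relies on the fact that the DTD tokenized types (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS) have same-named XML Schema built-in types, with CDATA mapped to `xs:string`. Mapping enumerations to a restriction of `xs:NMTOKEN` is an assumption of this sketch, as is the function name `attribute_to_xsd`; NOTATION attributes are not handled.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# DTD attribute type -> XML Schema built-in type.
DTD_TO_XSD_TYPE = {
    "CDATA": "xs:string",
    "ID": "xs:ID", "IDREF": "xs:IDREF", "IDREFS": "xs:IDREFS",
    "ENTITY": "xs:ENTITY", "ENTITIES": "xs:ENTITIES",
    "NMTOKEN": "xs:NMTOKEN", "NMTOKENS": "xs:NMTOKENS",
}


def attribute_to_xsd(name, attr_type, default_decl=None, value=None):
    """Map one ATTLIST entry to an xs:attribute element.

    default_decl is "#REQUIRED", "#IMPLIED", "#FIXED" or None;
    value is the default or fixed value, if the declaration has one.
    """
    attr = ET.Element(f"{{{XS}}}attribute", {"name": name})
    if attr_type.startswith("("):
        # Enumerated type such as (yes|no): inline simpleType with enumeration facets.
        simple = ET.SubElement(attr, f"{{{XS}}}simpleType")
        restriction = ET.SubElement(
            simple, f"{{{XS}}}restriction", {"base": "xs:NMTOKEN"}
        )
        for token in attr_type.strip("()").split("|"):
            ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": token.strip()})
    else:
        attr.set("type", DTD_TO_XSD_TYPE[attr_type])
    if default_decl == "#REQUIRED":
        attr.set("use", "required")
    elif default_decl == "#FIXED":
        attr.set("fixed", value)
    elif default_decl is None and value is not None:
        attr.set("default", value)
    # "#IMPLIED" needs nothing extra: schema attributes are optional by default.
    return attr


print(ET.tostring(attribute_to_xsd("a", "CDATA", "#REQUIRED"), encoding="unicode"))
```

The resulting `xs:attribute` elements would be appended to the `xs:complexType` of the element named in the ATTLIST declaration.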
Risks/Issues
1. The effort required to develop and unit test the tool is estimated at 3-4 man-weeks.
2. Some DTD constructs may not be trivial to map to XML Schema.
3. In general, using XSLT to transform large XML files requires a lot of memory, but in this case I am assuming that DTDs are typically not that big.
4. Stylesheets need to be made modular to enable extensions.
Resources
- XML (3rd recommendation) - http://www.w3.org/TR/REC-xml
- XML Schema - http://www.w3.org/XML/Schema.html
- DTD Spec: http://www.w3.org/TR/REC-xml