XML Dependency Resolution “Deprez” Feature
Authors: Todd Fast, Chris Webster, Girish
Goals
A NetBeans project can be considered a shared, intermittently
connected environment. Having to supply XML resources (schema and
WSDL) that are not available is a major burden on project
collaborators. This specification seeks to preserve the project model
for WSDL and schema artifact dependencies. The goal is to ensure that
resources are available locally and are sharable, but can also be
refreshed easily when needed, in a predictable fashion that is
consistent with project development.
Provide a standard way to retrieve XML resources from a variety of
sources. The retrieval will compute the transitive closure of
resources, potentially requiring user intervention. The retrieval
process will understand the semantics of the retrieved resources and
will generate appropriate binding elements if necessary (that is,
generate enough information to resolve dependencies).
Provide the ability to participate at development time in the
standard JAXP and schema validation processes.
Provide an interface to allow the referenced resources to be queried.
The queries will make it possible to determine all the mapped
resources, all mapped resources which are broken, and the closure of
all the resolvers reachable via references. A user interface will
also be provided to allow the dependency resolver entries to be
changed.
Promote local references to better integrate with the project system.
Support the above capabilities without internal access to the
containing project. This allows the resolution capability to support
any project type.
Non Goals
Introspection of the referenced artifacts themselves. The resolver is
not attempting to provide any indexing capability similar to what is
currently done in the Java metamodel via MDR or the indexing
capability provided by the JES registry. The dependency resolver can
be used to retrieve domain models (such as schema and WSDL) which
have strong query capabilities. Thus the path to indexing will be
through the domain models themselves.
Provide a standardized view of references. The infrastructure will
allow references to be resolved (on a per-artifact basis), but the
appropriate view at the project level (if any) is left to the project
type.
Provide a single source of all the references in the project.
Dependency resolution is specified on a per-artifact basis, which
eliminates the possibility of naming collisions (references from
different artifacts using the same URI become problematic in a global
repository).
Cross-project references to artifacts will not be supported initially
due to the complexity of the design. This may be revisited at a later
point in time. For now, artifacts that belong to another project will
have to be imported into the current project to work.
Background
XML instance documents and well-known definition documents such as
XML schema and WSDL require the ability to perform late binding, that
is, to resolve a reference to another document. There are several
variations of resolution depending on the usage context. For example,
the XML Schema instance namespace provides a global attribute,
schemaLocation, which specifies the location for a given namespace.
The schemaLocation is used during validation to retrieve the
appropriate schema. XML schema and WSDL both offer similar features
when using other namespaces. WSDL provides the ability to import
other WSDL documents (Basic Profile requires that only WSDL documents
be imported using this mechanism), and XML schema also provides
several ways to bring in other documents (include, import, redefine).
The semantics for using referenced artifacts differ depending on
usage, but what remains constant is the need to define an additional
indirection to the physical file. The location reference is specified
as a URI and thus includes both relative locations and URL
references. Intermittently connected development environments present
a challenge to URL referencing; a typical solution is to cache the
resource referenced by the URL. Caching also improves performance by
reducing network access. Consider something like the OTA schema,
which includes over 100 files: fetching a schema like this during
each validation would at least introduce a potential performance
issue and would not function as expected when a developer is not
connected to the network. A relative reference may also present a
problem in the development environment, as the deployed location may
differ from the VCS layout. For example, a development environment
may have multiple projects which together comprise a deployment. The
runtime location may thus differ from the development-time location
and affect the relative location reference.
Proposal
Infrastructure (Locating and Querying Resources)
The deprez infrastructure will provide a factory for locating
resources on a per-artifact basis. The interface will provide a way
to retrieve a resource either as a stream or as a ModelSource. A
model source is used as input to the model factories. In addition to
providing access to a model, the infrastructure will make it possible
to determine all the referenced URIs and the broken URIs, to add and
remove URIs, to listen for property change events, and to obtain the
closure of all other dependency resolvers which are reachable from
the initial resolver. The interface will also implement the
interfaces needed to interact with JAXP (EntityResolver) and with
schema validation (LSResourceResolver). The method for adding entries
to deprez will integrate seamlessly with the project system.
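
The query side of this interface can be sketched in Java as follows.
This is only an illustration under stated assumptions: the names
DependencyResolver, InMemoryResolver, and every method shown are
hypothetical and are not taken from the deprez API. Retrieval as a
stream or ModelSource, property change events, and the JAXP
integration are left out. A reference mapped to null stands in for a
broken reference.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical per-artifact resolver; all names are illustrative only.
interface DependencyResolver {
    Collection<URI> getReferencedUris();
    Collection<URI> getBrokenUris();
    void addUri(URI reference, URI location);
    void removeUri(URI reference);
    Collection<DependencyResolver> getDirectResolvers();
    Set<DependencyResolver> getReachableResolvers();
}

final class InMemoryResolver implements DependencyResolver {
    // Referenced URI -> local location; a null location marks a broken entry.
    private final Map<URI, URI> mappings = new LinkedHashMap<>();
    // Resolvers of the artifacts this artifact references directly.
    private final List<DependencyResolver> direct = new ArrayList<>();

    void addReferencedResolver(DependencyResolver resolver) {
        direct.add(resolver);
    }

    @Override
    public Collection<URI> getReferencedUris() {
        return new ArrayList<>(mappings.keySet());
    }

    @Override
    public Collection<URI> getBrokenUris() {
        List<URI> broken = new ArrayList<>();
        for (Map.Entry<URI, URI> entry : mappings.entrySet()) {
            if (entry.getValue() == null) {
                broken.add(entry.getKey());
            }
        }
        return broken;
    }

    @Override
    public void addUri(URI reference, URI location) {
        mappings.put(reference, location);
    }

    @Override
    public void removeUri(URI reference) {
        mappings.remove(reference);
    }

    @Override
    public Collection<DependencyResolver> getDirectResolvers() {
        return direct;
    }

    @Override
    public Set<DependencyResolver> getReachableResolvers() {
        // Walk the reference graph; the visited set guards against cycles,
        // and the starting resolver itself is excluded from the result.
        Set<DependencyResolver> seen = new LinkedHashSet<>();
        Deque<DependencyResolver> todo = new ArrayDeque<>(direct);
        while (!todo.isEmpty()) {
            DependencyResolver next = todo.pop();
            if (next != this && seen.add(next)) {
                todo.addAll(next.getDirectResolvers());
            }
        }
        return seen;
    }
}
```

Keeping one resolver per artifact, as in this sketch, matches the
per-artifact design above: two artifacts can map the same URI
differently without colliding.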
Resource Retrieval
Create a wizard which would allow documents to be retrieved from
local disks, URLs, and ebXML / UDDI repositories. The wizard would
retrieve the files from the specified location and copy them to the
specified project location. The wizard would need extension points
for determining the closure, as the closure will be slightly
different for WSDL and schema. The wizard would retrieve the original
file, then load the model and determine the resources which need to
be retrieved. If a resource reference is relative, the resource will
be retrieved using the base location of the original file. The wizard
may also require user intervention after the initial invocation (if
there are issues when resolving the transitive closure of referenced
resources).
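
The closure computation described above can be sketched in Java as
follows. This is a minimal sketch, not the wizard's implementation:
ClosureRetriever and the referencesOf function are hypothetical, with
referencesOf standing in for the WSDL- or schema-specific extension
point that lists the locations referenced by a document. Copying
files and user prompting are left out.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class ClosureRetriever {
    /**
     * Returns the root document plus everything reachable from it.
     * referencesOf is a hypothetical extension point: given a document
     * location, it returns the raw location strings found in that
     * document's import/include-style elements.
     */
    static Set<URI> closure(URI root, Function<URI, List<String>> referencesOf) {
        Set<URI> retrieved = new LinkedHashSet<>();
        Deque<URI> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            URI current = pending.remove();
            if (!retrieved.add(current)) {
                continue; // already handled; also breaks reference cycles
            }
            for (String reference : referencesOf.apply(current)) {
                // A relative reference is resolved against the location of
                // the document that contains it; an absolute one is kept.
                pending.add(current.resolve(reference).normalize());
            }
        }
        return retrieved;
    }
}
```

Resolving each reference against the document that contains it, rather
than against the root, is what keeps nested relative references such as
"../main.xsd" pointing at the right file.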
Illustration 1 below shows the entry point into the resource
retrieval process. This wizard will collect the information necessary
to retrieve the resource, introspect the resource, and transitively
retrieve any additional resources necessary. Additional user
prompting may be necessary during retrieval. Because the prompting
may happen recursively and its extent will not be known ahead of
time, the ability to interact after the initial collection may be
required. This could be similar to the refactoring preview, which is
a special window docked into the output area. The window would
resemble the Mozilla download manager, except that the nodes in the
tree could be edited to supply additional information. Neither the UI
illustration nor the interaction described above is the final design;
both are only representative of what could be done. The wizard could
be launched from the New File wizard, but it would be more useful to
invoke it in areas where new references are created. For example,
when a new element is created and the type it references is not yet
available from the schema, a new resource may need to be retrieved.
Invoked in this manner, the deprez information could be generated
automatically (whereas the New File wizard would require a separate
step). If a resource is located in another project, the appropriate
project references would be created. There are currently at least two
different ways to specify project references (although Ant project
types are more common) and no generic APIs, so this may require
support from the underlying project, perhaps by exposing something in
the project lookup. Finally, this wizard can store the original
references of the resources so that the resources can be refreshed
easily.

Illustration 1: Resource Collection Panel
Resource Mapping
In addition to retrieving resources, the ability to interact with
deprez through a user interface is necessary. This is done either to
resolve a broken reference (one entry point for the wizard above) or
to edit the set of resources. References can become broken due to
changes in the set of projects (for inter-project references), the
contents of a project, or the resource itself (changes to import and
similar elements). A sample screen shot is shown below. It shows the
referenced URIs and the current mapping (if any), and offers the
ability to display only broken references and to purge unused entries
from deprez. The ability to launch the resource wizard would be
useful here as well. The expected behavior would be to resolve the
broken entries either by pointing to existing files or by retrieving
new files.

Illustration 2: Resource Mapping
Files retrieved via an absolute URI, and their closures, are
considered read-only resources and will be stored and versioned in a
common directory (<project-root>/external-refs/{schema}/{wsdl}). A
public catalog file (at <project-root>/external-refs/public.xml) will
have entries for all the artifacts pulled from the web. This catalog
will be chained to all the peer catalog files, so that a lookup in a
peer catalog automatically results in a lookup in the public catalog.
The user will not be given the option to store these files in any
other directory (to enforce their read-only status). A user who wants
to edit these files has to copy them to some other folder in the
project and then edit the copies. The closure references of such
copied files can be handled through the peer catalog file.
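
One way the chaining could work is through the nextCatalog element of
the OASIS XML Catalogs format, which tells a resolver to continue the
lookup in another catalog when the current one has no match. The Java
sketch below writes such catalogs as text. It assumes that deprez
chains catalogs this way and that entries are stored as system
entries; the specification above does not say either. CatalogSketch,
the example URIs, and the relative paths used are hypothetical.

```java
import java.util.Map;

final class CatalogSketch {
    // Namespace of the OASIS XML Catalogs format.
    private static final String NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog";

    // Minimal escaping for text placed inside a double-quoted attribute.
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /**
     * Builds a catalog that maps each system identifier to a location and,
     * if nextCatalog is not null, delegates unmatched lookups to it.
     */
    static String catalog(Map<String, String> systemEntries, String nextCatalog) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<catalog xmlns=\"").append(NS).append("\">\n");
        for (Map.Entry<String, String> entry : systemEntries.entrySet()) {
            xml.append("  <system systemId=\"").append(escape(entry.getKey()))
               .append("\" uri=\"").append(escape(entry.getValue())).append("\"/>\n");
        }
        if (nextCatalog != null) {
            xml.append("  <nextCatalog catalog=\"").append(escape(nextCatalog)).append("\"/>\n");
        }
        xml.append("</catalog>\n");
        return xml.toString();
    }
}
```

Under these assumptions, the public catalog would be written with its
system entries and no nextCatalog, while each peer catalog would pass
a path such as "../external-refs/public.xml" as nextCatalog so that
lookups fall through to the public catalog.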
Local resources (from the hard disk or from another project) can be
retrieved transitively using the same wizard. In this case, however,
the user will be allowed to choose the directory in which to store
the files. Also, these files will not be marked as read-only.
Transitive Closure View & Fix Broken References
Create a wizard/view for each schema or WSDL that shows transitive
closure information to the user. This interface will look as depicted
in Illustration 3.

Illustration 3: Wizard that shows the transitive closure of a schema
(or WSDL) and lets the user resolve broken references.
Implementation Details
Catalog in JAXP and Schema Validation
The JAXP API recognizes that a URI may require additional mapping and
allows an EntityResolver to be specified for DOM parsing or a
resolveEntity method for SAX. This makes it possible to resolve
references to other resources programmatically with an arbitrary
algorithm; however, this power requires additional code to be written
and does not provide a generic capability to resolve references. The
OASIS catalog specification defines an XML format that maps the URI
specified in the referencing document to the URI of the actual
resource (typically a file on the user's file system). The XML
Commons Catalog Resolver provides a resolver which can be used for
standard JAXP resolution as well as for JDK 1.5 schema validation.
The resolution interface is slightly different there, as additional
information is provided (both system and public information); this
resolver implements both org.xml.sax.EntityResolver and
org.w3c.dom.ls.LSResourceResolver. Using this resolver provides an
easy way to externalize entity resolution. As a side note, the
XmlValidate Ant task provides a way to specify an inline catalog that
references multiple external catalogs via the catalogpath element.
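
To make the programmatic route concrete, the following Java sketch
plugs a hand-written EntityResolver into a JAXP DocumentBuilder. It is
an illustration only: MapEntityResolver is a hypothetical class, not
the XML Commons resolver, and it serves content from an in-memory map
keyed by system identifier where a real resolver would consult a
catalog. Only org.xml.sax.EntityResolver is covered;
LSResourceResolver is not shown.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps a system identifier to in-memory content.
final class MapEntityResolver implements EntityResolver {
    private final Map<String, String> contentBySystemId;

    MapEntityResolver(Map<String, String> contentBySystemId) {
        this.contentBySystemId = contentBySystemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = contentBySystemId.get(systemId);
        if (content == null) {
            return null; // null tells the parser to use its default lookup
        }
        InputSource source = new InputSource(new StringReader(content));
        source.setSystemId(systemId);
        return source;
    }

    // Parses the given XML text with the resolver installed on the builder.
    static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

A catalog-based resolver replaces the map with entries read from a
catalog file, which is what externalizes the mapping without further
code.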
The catalog specification
(http://www.oasis-open.org/committees/download.php/14041/xml-catalogs.html#s.ext.ent)
also provides a processing instruction to specify a catalog file that
may be useful when interpreting the document. For example,
<?oasis-xml-catalog catalog="http://example.com/catalog.xml"?>
specifies a catalog which could be used for this document. At this
time, I am not sure whether this processing instruction is used in
practice, but it may become more interesting in the future.