Table of Contents:
- General overview
- The definition
- Simple rules
- How to reference a DTD from a document
- Declaring elements
- Declaring attributes
- Some examples
- How to validate
- Other resources
What is validation, and what is a DTD?

DTD is the acronym for Document Type Definition. It is a description of the content for a family of XML files. It is part of the XML 1.0 specification, and allows one to describe and verify that a given document instance conforms to the set of rules detailing its structure and content.

Validation is the process of checking a document against a DTD (more generally, against a set of construction rules).

The validation process and building DTDs are the two most difficult parts of the XML life cycle. Briefly, a DTD defines all the possible elements to be found within your document and the formal shape of your document tree (by defining the allowed content of an element: either text, a regular expression for the allowed list of children, or mixed content, i.e. both text and children). The DTD also defines the valid attributes for all
elements and the types of those attributes.

The definition is found in the W3C XML Recommendation (Tim Bray's annotated version of Rev1). Unfortunately, all this is inherited from the SGML world, and the syntax is ancient.

Writing DTDs can be done in many ways. The rules for building them can be radically different depending on whether you need something permanent or something which can evolve over time. Really complex DTDs like the DocBook ones are flexible but much harder to design. I will just focus on DTDs for formats with a fixed, simple structure. What follows is just a set of basic rules; it is definitely not exhaustive, nor usable for complex DTD design.

Assuming the top element of the document is spec and the DTD is placed in the file mydtd in the subdirectory dtds of the directory from where the document was loaded:

<!DOCTYPE spec SYSTEM "dtds/mydtd">

Notes:
- The system string is actually a URI-Reference (as defined in RFC 2396), so you can use a full URL string indicating the location of your DTD on the Web. This is a really good thing to do if you want others to validate your document.
- It is also possible to associate a PUBLIC identifier (a magic string) so that the DTD is looked up in catalogs on the client side without having to locate it on the web.
- A DTD contains a set of element and attribute declarations, but they don't define what the root of the document should be. This is explicitly told to the parser/validator as the first element of the DOCTYPE declaration.
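
To make the last note concrete, here is a minimal sketch that reads the DOCTYPE declaration above and reports the root element name and system identifier it declares. Python and its standard xml.parsers.expat module are my choice for illustration, not part of libxml2, and the helper name read_doctype is hypothetical. Note that expat does not validate and does not load dtds/mydtd here; it only reports what the declaration says.

```python
import xml.parsers.expat

# The DOCTYPE line from the text, followed by an empty root element.
DOCUMENT = '<!DOCTYPE spec SYSTEM "dtds/mydtd">\n<spec/>'


def read_doctype(text):
    """Return the root name, system id and public id named by the DOCTYPE."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once, when it meets the <!DOCTYPE ...> declaration.
        found["root"] = name
        found["system"] = system_id
        found["public"] = public_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return found


if __name__ == "__main__":
    print(read_doctype(DOCUMENT))
```

The root name comes from the DOCTYPE declaration itself, not from anything inside the DTD file, which is the point made in the note.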
The following declares an element spec:

<!ELEMENT spec (front, body, back?)>

It also expresses that the spec element contains one front, one body and one optional back child element, in this order. An element of the structure and its content are declared in a single declaration. Similarly, the following declares div1 elements:

<!ELEMENT div1 (head, (p | list | note)*, div2?)>

which means div1 contains one head, then a series of optional p, list and note elements, and then an optional div2. And last but not least, an element can contain text:

<!ELEMENT b (#PCDATA)>

b contains text. An element can also be of mixed content (text and elements in no particular order):

<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

p can contain text or a, ul, b, i or em elements in no particular order.
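
Since an element-only content model is essentially a regular expression over child element names, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how libxml2 implements validation: the helper names model_to_regex and children_match are hypothetical, and the sketch handles only element content such as the spec and div1 declarations above, not #PCDATA or mixed content.

```python
import re

# Simplified pattern for an XML element name (an assumption of this sketch).
NAME = re.compile(r"[A-Za-z_:][\w:.-]*")


def model_to_regex(model):
    """Translate an element-only content model into a compiled regex.

    Each element name becomes the token 'name;'. The DTD operators
    '|', '?', '*', '+' and parentheses already mean the same thing in a
    regex, and ',' (sequence) is plain concatenation, so it is dropped.
    """
    compact = re.sub(r"\s+", "", model)
    tokens = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    return re.compile(tokens.replace(",", ""))


def children_match(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


if __name__ == "__main__":
    div1 = "(head, (p | list | note)*, div2?)"
    print(children_match(div1, ["head", "p", "note", "div2"]))
    print(children_match(div1, ["p", "head"]))
```

The ';' terminator after each name keeps names that share a prefix (for example p and a longer name starting with p) from being confused when the children are joined into one string.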
Again, the attribute declaration includes the content definition:

<!ATTLIST termdef name CDATA #IMPLIED>

means that the element termdef can have a name attribute containing text (CDATA) and which is optional (#IMPLIED). The attribute value can also be defined within a set:

<!ATTLIST list type (bullets|ordered|glossary) "ordered">

means that the list element has a type attribute with 3 allowed values, "bullets", "ordered" or "glossary", and which defaults to "ordered" if the attribute is not explicitly specified.

The content type of an attribute can be text (CDATA), anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES) or name(s) (NMTOKEN/NMTOKENS). The following defines that a chapter element can have an optional id attribute of type ID, usable for reference from attributes of type IDREF:

<!ATTLIST chapter id ID #IMPLIED>
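
The default-value rule for the list declaration above can be observed with a short sketch. It uses Python's standard xml.etree.ElementTree (my choice for illustration, not libxml2), with the ATTLIST placed in an internal subset so no external file is needed. The ELEMENT declaration for list and the helper name list_type are assumptions added for the example. The underlying expat parser applies declared defaults but does not validate, so a value outside the declared set would not be rejected here.

```python
import xml.etree.ElementTree as ET

# Internal subset holding the ATTLIST declaration for list from the text.
# The ELEMENT declaration is an assumption made only for this example.
DTD = """<!DOCTYPE list [
  <!ELEMENT list (#PCDATA)>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
]>
"""


def list_type(element_text):
    """Parse one list element behind the DTD above and return its type."""
    return ET.fromstring(DTD + element_text).get("type")


if __name__ == "__main__":
    # No type given: the parser supplies the declared default.
    print(list_type("<list/>"))
    # An explicit value takes precedence over the default.
    print(list_type('<list type="bullets"/>'))
```

Checking that a value actually belongs to the declared set requires a validating parser, such as xmllint with validation enabled.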

The last value of an attribute definition can be #REQUIRED, meaning that the attribute has to be given, #IMPLIED, meaning that it is optional, or the default value (possibly prefixed by #FIXED if it is the only allowed value).

Notes: The directory test/valid/dtds/ in the libxml2 distribution contains some complex DTD examples. The example in the file test/valid/dia.xml shows an XML file where the simple DTD is directly included within the document.

The simplest way to validate is to use the xmllint program included with libxml.
The --valid option turns on validation of the files given as input. For example, the following validates a copy of the first revision of the XML 1.0 specification:

xmllint --valid --noout test/valid/REC-xml-19980210.xml

The --noout option is used to disable output of the resulting tree.

The --dtdvalid dtd option allows validation of the document(s) against a given DTD.

Libxml2 exports an API to handle DTDs and validation; check the associated description.

DTDs are as old as SGML, so there may be a number of examples on-line. I will just list one for now; other pointers are welcome:

I suggest looking at the examples found under test/valid/dtd and any of the large number of books available on XML. The dia example in test/valid should be both simple and complete enough to allow you to build your own.

Daniel Veillard