Content restriction is a wide area, and the general case may be difficult to formalize in a usable way. So before focusing on a couple of restrictions that will be easy to formalize, we'll see how the tool can act as a simple checker.
Content models are mostly defined using regexp-like syntaxes, so we can use finite-state machine (FSM) theory and tools. A content model is a regular grammar (the & connector is only shorthand for all orderings of its members), and can therefore be modelled by an FSM.
Thus we basically need to check that every word accepted by the FSM for the new content model is also accepted by the FSM for the original one. A naive (proof-of-concept) algorithm would be to compute the FSM for the intersection of the two languages, and check that its canonical (minimal) form is the same as that of the supposed restriction. I hope we can find a one-pass algorithm that saves computing time and is more elegant, but the priority is to get something working.
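Checking inclusion does not strictly require building and minimizing the intersection. A breadth-first walk over pairs of states of the two automata either finds a word that the new model accepts and the original rejects, or proves that no such word exists. It visits each reachable pair only once. The Python sketch below shows this standard product construction. It assumes both content models have already been compiled to deterministic automata (not shown), encoded as hypothetical `(start, accepting, transitions)` triples in which a missing transition means rejection. Because the walk returns a counterexample, it can also tell the user why a supposed restriction is not one.

```python
from collections import deque

# Hypothetical DFA encoding: (start, accepting, transitions), where transitions
# maps (state, element_name) -> state and a missing entry means "reject".

def find_counterexample(new, old, alphabet):
    """Return a word accepted by `new` but not by `old`, or None if L(new) is a subset of L(old)."""
    new_start, new_accept, new_delta = new
    old_start, old_accept, old_delta = old
    start = (new_start, old_start)
    seen = {start}
    queue = deque([(start, [])])  # breadth-first: the first counterexample found is a shortest one
    while queue:
        (p, q), word = queue.popleft()
        # q is None once `old` has rejected the current prefix for good.
        if p in new_accept and q not in old_accept:
            return word
        for symbol in alphabet:
            p2 = new_delta.get((p, symbol))
            if p2 is None:
                continue  # `new` rejects every word with this prefix: nothing to check
            q2 = old_delta.get((q, symbol)) if q is not None else None
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append(((p2, q2), word + [symbol]))
    return None

# (para | note)* versus (para)*: one accepting state each, looping on the allowed names.
old = (0, {0}, {(0, "para"): 0, (0, "note"): 0})
new = (0, {0}, {(0, "para"): 0})
print(find_counterexample(new, old, ["para", "note"]))  # None: (para)* is a restriction
print(find_counterexample(old, new, ["para", "note"]))  # ['note']: the converse is not
```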
Note: the syntax used for examples is only here as a design helper. A formal syntax will be defined later, possibly as XML data.
An apparently simple kind of content restriction is the removal of an occurrence of an element from a content model. Even that is somewhat difficult to express if we want it to survive future revisions of the parent DTD. The main problem is addressing a single occurrence of an element type (or of a parameter entity) in a content model (including inside parameter entities) when several such occurrences can be found.
Simpler to specify is the removal of all occurrences of an element (including pseudo-elements like #PCDATA) from a content model:
Example 1. Removing all occurrences of an element in a content model
REMOVE WHICH: all (ELEMENTS|ENTITIES): regexp
FROM ELEMENTS: regexp

<Elements regexp="...">
  <Remove occurences="all">
    <ElementsRef regexp="..." />
    <EntitiesRef regexp="..." />
  </>
</>
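Once a content model is parsed into a tree, removing every occurrence of the selected names is a simple recursive filter. This Python sketch uses a hypothetical `(kind, payload, occurrence)` tuple encoding (the parser is not shown). It applies the two regexps of Example 1 with `re.fullmatch` and drops groups left empty. The result is only guaranteed to be a restriction when each removed occurrence was optional in its context. For instance, removing the required `title` from `(title, para*)` yields a model that accepts empty content, which the original rejects. The result should therefore still be checked against the original with the FSM approach described earlier.

```python
import re

# Hypothetical content-model tree: (kind, payload, occurrence) where kind is
# "name" (payload = element name or "#PCDATA") or a group "seq", "or", "and"
# (payload = list of child nodes); occurrence is "", "?", "*" or "+".
SEPARATORS = {"seq": ", ", "or": " | ", "and": " & "}

def remove_names(node, pattern):
    """Return a copy of `node` without names fully matching `pattern`, or None if nothing is left."""
    kind, payload, occ = node
    if kind == "name":
        return None if re.fullmatch(pattern, payload) else node
    children = [c for c in (remove_names(child, pattern) for child in payload) if c is not None]
    return (kind, children, occ) if children else None

def to_sgml(node):
    """Serialize a (non-None) tree back to SGML content-model syntax."""
    kind, payload, occ = node
    if kind == "name":
        return payload + occ
    return "(" + SEPARATORS[kind].join(to_sgml(c) for c in payload) + ")" + occ

def remove_from_elements(dtd, element_pattern, removed_pattern):
    """REMOVE WHICH: all ELEMENTS: removed_pattern FROM ELEMENTS: element_pattern."""
    return {name: (remove_names(model, removed_pattern)
                   if re.fullmatch(element_pattern, name) else model)
            for name, model in dtd.items()}

dtd = {
    "section": ("seq", [("name", "title", ""),
                        ("or", [("name", "para", ""), ("name", "note", "")], "+")], ""),
    "note": ("or", [("name", "#PCDATA", ""), ("name", "para", "")], "*"),
}
new = remove_from_elements(dtd, "section", "note")
print(to_sgml(new["section"]))  # (title, (para)+)
print(to_sgml(new["note"]))     # (#PCDATA | para)*  (not selected, unchanged)
```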
Even simpler to specify is the outright destruction of an element or entity, and hence its removal from all content models and entities. This is not just a special case of the former, because it also requires undefining the element or entity.
Example 2. Zapping an element
ZAP (ELEMENTS|ENTITIES): regexp

<DocClass>
  <Zap>
    <ElementRef name="..." />
    <ElementsRef regexp="..." />
  </>
</>
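Zapping can reuse the removal of all occurrences and then drop the declarations themselves. This Python sketch repeats the hypothetical tuple encoding so that it stands alone. Rather than guessing what an element should become when its whole content model vanishes (a design decision left open here), it reports such elements. Only elements are handled; entities would need the same treatment on their replacement text.

```python
import re

# Hypothetical content-model tree: (kind, payload, occurrence), kind in
# {"name", "seq", "or", "and"}; group payloads are lists of child nodes.

def remove_names(node, pattern):
    """Drop names fully matching `pattern`; return None if the node becomes empty."""
    kind, payload, occ = node
    if kind == "name":
        return None if re.fullmatch(pattern, payload) else node
    children = [c for c in (remove_names(child, pattern) for child in payload) if c is not None]
    return (kind, children, occ) if children else None

def zap(dtd, pattern):
    """ZAP ELEMENTS: pattern. Undefine matching elements and remove them everywhere.

    Returns (new_dtd, emptied) where `emptied` lists the surviving elements
    whose content model vanished entirely and need manual attention.
    """
    new_dtd, emptied = {}, []
    for name, model in dtd.items():
        if re.fullmatch(pattern, name):
            continue  # the declaration itself is removed
        new_model = remove_names(model, pattern)
        if new_model is None:
            emptied.append(name)
        new_dtd[name] = new_model
    return new_dtd, emptied

dtd = {
    "para": ("or", [("name", "#PCDATA", ""), ("name", "sidebar", "")], "*"),
    "sidebar": ("seq", [("name", "para", "+")], ""),
    "aside": ("seq", [("name", "sidebar", "")], ""),
}
new_dtd, emptied = zap(dtd, "sidebar")
print(sorted(new_dtd))  # ['aside', 'para']
print(emptied)          # ['aside']
```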
However, the very existence of content exceptions makes it difficult to handle changes to parameter entities safely, as those entities can be used with both additive and subtractive semantics. Their usage in the base DTD (and in the current customization layer) should be checked to be sure of the semantics of the following clause. When applied to an entity used in a content exclusion, it would be a pure extension, since removing a name from an exclusion allows that element again:
Example 3.
REMOVE WHICH: all (ELEMENTS|ENTITIES): regexp
FROM ENTITIES: regexp

<Entities regexp="...">
  <Remove occurences="all">
    <ElementsRef regexp="..." />
    <EntitiesRef regexp="..." />
  </>
</>
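Checking an entity's usage before removing names from it can be partly mechanized. The sketch below assumes a hypothetical, simplified view of each element declaration as three pieces of text (content model, inclusion group, exclusion group), with references always written `%name;`. For each element, it reports whether removing names from the entity restricts what that element accepts (content model, inclusions) or extends it (exclusions).

```python
def entity_usage(elements, entity):
    """Classify the effect of removing names from parameter entity `entity`.

    `elements` maps an element name to a hypothetical dict with optional keys
    "model", "inclusions" and "exclusions", each holding declaration text.
    Returns {element: set of effects} for the elements that reference the entity.
    """
    ref = "%" + entity + ";"
    effects = {}
    for name, decl in elements.items():
        found = set()
        if ref in decl.get("model", "") or ref in decl.get("inclusions", ""):
            found.add("restriction")  # fewer allowed names
        if ref in decl.get("exclusions", ""):
            found.add("extension")    # fewer excluded names: more valid documents
        if found:
            effects[name] = found
    return effects

elements = {
    "para": {"model": "(#PCDATA | emph | %notes;)*"},
    "footnote": {"model": "(para+)", "exclusions": "-(%notes;)"},
}
print(entity_usage(elements, "notes"))  # {'para': {'restriction'}, 'footnote': {'extension'}}
```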
For element definitions where content exceptions are not parametrized, or whose parametrization should be broken (more on this later), explicit manipulation may be necessary:
Example 4.
ADD EXCLUSION:
REMOVE INCLUSION:

<Elements regexp="...">
  <Remove>
    <Exclusion name="...">
    <Inclusion name="...">
  </>
</>
Often an entity is used in the definition of several content models, and we only want to restrict some of those contents. The parametrization then has to be rewritten using a new entity with a "smaller" content. The two entities can be kept linked (the old one formally being an extension of the new one) or not; such a decision will impact further customization layers.
Example 5.
DESYNC ENTITY: name IN (ELEMENTS|ENTITIES): regexp \
    TO: RESTRICTION NAMED: name LINKED: (YES|NO)
To which a restriction can then be applied:
REMOVE WHICH: all (ELEMENTS|ENTITIES): regexp
FROM ENTITIES: regexp
Note that such a two-step mechanism may have an impact on other design issues. But maybe not, as the following syntax attempts to demonstrate:
<Elements regexp="...">
  <DeSyncEntity name="..." newname="..." linked="linked">
    <Remove occurences="all">
      <ElementsRef regexp="..." />
    </>
  </>
</>
<Entities regexp="...">
  <DeSyncEntity name="..." newname="..." linked="notlinked">
    <Rewrite> ... </>
  </>
</>
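Here is a sketch of the desynchronization of Example 5 combined with the subsequent removal, working on declaration text. It assumes (hypothetically) that the entity's replacement text is a flat `a | b | c` list and that references are written `%name;`; all function and parameter names are illustrative. With `linked=True`, the old entity is redefined in terms of the new one plus the removed members, so the old entity remains formally an extension of the new one.

```python
import re

def desync(entities, elements, name, new_name, element_pattern, removed, linked):
    """DESYNC ENTITY: name IN ELEMENTS: element_pattern TO: RESTRICTION NAMED: new_name,
    then remove the members in `removed` from the new entity.

    Returns new (entities, elements) dicts; the inputs are left untouched.
    """
    members = [m.strip() for m in entities[name].split("|")]
    entities = dict(entities)
    entities[new_name] = " | ".join(m for m in members if m not in removed)
    if linked:
        # The old entity becomes the new one plus what was removed from it.
        entities[name] = " | ".join([f"%{new_name};"] + [m for m in members if m in removed])
    elements = {
        el: (model.replace(f"%{name};", f"%{new_name};")
             if re.fullmatch(element_pattern, el) else model)
        for el, model in elements.items()
    }
    return entities, elements

entities = {"inline": "emph | link | footnote"}
elements = {"title": "(#PCDATA | %inline;)*", "para": "(#PCDATA | %inline;)*"}
ents, els = desync(entities, elements, "inline", "title.inline", "title", {"footnote"}, linked=True)
print(ents)          # {'inline': '%title.inline; | footnote', 'title.inline': 'emph | link'}
print(els["title"])  # (#PCDATA | %title.inline;)*
```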
Note: these customizations depend on the base DTD using parameter entities for attribute definitions.
This is not unlike element restriction, but much simpler, as attributes are not referenced in complex places such as content models, and each attribute can be declared only once for a given element.
Example 6. Removing an #IMPLIED or defaulted attribute
REMOVE: attribute FROM: element-regexp
Example 7. Making an #IMPLIED or defaulted attribute #REQUIRED
REQUIRE: attribute IN: element-regexp
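Examples 6 and 7 amount to a guarded edit of an attribute definition list. In this Python sketch (hypothetical names and table layout throughout), both operations refuse attributes that are not #IMPLIED or plainly defaulted. Removing a #REQUIRED attribute, for instance, would make every document of the customization invalid against the base DTD. Treating #FIXED, #CURRENT and #CONREF as out of scope is a conservative assumption, not a rule from the source.

```python
import re

# Hypothetical attribute table: element -> {attribute: (declared_value, default)}, where
# default is "#IMPLIED", "#REQUIRED", "#FIXED ...", "#CURRENT", "#CONREF" or a plain value.

def _check_optional(element, attr, default):
    # Conservative: only #IMPLIED and plain default values qualify (Examples 6 and 7).
    if default != "#IMPLIED" and default.startswith("#"):
        raise ValueError(f"{element}/{attr}: default {default!r} is not #IMPLIED or a plain value")

def remove_attribute(attlists, attr, element_pattern):
    """REMOVE: attr FROM: element_pattern. Returns a new table."""
    result = {}
    for element, atts in attlists.items():
        atts = dict(atts)
        if re.fullmatch(element_pattern, element) and attr in atts:
            _check_optional(element, attr, atts[attr][1])
            del atts[attr]
        result[element] = atts
    return result

def require_attribute(attlists, attr, element_pattern):
    """REQUIRE: attr IN: element_pattern. Returns a new table."""
    result = {}
    for element, atts in attlists.items():
        atts = dict(atts)
        if re.fullmatch(element_pattern, element) and attr in atts:
            declared, default = atts[attr]
            _check_optional(element, attr, default)
            atts[attr] = (declared, "#REQUIRED")
        result[element] = atts
    return result

attlists = {"para": {"id": ("ID", "#IMPLIED"),
                     "role": ("CDATA", "#IMPLIED"),
                     "align": ("(left|right)", "left")}}
print(list(remove_attribute(attlists, "role", "para")["para"]))      # ['id', 'align']
print(require_attribute(attlists, "align", "p.*")["para"]["align"])  # ('(left|right)', '#REQUIRED')
```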
Attributes also have a sort of "content model" (their declared value) which, even if simpler than an element's, can also be subject to customization.
Example 8. Allowing fewer tokens
REMOVE: token FROM: attribute-regexp IN: element-regexp
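Removing a token from a name token group, as in Example 8, is a restriction as long as the token is not the attribute's default value. Otherwise the customized declaration would carry an invalid default. A minimal sketch, assuming a hypothetical `(declared_value, default)` pair and a plain `(a | b | c)` group. Tokens are compared case-sensitively, which is a simplification for DTDs whose SGML declaration folds name case.

```python
def remove_token(declared, default, token):
    """REMOVE: token from a name token group such as "(left | center | right)".

    Returns the new (declared_value, default) pair.
    """
    tokens = [t.strip() for t in declared.strip().strip("()").split("|")]
    if token not in tokens:
        raise ValueError(f"{token!r} is not in {declared!r}")
    if token == default:
        raise ValueError(f"{token!r} is the default value; change the default first")
    remaining = [t for t in tokens if t != token]
    if not remaining:
        raise ValueError("a name token group cannot be empty")
    return "(" + " | ".join(remaining) + ")", default

print(remove_token("(left | center | right)", "left", "center"))  # ('(left | right)', 'left')
```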
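Example 9. Changing CDATA to tokens
This may require some additional care. In particular, SGML attribute minimization should be turned off in this case if we want a document using the customization layer to be parsable as an instance of the base DTD. With SHORTTAG minimization, a value from a name token group can be given without its attribute name, which the base DTD (where the attribute is CDATA) cannot resolve.
TOKENIZE: attribute-regexp FROM: element-regexp AS: token-model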
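Some of the care required by Example 9 can be automated. This sketch (hypothetical function and parameter names) accepts only a CDATA attribute and requires every new token to look like a name token. Name tokens are approximated with the reference concrete syntax: letters, digits, periods and hyphens. The default must be #IMPLIED, #REQUIRED or one of the tokens. When attribute values collected from existing documents are supplied, the change is rejected if any of them is not a token. The sketch cannot check attribute minimization, which remains a matter for the SGML declaration.

```python
import re

NAME_TOKEN = re.compile(r"[A-Za-z0-9.-]+")  # approximation of reference concrete syntax name characters

def tokenize(declared, default, tokens, used_values=()):
    """TOKENIZE a CDATA attribute AS a name token group; returns the new (declared, default)."""
    if declared != "CDATA":
        raise ValueError(f"expected CDATA, got {declared!r}")
    bad = [t for t in tokens if not NAME_TOKEN.fullmatch(t)]
    if bad:
        raise ValueError(f"not name tokens: {bad}")
    if default not in ("#IMPLIED", "#REQUIRED") and default not in tokens:
        raise ValueError(f"default {default!r} is not one of the tokens")
    # Values found in existing documents that the new declaration would reject.
    stray = sorted(set(used_values) - set(tokens))
    if stray:
        raise ValueError(f"values in use but not allowed: {stray}")
    return "(" + " | ".join(tokens) + ")", default

print(tokenize("CDATA", "#IMPLIED", ["left", "right"], used_values=["left", "left"]))
# ('(left | right)', '#IMPLIED')
```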
Well, just add its definition, and make sure it conforms with the modularity standards of the base DTD. It will still need to be used somewhere, though.
As with restrictions, extensions of a content model may be quite complex to specify, and "REDEFINE" clauses may be the best way to go in many cases. Still, if a DTD is properly parametrized, the place where you want to add an element or entity may itself be within what I'll call a "consistent entity". That is a part of a content model that is either a pure sequence of |'d elements or a pure sequence of &'d elements. Entities to be included in this way, and those at the same level in the target consistent entity, should be checked to make sure they don't break the consistency of the target entity.
Example 10.
ADD CONSISTENT: (element|entity) TO: (element|entity)
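The consistency check of Example 10 can be sketched on entity replacement text. Here (hypothetical names throughout), a text is consistent if it has no parenthesized groups and uses a single connector, | or &. The same holds for the entities it references, which are checked recursively. Sequences (,) are rejected, following the definition above. The sketch assumes entity references are written `%name;` and are not recursive.

```python
import re

REF = re.compile(r"%([A-Za-z][\w.-]*);")

def connector(entities, text):
    """Return "|" or "&" for a consistent list, "" for a single member, None otherwise."""
    if "(" in text or ")" in text:
        return None  # nested groups are not consistent in the sense used here
    kinds = set(re.findall(r"[|&,]", text))
    for ref in REF.findall(text):
        kinds.add(connector(entities, entities[ref]))  # referenced entities count as same level
    kinds.discard("")
    if None in kinds or "," in kinds or len(kinds) > 1:
        return None
    return kinds.pop() if kinds else ""

def add_consistent(entities, target, member):
    """ADD CONSISTENT: member TO: target, where member is an element name or "%entity;"."""
    conn = connector(entities, entities[target])
    if not conn:
        raise ValueError(f"%{target}; is not a consistent | or & list")
    text = f"{entities[target]} {conn} {member}"
    if connector(entities, text) != conn:
        raise ValueError(f"adding {member!r} would break the consistency of %{target};")
    return {**entities, target: text}

entities = {"inline": "emph | link", "computer": "code | kbd", "meta": "author, date"}
print(add_consistent(entities, "inline", "%computer;")["inline"])  # emph | link | %computer;
```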
Appending or prepending an element to a content model is also easy:
Example 11.
ADD (APPEND|PREPEND): element TO: element
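A sketch of Example 11 on declaration text, with hypothetical names. Appending or prepending a required element is not an extension, because documents valid under the original model lack it. This sketch therefore defaults to adding the element as optional (`?`). The result must also still satisfy SGML's unambiguity rule. For example, appending `a?` to `(b, a*)` gives a model where, after `b`, an `a` could match either token.

```python
def add_to_model(model, element, where="append", occurrence="?"):
    """ADD APPEND|PREPEND: element TO: model.

    The original model is kept intact as one group, so its own occurrence
    indicator still applies to it; the new element gets `occurrence`.
    """
    token = element + occurrence
    if where == "append":
        return f"({model}, {token})"
    if where == "prepend":
        return f"({token}, {model})"
    raise ValueError(f"where must be 'append' or 'prepend', not {where!r}")

print(add_to_model("(title, para+)", "index"))             # ((title, para+), index?)
print(add_to_model("(para | list)*", "title", "prepend"))  # (title?, (para | list)*)
```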
Example 12.
REMOVE EXCLUSION:
ADD INCLUSION:
Sometimes a content model needs arbitrary changes that do not fall under "pure restriction", "pure extension" or other categories, or that are too hard to describe as such; the content model must then be completely rewritten.
Example 13.
REDEFINE ELEMENT: element AS: content-model
REDEFINE ENTITY: entity AS: cdata
If element names are parametrized, this is trivial. Otherwise it involves removing the original element, creating a new one with the same content model (and attribute definition list), and changing the definition of all elements and entities that referenced the original.
Example 14.
RENAME ELEMENT: element TO: new-name
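When element names are not parametrized, the rewriting part of Example 14 can be sketched with a regular expression. The expression matches the old name only as a whole name, not as part of a longer name such as `x-para` nor inside a parameter entity reference. All names are hypothetical. Names are matched case-sensitively, and the attribute definition list of the renamed element, which would need the same treatment, is not modelled.

```python
import re

def rename_element(elements, entities, old, new):
    """RENAME ELEMENT: old TO: new in declaration names, content models and entity texts.

    Assumes every entity text is a content-model fragment (hypothetical simplification).
    """
    # `old` must not be preceded or followed by a name character, nor be
    # preceded by "%" (parameter entity reference) or "#" (e.g. #PCDATA).
    pattern = re.compile(rf"(?<![\w.%#-]){re.escape(old)}(?![\w.-])")
    new_elements = {(new if name == old else name): pattern.sub(new, model)
                    for name, model in elements.items()}
    new_entities = {name: pattern.sub(new, text) for name, text in entities.items()}
    return new_elements, new_entities

elements = {"para": "(#PCDATA | emph)*", "x-para": "(para+)", "sect": "(title, (para | list)+)"}
entities = {"block": "para | x-para | %para;"}
els, ents = rename_element(elements, entities, "para", "p")
print(els)   # {'p': '(#PCDATA | emph)*', 'x-para': '(p+)', 'sect': '(title, (p | list)+)'}
print(ents)  # {'block': 'p | x-para | %para;'}
```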
This is a replacement of some of the occurrences of an element by a modification of the original. It is not unlike element renaming, but somewhat more complex to express.