Now THAT’s a birthday present!

Eve Maler has marked the tenth anniversary of the XML spec by posting an online copy of the book she and Jeanne El Andaloussi wrote on vocabulary development: Developing SGML DTDs: From Text to Model to Markup.

There is a huge body of knowledge, craft, and/or art about document analysis, vocabulary design, and the use of markup in systems that went into the design of SGML and XML. (Some call it “the SGML methodology” as opposed to “SGML” or “the spec”.) Almost all of it circulates largely in oral tradition; Maler/El Andaloussi was for a long time the only, and is still one of the best, attempts to write it down.

Thank you for the birthday present, Eve!


Tim Bray on XML People

To mark the tenth anniversary of the XML Recommendation, Tim Bray has resurrected an account he wrote ten years ago of various people involved in the pre-history and creation of XML.

Well worth reading, whether you were there and are looking for an excuse to spend half an hour on nostalgia, or you weren’t there and wonder what it was like. Of course, there is no single “what it was like”: it was like different things from different vantage points. My memories of the initial development of XML are a lot longer on technical discussions and a lot shorter on memorable dinners with movers and shakers.

Another plug for XML Catalogs (and caching)

The W3C systems group posted a blog entry the other day about the caching of DTDs and schemas. The failure of some XML software to use caches wisely is causing unbelievable amounts of traffic on the W3C site: in some cases, the same IP address is requesting the same DTD file hundreds and thousands of times in the space of a few hours.

The blog has good pointers to resources about using HTTP caching well, and about XML Catalogs.

I’ve said it before, and I’ll say it again: every piece of software that works with XML ought to use XML Catalogs. By all means allow the user to turn it off, but support it, and turn it on by default. The main reason is: it makes the life of your users easier. And the kind of problem discussed by the systeam blog post is one more reason.
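
For readers who have not met catalogs before, here is a minimal sketch in Python of the basic idea: map a public or system identifier to a local copy before ever touching the network. The catalog namespace and the `public` and `system` entry elements come from the OASIS XML Catalogs specification; the local `dtds/...` paths and the function names `load_catalog` and `resolve` are made up for illustration. The sketch deliberately ignores the `prefer` attribute, rewrite rules, delegation, and `nextCatalog`, so it is not a conforming resolver.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Illustrative catalog; the local paths are hypothetical.
CATALOG_XML = f"""\
<catalog xmlns="{CATALOG_NS}">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="dtds/xhtml1-strict.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="dtds/xhtml1-strict.dtd"/>
</catalog>
"""


def load_catalog(catalog_text):
    """Read the public and system entries of a catalog into two dicts."""
    root = ET.fromstring(catalog_text)
    public, system = {}, {}
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        public[entry.get("publicId")] = entry.get("uri")
    for entry in root.iter(f"{{{CATALOG_NS}}}system"):
        system[entry.get("systemId")] = entry.get("uri")
    return public, system


def resolve(public_id, system_id, catalog):
    """Return a local URI for the identifiers, or None if nothing matches.

    Simplification: system entries are tried first, then public entries.
    """
    public, system = catalog
    if system_id is not None and system_id in system:
        return system[system_id]
    if public_id is not None and public_id in public:
        return public[public_id]
    return None  # caller falls back to the network, ideally via an HTTP cache


if __name__ == "__main__":
    catalog = load_catalog(CATALOG_XML)
    print(resolve("-//W3C//DTD XHTML 1.0 Strict//EN", None, catalog))
```

When `resolve` returns `None`, the software would fetch the resource in the usual way; the point is that for the well-known DTDs it never has to.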

W3C working group meetings / Preston’s Maxim

[25 January 2008]

I’m finishing a week of working group meetings in Florida, in the usual state of fatigue.

The SML (Service Modeling Language) and XML Query working groups met Monday-Wednesday. SML followed its usual practice of driving its agenda from Bugzilla: review the issues for which the editors have produced proposals, and act on the proposals, then review the ones that we have discussed before but not gotten consensus suitable for sending them to the editors, then others. The working group spent a fair bit of time discussing issues I had raised or was being recalcitrant on, only to end up resolving them without making the suggested change. I felt a little guilty at taking so much of the group’s time, but no one exhibited any sign of resentment. In one or two cases I was in the end persuaded to change my position; in others it simply became clear that I wasn’t going to manage to persuade the group to do as I suggested. I have always found it easier to accept with (what I hope is) good grace a decision going against me, if I can feel that I’ve been given a fair chance to speak my piece and persuade the others in the group. The chairs of SML are good at ensuring that everyone gets a chance to make their case, but also adept (among their other virtues) at moving the group along at a pretty good clip.

(In some working groups I’ve been in, by contrast, some participants made it a habit not to argue the merits of the issue but instead to spend the time available arguing over whether the group should be taking any time on the issue at all. This tactic reduces the time available for discussion, impairs the quality of the group’s deliberation, and thus reduces the chance that the group will reach consensus; it’s thus extremely useful for those who wish to defend the status quo but do not have, or are not willing to expose, technical arguments for their position. The fact that this practice reduces me to a state of incoherent rage is merely a side benefit.)

“Service Modeling Language” is an unfortunate name, I think: apart from the fact that the phrase doesn’t suggest any very clear or specific meaning to anyone hearing it for the first time, the meanings it does suggest have pretty much nothing whatever to do with the language. SML defines a set of facilities for cross-document validation, in particular validation of, and by means of, inter-document references. Referential integrity can be checked using XSD (aka XML Schema), but only within the confines of a single document; SML makes it possible to perform referential integrity checking over a collection of documents, with cross-document analogues of XSD’s key, keyref, and unique constraints. It also adds further abilities: in particular, one can specify simply that inter-document references of a given kind must point to elements with a particular expanded name, or to elements with a given governing type definition, or that chains of references of a particular kind must be acyclic. In addition, the SML Interchange Format (SML-IF) specifies rules that make it easier to specify exactly what schema is to be used for validating a document using XSD and thus to get consistent validation results.
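
To make the idea of cross-document checking concrete, here is a rough sketch in Python of two of the checks mentioned above: that every inter-document reference points at something that exists, and that chains of references are acyclic. The `ref="doc#id"` notation, the document names, and the function names are all invented for this illustration; this is not SML’s actual reference syntax or processing model, just the shape of the problem.

```python
import xml.etree.ElementTree as ET

# Hypothetical document set; the ref="doc#id" notation is invented for
# this sketch and is not SML's actual reference syntax.
DOCS = {
    "a.xml": '<items><item id="a1" ref="b.xml#b1"/></items>',
    "b.xml": '<items><item id="b1" ref="c.xml#c1"/></items>',
    "c.xml": '<items><item id="c1"/></items>',
}


def build_graph(docs):
    """Map each 'doc#id' to the 'doc#id' it references (or None)."""
    graph = {}
    for name, text in docs.items():
        for elem in ET.fromstring(text).iter():
            if elem.get("id") is not None:
                graph[f"{name}#{elem.get('id')}"] = elem.get("ref")
    return graph


def dangling(graph):
    """List the sources whose reference target is not in the collection."""
    return sorted(
        src for src, tgt in graph.items()
        if tgt is not None and tgt not in graph
    )


def has_cycle(graph):
    """Follow each chain of references; revisiting a node means a cycle."""
    for start in graph:
        seen = set()
        node = start
        while node is not None and node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False


if __name__ == "__main__":
    graph = build_graph(DOCS)
    print("dangling references:", dangling(graph))
    print("cyclic:", has_cycle(graph))
```

Neither check can be expressed within a single document’s validation, since each one needs the whole collection in view at once.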

The XML Schema working group met Wednesday through Friday. Wednesday morning went to a joint session with the SML and XML Query working groups: Kumar Pandit gave a high-level overview of SML and there was discussion. Then in a joint session with XML Query, we discussed the issue of implementation-defined primitive types.

The rest of the time, the Schema WG worked on last-call issues against XML Schema. Since we had a rough priority sort of the issues, we were able just to sort the issues list and open them one after the other and ask “OK, what do we do about this one?”

Among the highlights visible from Bugzilla:

  • Assertions will be allowed on simple types, not just on complex types.
  • For negative wildcards, the keyword ##definedSibling will be available, so that schema authors can conveniently say “Allow anything except elements already included in this content model”; this is in addition to the already-present ##defined (“Allow anything except elements defined in this schema”). The Working Group was unable to achieve consensus on deep-sixing the latter; it has really surprising effects when new declarations are included in a schema and seems likely to produce mystifying problems in most usage scenarios, but some Working Group members are convinced it’s exactly what they or their users want.
  • The Working Group declined a proposal that some thought would have made it easier to support XHTML Modularization (in particular, the constraints on xhtml:form and xhtml:input); it would have made it possible for the validity of an element against a type to depend, in some cases, on where the element appears. Since some systems (e.g. XPath 2.0, XQuery 1.0, and XSLT 2.0) assume that type-validity is independent of context, the change would have had a high cost.
  • The sections of the Structures spec which contain validation rules and constraints on components (and the like) will be broken up into smaller chunks to try to make them easier to navigate.
  • The group hearkened to the words of Norm Walsh on the name of the spec (roughly paraphrasable as “XSDL? Not WXS? Not XSD? XSDL? What are you smoking?”); the name of the language will be XSD 1.1, not XSDL 1.1.

We made it through the entire stack of issues in the two and a half days; as Michael J. Preston (a prolific creator of machine-generated concordances known to a select few as “the wild man of Boulder”) once told me: it’s amazing how much you can get done if you just put your ass in a chair and do it.

Primitives and non-primitives in XSDL

John Cowan asks, in a comment on another post here, what possible rationale could have governed the decisions in XSDL 1.0 about which types to define as primitives and which to derive from other types.

I started to reply in a follow-up comment, but my reply grew too long for that genre, so I’m promoting it to a separate post.

The questions John asks are good ones. Unfortunately, I don’t have good answers. In all the puzzling cases he notes, my explanation of why XSDL is as it is begins with the words “for historical reasons …”.
