W3C working group meetings / Preston’s Maxim

[25 January 2008]

I’m finishing a week of working group meetings in Florida, in the usual state of fatigue.

The SML (Service Modeling Language) and XML Query working groups met Monday through Wednesday. SML followed its usual practice of driving its agenda from Bugzilla: first review the issues for which the editors have produced proposals, and act on those proposals; then review the issues we have discussed before without reaching consensus firm enough to send to the editors; then the rest. The working group spent a fair bit of time discussing issues I had raised or was being recalcitrant about, only to end up resolving them without making the suggested change. I felt a little guilty at taking so much of the group’s time, but no one exhibited any sign of resentment. In one or two cases I was in the end persuaded to change my position; in others it simply became clear that I wasn’t going to manage to persuade the group to do as I suggested. I have always found it easier to accept with (what I hope is) good grace a decision going against me, if I can feel that I’ve been given a fair chance to speak my piece and persuade the others in the group. The chairs of SML are good at ensuring that everyone gets a chance to make their case, but also adept (among their other virtues) at moving the group along at a pretty good clip.

(In some working groups I’ve been in, by contrast, some participants made it a habit not to argue the merits of an issue but instead to spend the time available arguing over whether the group should be taking any time on the issue at all. This tactic reduces the time available for discussion, impairs the quality of the group’s deliberation, and thus reduces the chance that the group will reach consensus; it is therefore extremely useful for those who wish to defend the status quo but do not have, or are not willing to expose, technical arguments for their position. The fact that this practice reduces me to a state of incoherent rage is merely a side benefit.)

“Service Modeling Language” is an unfortunate name, I think: apart from the fact that the phrase doesn’t suggest any very clear or specific meaning to anyone hearing it for the first time, the meanings it does suggest have pretty much nothing whatever to do with the language. SML defines a set of facilities for cross-document validation, in particular validation of, and by means of, inter-document references. Referential integrity can be checked using XSD (aka XML Schema), but only within the confines of a single document; SML makes it possible to perform referential-integrity checking over a collection of documents, with cross-document analogues of XSD’s key, keyref, and unique constraints and with further abilities: one can specify, for example, that inter-document references of a given kind must point to elements with a particular expanded name, or to elements with a given governing type definition, or that chains of references of a particular kind must be acyclic. In addition, the SML Interchange Format (SML-IF) specifies rules that make it easier to specify exactly which schema is to be used for validating a document with XSD, and thus to get consistent validation results.
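
To make the XSD baseline concrete, here is a minimal sketch of the kind of single-document referential-integrity check XSD 1.0 already provides; the vocabulary (catalog, course, prereq) is invented for illustration, and I have not tried to reproduce SML’s own syntax, which was still in flux. SML’s contribution is to let constraints of this general shape reach across the documents of a model.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="catalog">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="course" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="prereq" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:attribute name="ref" type="xs:string"/>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="id" type="xs:string"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
        <!-- Every course must carry a unique id ... -->
        <xs:key name="courseKey">
          <xs:selector xpath="course"/>
          <xs:field xpath="@id"/>
        </xs:key>
        <!-- ... and every prereq ref must match some course id.
             XSD can enforce this only inside one document; SML
             extends checks of this shape across documents. -->
        <xs:keyref name="prereqRef" refer="courseKey">
          <xs:selector xpath="course/prereq"/>
          <xs:field xpath="@ref"/>
        </xs:keyref>
      </xs:element>
    </xs:schema>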

The XML Schema working group met Wednesday through Friday. Wednesday morning went to a joint session with the SML and XML Query working groups: Kumar Pandit gave a high-level overview of SML and there was discussion. Then in a joint session with XML Query, we discussed the issue of implementation-defined primitive types.

The rest of the time, the Schema WG worked on last-call issues against XML Schema. Since we had a rough priority sort of the issues, we were able just to sort the issues list and open them one after the other and ask “OK, what do we do about this one?”

Among the highlights visible from Bugzilla:

  • Assertions will be allowed on simple types, not just on complex types. (For a sketch of this item and the next, see below, after this list.)
  • For negative wildcards, the keyword ##definedSibling will be available, so that schema authors can conveniently say “Allow anything except elements already included in this content model”; this is in addition to the already-present ##defined (“Allow anything except elements defined in this schema”). The Working Group was unable to achieve consensus on deep-sixing the latter; it has really surprising effects when new declarations are included in a schema and seems likely to produce mystifying problems in most usage scenarios, but some Working Group members are convinced it’s exactly what they or their users want.
  • The Working Group declined a proposal that some thought would have made it easier to support XHTML Modularization (in particular, the constraints on xhtml:form and xhtml:input); it would have made it possible for the validity of an element against a type to depend, in some cases, on where the element appears. Since some systems (e.g. XPath 2.0, XQuery 1.0, and XSLT 2.0) assume that type-validity is independent of context, the change would have had a high cost.
  • The sections of the Structures spec which contain validation rules and constraints on components (and the like) will be broken up into smaller chunks to try to make them easier to navigate.
  • The group hearkened to the words of Norm Walsh on the name of the spec (roughly paraphrasable as “XSDL? Not WXS? Not XSD? XSDL? What are you smoking?”); the name of the language will be XSD 1.1, not XSDL 1.1.
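
The following is a rough sketch of how the first two items might look in schema-document syntax. Since the drafts were still moving, take the spelling of the constructs (xs:assertion as a facet on simple types, notQName on wildcards) as provisional, and the type and element names as invented:

    <!-- An assertion on a simple type: values must be even. -->
    <xs:simpleType name="evenInteger">
      <xs:restriction base="xs:integer">
        <xs:assertion test="$value mod 2 = 0"/>
      </xs:restriction>
    </xs:simpleType>

    <!-- A negative wildcard: allow any element not already named
         in this content model. -->
    <xs:complexType name="extensibleThing">
      <xs:sequence>
        <xs:element name="head"/>
        <xs:element name="body"/>
        <xs:any notQName="##definedSibling" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>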

We made it through the entire stack of issues in the two and a half days; as Michael J. Preston (a prolific creator of machine-generated concordances, known to a select few as “the wild man of Boulder”) once told me: it’s amazing how much you can get done if you just put your ass in a chair and do it.

Primitives and non-primitives in XSDL

John Cowan asks, in a comment on another post here, what possible rationale could have governed the decisions in XSDL 1.0 about which types to define as primitives and which to derive from other types.

I started to reply in a follow-up comment, but my reply grew too long for that genre, so I’m promoting it to a separate post.

The questions John asks are good ones. Unfortunately, I don’t have good answers. In all the puzzling cases he notes, my explanation of why XSDL is as it is begins with the words “for historical reasons …”.


Triage

The XML Schema Working Group is trying to move its focus to the Structures spec, whose Last Call comment period ended a couple of months ago. (Work on the remaining Datatypes issues will continue in the background, but at the moment it is the editors, not the Working Group, who are the bottleneck on Datatypes. If the Datatypes editors work effectively in the background, they may manage to use the next few weeks to produce wording proposals for the remaining issues.)

So I just finished reading through all 104 Last Call issues (so far) opened on the Structures spec, trying to impose some order on them.

Whenever I have a longish list of tasks, or of issues that need resolution, my fingers itch to classify them. Which ones are easy and will take just a few minutes of the editors’ time to fix? (Ha! In my experience, editors’ time doesn’t come in increments smaller than half an hour — once you factor in the time it takes to generate a new version of the spec and check to make sure you actually made the change correctly, even a simple fix to a broken paragraph can take forty-five minutes.) Which ones are hard and will take a lot of time and effort, either Working Group discussion time to reach consensus on the right thing to do, or editorial time to draft, or both? Which lie in the middle?

The process is a little like some of those classic AI programs discussed in textbooks (bagging groceries, for example): it’s hard to do perfectly, but even an imperfect classification can be much better than nothing. Of course, it’s even more like the process of deciding, after a catastrophe, which victims will benefit from medical attention, which victims are too far gone to expend resources trying to save, and which can get along without help. So I always think of the process of classifying issues as triage.

In scheduling issues for WG discussion, you want (in my experience) to put a few easy items on the agenda each week, so that every week the WG has the experience of nailing a few issues. But you can’t just sequence the list from easy to hard, because that leads to a dramatic slowdown as you get past the easy ones and move into the hard ones, which can have dreadful effects on morale. So in addition to the easy ones each week, you want to schedule some hard ones early on, so that the WG can spend the time it may take to understand them, develop solutions, argue about them, develop some better solutions, and eventually converge on a good solution, before you start getting into deadline pressure.

But every time I start to perform triage on a list by estimating how much time it will take, I am reminded that some items are important, and others are less important, and that importance doesn’t really correlate well with how long things will take. There is then a crisis while I contemplate some issue that is clearly important, but will take a while — or won’t take long but also won’t actually contribute much to making the spec better. And in the time-honored fashion, I compromise by trying to do both.

I almost always end up performing a sort of double triage, classifying things along two distinct axes:

  • cost: hard, easy, work (as in “this will take some work”)
  • importance: important, medium, or thimble (thimble? yes, thimble. I’ll explain in a minute)

Of course, the result is at best a rough and ready way of subdividing the problem space in order to control complexity and stave off despair. Different people will judge the importance of issues differently, and likewise their cost; and even if everyone agreed on the importance or the likely cost of an issue, they could still be wrong. (For this reason, I encourage others in the WG to perform their own rough and ready classifications, but discourage the occasional attempt to discuss the correct categorization at length. By all means, tell me you think issue 666 is likely to be devilishly hard and that I’m wrong to class it as easy; I may then change it. But for heaven’s sake, let’s not waste time arguing about a back-of-the-envelope scheduling estimate!)

I use the term thimble for items that are neither important nor of medium importance, mostly because I find I can’t bear to call any problem in a spec “unimportant”. No issue raised by a reader is unimportant. And yet, some are more important than others. And if some are more important, then it seems to follow with logical necessity that some are (sigh) less important than others.

The image comes from a discussion of Dante’s Paradiso during my student days. Some students found it hard to come to terms with the fact that there are hierarchies even in Dante’s paradise: some of the blessed are in the inner circles, close to God, and others are, well, they are in the outer circles. This offended some readers’ egalitarianism (are they less virtuous than the other souls? less good? less deserving? Just where does God think he gets off, banishing some virtuous souls to the outer circles?!), and so we discussed it for a while. Eventually, someone said that when they had discussed this kind of thing in school, the nun teaching the class had finally taken up a thimble and a water tumbler and filled them with water. Each, she said, demonstrating, was full, as full as it could get. One, to be sure, held more water, but saying that the tumbler held more water did not entail saying that the thimble was not full. In a similar way, we can believe that all the souls in heaven are as good as they can be, while still recognizing that their capacity for goodness may vary.

A comment correctly pointing out that a particular sentence is malformed or confusing can never be unimportant. It is as important as it can be, even if the sentence in question describes a corner case seldom encountered and thus of little practical import, compared to (say) a problem in the definition of a concept that is constantly appealed to.

I call this process “triage”, but of course it works not with three classes but nine, in a three-by-three table. In a perfect world, of course, you’ll resolve all problems, and the categorization is used only for scheduling. If, however, in this world of woe you or your Working Group sometimes miss that level of perfection, then it can matter which issues you address and which you end up closing with a laconic WONTFIX message. If you haven’t botched the categorization, you will get the best bang for the buck if the important, easy items have gotten done, and if you haven’t poured precious resources into unsuccessful attempts to resolve the items classed thimble, hard. Me, I figure you want to start at the top left of the table (important, easy) and move more or less systematically to the bottom right.

I still haven’t figured out how to decide between spending time on important, hard items or on medium, work items. The first are more important (d’oh!), but you’re likely to get fewer of them actually done. So it’s a judgement call, at best.

Unfortunately, I’m never happy even with a three by three classification of issues.

To understand any issue, you need to understand what the commenter claims the spec is doing, what is wrong with that, and how to fix it. But that doesn’t suffice to resolve it: before you touch the spec, you must decide whether you think the commenter is right. What did the working group intend to do here? And why? What’s the underlying design story? What does the current text actually do? Is the problem reported a symptom of some larger complex of problems? What are the options for changing things? What are the relevant design principles? Which invariants should the spec be seeking to achieve or preserve? If the issue is at all complex, it can take a while to get up to speed on the relevant information. So if there are several issues that deal with related topics, you really, really want to deal with them all at once, or in succession. (Few things sap morale more effectively than the discovery that in dealing with four related issues, at four separate times, the WG made four decisions which effectively contradict each other because it failed to spin up successfully or remember what it did earlier.)

So I almost always find that I also want a sort of topical clustering of issues. “If we’re going to deal with issue 2218, then we might as well deal with 5078 at the same time. And then the answers to 5163 will just fall out from the earlier decisions.” Perfect clustering involves perfect knowledge and perfect classification, so it doesn’t happen. And I often change my mind about what real issue a given problem ticket involves. So my attempts at topic clustering are even less stable and reproducible than my cost and value estimates. But failure to cluster is like non-locality of reference in a program: it leads to thrashing.

The XML Schema Working Group maintains its issue list in public, so for what it’s worth the current triage of Structures issues is visible. You have to make the Whiteboard column visible, and sort on it, to see the triage clearly.

Several questions arise. Other people presumably face this problem, just as I do. But I don’t hear them moaning and groaning about it the way I do. Have they found better, less painful ways of managing this process? Or are they just more stoic?

And why oh why do so few issue tracking systems make it convenient to add yet another dimension on which to classify issues? Bugzilla at least provides the Whiteboard field, where you can do pretty much anything you like, and then sort. But there isn’t a convenient way to say “sort first by cost and then by importance” or “sort first by importance and then by cluster and finally by cost”, etc. What would it take to make it better?
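
Lacking that, the workaround I fall back on is exporting the issue list and sorting it outside the tracker. Here is a minimal sketch in Python, assuming a CSV export and a whiteboard convention of space-separated tokens like “cost:easy imp:thimble”; the column names and the tagging convention are my inventions for illustration, not anything Bugzilla prescribes.

    import csv

    # Sort order within each triage dimension; unknown values sort last.
    COST = {"easy": 0, "work": 1, "hard": 2}
    IMP = {"important": 0, "medium": 1, "thimble": 2}

    def tags(whiteboard):
        """Parse tokens like 'cost:easy imp:medium' into a dict."""
        return dict(tok.split(":", 1) for tok in whiteboard.split() if ":" in tok)

    def triage_key(row):
        """Sort first by importance, then by cost (top left to bottom right)."""
        t = tags(row.get("Whiteboard", ""))
        return (IMP.get(t.get("imp"), 99), COST.get(t.get("cost"), 99))

    with open("bugs.csv", newline="") as f:
        rows = sorted(csv.DictReader(f), key=triage_key)

    for row in rows:
        print(row["ID"], row.get("Whiteboard", ""), row["Summary"])

Swapping the two elements of the key tuple gives “sort first by cost and then by importance”; a cluster dimension would just be a third whiteboard token and a third element in the tuple.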

Sandboxes and test cases

Playing in sandboxes can be a lot of fun. I just spent a little while playing in a software sandbox I set up last fall, and can only wish I had gotten around to setting it up a lot earlier.

Last October, for reasons I need not go into, I conceived a strong desire to generate a largish number of test cases for particular areas of XML Schema, for use in thinking about the current state of implementation of XSDL 1.0, and in thinking about what ought to happen in XSDL 1.1. So I spent some time wrapping my head around the relatively elaborate test suite framework the XML Schema Working Group adopted some years ago for our test suite. (I had looked at it when we adopted it, of course, but you look at a vocabulary in a different way when you are going to be generating data using it.) My plan was (and still is) eventually to generate test cases automatically from Alloy models, but as a first step I mocked some test cases up by hand.

But to make sure the test cases were actually testing what I wanted to test, I needed to run them, systematically, on different processors. So I spent an afternoon or two (or three) installing every schema processor I could conveniently get my hands on and persuade to run under Mac OS X or under Windows XP (since I started using this Mac I have not had any machines running Linux [I say it with a bit of a sigh]), and writing some scripts to wrap around them so I don’t have to remember what order they want their command-line arguments in, or what options they want.
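
As an illustration of the sort of wrapper I mean, here is a minimal sketch in Python for one of the processors. Only the xmllint command line is real; the function name and the idea of a uniform validate(schema, instance) signature are mine, and wrappers for the other processors would follow the same pattern with their own argument orders.

    import subprocess

    def validate_libxml(schema, instance):
        """Validate instance against schema using libxml's xmllint.

        Returns True if xmllint reports the instance schema-valid."""
        result = subprocess.run(
            ["xmllint", "--noout", "--schema", schema, instance],
            capture_output=True, text=True)
        # xmllint exits with status 0 when the instance validates
        return result.returncode == 0

    if __name__ == "__main__":
        ok = validate_libxml("test.xsd", "test.xml")
        print("libxml:", "valid" if ok else "invalid")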

It’s always interesting to construct test cases to illustrate some question or other that arises, whether from a comment on the spec or an inquiry by email. And I have directories with scores of small test cases I have constructed over the last few years. But until I started this more systematic construction of a schema validation sandbox, I contented myself with checking those test cases with one or two processors.

But it turns out that having more processors to test with is just a lot more fun. Here is a test case. OK, what does libxml say about it? MSV? Saxon? Xerces C? Xerces J? (And if I’m really energetic, I move to the other machine and check MSXML 4, MSXML 6, and xsv. I ought to get a current version of XML Spy installed, too, but I haven’t been that energetic yet.) Getting five times as much information for substantially the same effort just makes the whole thing more fun. (And the more fun we can make it to construct and run test cases, the better.)

Today’s effort was an attempt to answer a question raised by Xan Gregg in a comment on an open issue against XSDL 1.1 Structures. Two schema documents (one pretty much as Xan provided it, one modified to correct a possible oversight), and five instances, provide a simple test of how implementations have interpreted a rule in XSDL which Xan and I turn out to have read differently. (The test cases, and a catalog for this tiny collection of tests, are all on the W3C server at http://www.w3.org/XML/2008/xsdl-exx/. I plan to put every schema example I generate this year there, instead of hiding it on my hard disk. At least the interesting ones.) The process of making them and testing them was delayed for a bit of yak-shaving (quick, how do you embed selected XHTML modules into your DTD, in a way that you are willing to let other people see in public?), but I got them made eventually, and demonstrated to my own satisfaction that virtually all the implementors have agreed on the meaning of this bit of the spec. (Fortunately for me, they all read it the way I read it. But Xan is right that it could be taken in a different way; the wording should be changed to make it clearer.)

Maintaining a software sandbox with installed copies of all the software one wants to play with can be time consuming. And since in the usual case, you aren’t familiar with the software yet, you may not be able to make a strong case for making it an urgent task. Uncertain cost, uncertain benefit, low priority. Other things always seem more urgent. But having such a sandbox, and playing in it from time to time, are important tasks, even if not often urgent. It’s nice when you get a chance to do it.

Happy New Year.

Spolsky (and Usdin and Piez) on specs

Joel Spolsky (of Joel on Software) has put up a talk he gave at the Yale Computer Science department a few weeks ago. (Actually, he put it up a few weeks ago, too, shortly after giving the talk. I’m just slow seeing it.)

In it, he has an interesting riff on specifications and their discontents, which feels relevant to the perennial topics of improving the quality of W3C (and other) specs, and of the possible uses of formalization in that endeavor.

If the spec defines precisely what a program will do, in enough detail that it can be used to generate the program itself, this just raises the question: how do you write the spec? Such a complete spec is just as hard to write as the underlying computer program, because just as many details have to be answered by the spec writer as by the programmer.

This is not the whole story, by any means, if only because specs can and often do explicitly refrain from specifying everything a conforming implementation is to do. But it raises an important point, which is that the more precise one tries to make a spec, the easier it can be for contradictions or other problems to creep into it. (In my experience, this is particularly likely to wreak havoc with later attempts to correct errata.)

In their XML 2007 talk on “Separating mapping from coding in transformation tasks”, Tommie Usdin and Wendell Piez talk about the utility of separating the specification of an XML-to-XML transform (“mapping”) from its implementation (“coding”), and provide a lapidary argument against one common way of trying to make a specification more precise: “Code-like prose is hard to read.” (Has there ever been a more concise diagnosis of many readers’ problems with the XML Schema spec? I am torn between the pleasure of insight and the feeling that my knuckles have just been rapped, really hard. [Deep breath.] Thank you, ma’am, may I have another?)

How do we make specs precise and complete without making them as hard to write, and as hard to read, and as likely to contain insidious bugs, as the source code for a reference implementation?