Participant observation / moving house

[12 December 2008]

Some ill-defined thoughts are occupying my musings.

Some time ago, my colleague Liam Quin decided to include advertising on his site http://www.fromoldbooks.org/, which makes available high-quality scans of public-domain images he finds in … well, in old books. When we have discussed it, he has occasionally observed that one of his goals in doing so is to understand Web technology and Web usage from a slightly different vantage point. I understand him to mean that it is one thing to have a deep factual knowledge about the specifications which undergird and constitute the web, but a different thing to experience them in the process of running a web site. By accepting ads, and experimenting with different advertising programs, and watching his search engine rankings, Liam says, he has learned a good deal.

In a way, it sounds a bit like what one reads about participant observers in introductory anthropology courses. Some kinds of knowledge are more accessible from the inside than from the outside.

A second observation has concerned me for some time. The Semantic Web proposes to use URIs to denote things we want to talk about, and this has the nice side effect that proposals to mint a new term for something are safe from name collisions while still not needing to go through any central registration authority. All of my colleagues at W3C, from Tim Berners-Lee on down, recommend the use of HTTP URIs for such purposes. But new HTTP URIs can be minted, in practice, only by people who own domain names, or who have arrangements with people who own domain names. (It’s a bit like freedom of the press, which guarantees the right of uncensored publication to those who own a press. Fortunately, the Web makes owning a virtual press fairly simple, but it does tend to involve, again, owning a domain name.)

These lines of thought, together with some other considerations that need not concern us here, have led me to think it’s really high time I moved into the domain-owning classes.

So: We’re moving, or rather, we’ve moved. Messages in a Bottle is now hosted at http://cmsmcq.com/mib instead of the old address on people.w3.org.

I believe all existing references to posts and comments in the old location should be successfully redirected to the same posts and the same comments in the new location; this was a bit harder than it really ought to have been (details in a later post). If any reader finds exceptions or failures, please let me know at the email address whose username is “mib” and whose host name is “cmsmcq.com”.

Writing tight

[13 November 2008]

Dimitre Novatchev has called attention to a recent question on the Stack Overflow programming Q&A web site:

I have an XPath expression which provides me a sequence of values like the one below:

1 2 2 3 4 5 5 6 7

It is easy to convert this to a set of unique values “1 2 3 4 5 6 7” using the distinct-values function. However, what I want to extract is the list of duplicate values = “2 5”. I can’t think of an easy way to do this. Can anyone help?

Dimitre’s solution is beautiful and concise: 21 characters long (longer if you use variables with names longer than single characters), and well worth the five or ten minutes it took me to work out why it works. I won’t explain it to you, dear reader; you deserve the brief puzzlement and Aha-moment of understanding it.

Despite being terse, it’s not the kind of thing you’d enter in an Obfuscated XPath contest; it just uses an idiom I haven’t seen before. I’ll see it again, though, because I’ll use it myself; as I say, it’s beautiful. (I do confess to a certain curiosity about how he would modify it if, as he says, efficiency needed to be addressed.)

Dimitre gets my vote for this month’s best programming-language application of Strunk and White’s rule: “Omit needless words!”
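
For readers who would like to pin down exactly what the question asks before hunting for the XPath, here is a small sketch of the problem in Python. It is emphatically not Dimitre’s expression and gives nothing away about his idiom; the function name duplicate_values is invented for illustration, and the approach (count everything, then keep what occurs more than once) is just the obvious one.

```python
from collections import Counter


def duplicate_values(seq):
    """Return each value that occurs more than once, in order of first appearance."""
    counts = Counter(seq)
    # Counter is a dict subclass, so (in Python 3.7+) iterating over it
    # follows the order in which values were first seen.
    return [value for value, n in counts.items() if n > 1]


if __name__ == "__main__":
    # The sequence from the question quoted above.
    print(duplicate_values([1, 2, 2, 3, 4, 5, 5, 6, 7]))  # [2, 5]
```

It is a good deal longer than 21 characters, which is rather the point.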

… And it don’t look like I’ll ever stop my wandering (travels, part 3)

[4 November 2008]

This is the third in a series of posts about recent travels.

From Mannheim, I traveled to Dublin to visit the Digital Humanities Observatory headed by Susan Schreibman; they stand at the beginning of a three-year project to provide access to Irish work in digital humanities and to prepare the way for long-term preservation. I wish them the best of luck in persuading the individual projects with whom they are to collaborate that the use of appropriate standards is the right way forward.

From Dublin, Susan and I traveled to Trier for <philtag n="7"/>, a small gathering whose name is a macaronic pun involving the German words Philologie (philology), Tag (congress, conference, meeting), and the English word tag. The meeting gathered together a number of interesting people, including several of those most actively interested in computer applications in philology, among them Werner Wegstein, who has organized most of the series, and whom I know from way back as a supporter of the TEI; Andrea Rapp, one of the senior staff at the Trier center of expertise in electronic access and publishing in the humanities; and Fotis Jannidis, currently teaching in Darmstadt and the founder and editor of the annual Computerphilologie, as well as a co-editor of the important electronic edition of the young Goethe. Wegstein is retiring from his chair in Würzburg, thus leading to the creation of a new chair in computational philology, for which both Rapp and Jannidis were on the short list; on the preceding Friday, they had given their trial lectures in Würzburg. Either way, Würzburg will get a worthy successor to Wegstein.

The general topic this year was “Communicating eHumanities: Archives, Textcentres, Portals”, and several of the reports were indeed focused on archives, or text centers, or portals. I spoke about the concept of schema mapping as a way of making it possible to provide a single, simple, unified user interface to heterogeneous collections, while still retaining rich tagging in resources that have it, and providing access to that rich markup through other interfaces. Susan Schreibman spoke about the DHO. Haraldur Bernharðsson of Reykjavík spoke about an electronic edition of the Codex Regius of the Poetic Edda, which cheered me a great deal, since the Edda is dear to my heart and I’m glad to see a well-done electronic edition. Heike Neuroth, who is affiliated both with the Max Planck Digital Library in Berlin and with the Lower Saxon State and University Library in Göttingen, spoke on the crucial but underappreciated topic of data curation. (I did notice that many of the papers she cited as talking about the need for long-term preservation of data were published in proprietary formats, which struck me as unfortunate for both practical and symbolic reasons. But data curation is important, even if some who say so are doing a good job of making it harder to curate the data they produce.)

There were a number of other talks, all interesting and useful. But I think the high point of the two days was probably the public lecture by Fotis Jannidis under the title Die digitale Evolution der Kultur oder der bescheidene Beitrag der Geisteswissenschaften zur Virtualisierung der Welt (‘The digital evolution of culture, or the modest contribution of the humanities to the virtualization of the world’). Jannidis took as his point of departure a suggestion by Brewster Kahle that we really should try to digitize all of the artifacts produced until now by human culture, and he refined and augmented Kahle’s back-of-the-envelope calculations about how much information that would involve, and how one might go about it. At one point he showed a graphic with representations of books and paintings and buildings and so on in the upper left, and digitizations of them in the upper right, and a little row of circles labeled Standards at the bottom, like the logs on which the stones of the pyramids make their way to the construction site in the illustrations in books about ancient Egypt.

It was at about this point that, as already pointed out, he said “Standards are the essential axle grease that makes all of this work.”

Schmidt on networking

[26 October 2008]

I’ve recently set up a LinkedIn profile for myself, and I’ve been assiduously searching in LinkedIn for old and current friends and colleagues and ‘building my network’ — so I was rather struck by the following slightly sobering remarks in Helmut Schmidt’s recent book Außer Dienst, which I picked up in Germany a couple weeks ago and have been reading at odd moments on airplanes:

Den Ausdruck Netzwerk hat es zu meiner Zeit noch nicht gegeben. Aber natürlich hat man vielfältige persönliche Kontakte geknüpft und sie langfristig aufrechterhalten. Wer sich gegen seine Zeitgenossen abschließt, hat es schwerer, zu abgewogenen Urteilen zu gelangen, als einer, der sich öffnet und Kontakt und Austausch sucht. … Mir wollen weniger jene Netzwerke wichtig erscheinen, welche einem bestimmten Interesse oder der eigenen Karriere dienen, als vielmehr solche, die der geistigen Anregung und dem gedanklichen Austausch förderlich sind.

Or (my translation):

In my day the expression ‘network’ did not yet exist. But of course people made a lot of personal contacts and kept them up over long periods. Anyone who is closed off from contact with their contemporaries will find it harder to reach well-founded judgements than someone who is open to new contacts and to exchange of views. … Networks of contacts which serve only a particular interest or which are made only in the interest of one’s career seem to me less important than contacts which serve to provide intellectual stimulation and promote the exchange of thoughts.

Of course, Schmidt has things like the Mittwochsgesellschaft (and a Freitagsgesellschaft in Hamburg) in mind, which set an awfully high bar. But still — it makes me wonder not just how well LinkedIn and other social networking sites measure up to these high standards, but how one might use them to pursue the same kinds of intellectual cross-fertilization and mutual education Schmidt describes.

… I’ve been wandering late … (travels, part 2)

[26 October 2008]

This is the second in a series of posts recording some of my impressions from recent travels.

After the XSLT meetings described in the previous post, and then a week at home, during which I was distracted by events that don’t need to be described here, I left again in early October for Europe. During the first half of last week [week before last, now], I was in Mannheim attending a workshop organized by the electronic publications working group of the Union of German Academies of Science. Most of the projects represented were dictionaries of one stripe or another, many of them historical dictionaries (the Thesaurus Linguae Latinae, the Dictionnaire Etymologique de l’Ancien Français, the Deutsches Rechtswörterbuch, the Qumran-Wörterbuch, the Wörterbuch der deutschen Winzersprache, both an Althochdeutsches Wörterbuch and an Althochdeutsches Etymologisches Wörterbuch, a whole set of dialect dictionaries, and others too numerous to name).

Some of the projects are making very well planned, good use of information technology (the Qumran dictionary in Göttingen sticks particularly in my mind), but many suffer from the huge weight of a paper legacy, or from short-sighted decisions made years ago. I’m sure it seemed like a good idea at the time to standardize on Word 6, and to build the project work flow around a set of Word 6 macros which are thought not to run properly in Word 7 or later versions of Word, and which were built by a clever participant in the project who is now long gone, leaving no one who can maintain or recreate them. But however good an idea it seemed at the time, it was in reality a foolish decision for which the project is now paying a price (being stuck in outdated software, without the ability to upgrade, and with increasing difficulty finding support), and for which the academy sponsoring the project, and the future users of the work product, will continue paying for many years to come.

I gave a lecture Monday evening under the title “Standards in der geisteswissenschaftlichen Textdatenverarbeitung: Über die Zukunftssicherung von Sprachdaten”, in which I argued that the IT practice of projects involved with the preservation of our common cultural heritage must attend to a variety of problems that can make their work inaccessible to posterity.

The consistent use of suitably developed and carefully chosen open standards is by far the best way to ensure that the data and tools we create today can still be used by the intended beneficiaries in the future. I ended with a plea for people with suitable backgrounds in the history of language and culture to participate in standardization work, to ensure that the standards developed at W3C or elsewhere provide suitable support for the cultural heritage. The main problem, of course, is that the academy projects are already stretched thin and have no resources to spare for extra work. But I hope that the academies will realize that they have a role to play here, which is directly related to their mission.

It’s best, of course, if appropriate bodies join W3C as members and provide people to serve in Working Groups. More universities, more academies, more user organizations need to join and help guide the Web. (Boeing and Chevron and other user organizations within W3C do a lot for all users of the Web, but there is only so much they can accomplish as single members; they need reinforcements!) But even if an organization does not or cannot join W3C, an individual can have an impact by commenting on draft W3C specifications. All W3C working groups are required by the W3C development process to respond formally to comments received on Last-Call drafts, and to make a good-faith effort to satisfy the originator of the comment, either by doing as suggested or by providing a persuasive rationale for not doing so. (Maybe it’s not a good idea after all, maybe it’s a good idea but conflicts with other good ideas, etc.) It is not unknown for individuals outside the working group (and outside W3C entirely) to have a significant impact on a spec just by commenting on it systematically.

Whether anyone in the academies will take the invitation to heart remains to be seen, though at least a couple of people asked hesitantly after the lecture how much membership dues actually run. So maybe someday …