Heinrich Hertz and the empty set of tomatoes

[2 April 2009]

Why does Nelson Goodman want to work so hard just to avoid talking about classes or sets?

Earlier this year I spent some time reading the section on the calculus of individuals in Nelson Goodman’s The structure of appearance (3d ed. Boston: Reidel, 1977) and the paper Goodman wrote on the subject with Henry S. Leonard (Henry S. Leonard and Nelson Goodman, “The calculus of individuals and its uses,” The journal of symbolic logic 5.2 (1940): 45-55).

I was struck by the lengths Goodman goes to in order to avoid talking about sets, although his compound individuals which contain other individuals seem to be doing very much the same work as sets. Indeed, the 1940 paper makes a selling point of this fact. On page 46, Leonard and Goodman write “To any analytic proposition of the Boolean algebra will correspond a postulate or theorem of this calculus provided that …” (In other words, with some few provisos, if you can make a true statement about sets, you can make a corresponding true statement about individuals in the calculus of individuals. The provisos aren’t even statements you can’t make, just restrictions on the form you make them in. Instead of saying “the intersection of x and y is the empty set” you have to say they are discrete. And so on.) And the concluding sentence of the paper (p. 55) is: “The dispute between nominalist and realist as to what actual entities are individuals and what are classes is recognized as devolving upon matters of interpretative convenience rather than upon metaphysical necessity.”

In other words, Goodman seems at first glance to be simplifying the world by eliminating the notion of sets and classes, and then to be complicating it again in precisely similar ways by taking all of the fundamental ideas we have about sets or classes, and reconstructing them as funny ways of talking about individuals. Cui bono?

This afternoon I saw a review by Anthony Gottlieb, in the New Yorker, of a recent book about the Wittgenstein family (Alexander Waugh, The House of Wittgenstein: A family at war), which seems to suggest a solution. Gottlieb quotes a suggestion from the physicist Heinrich Hertz:

Hertz had suggested a novel way to deal with the puzzling concept of force in Newtonian physics: the best approach was not to try to define it but to restate Newton’s theory in a way that eliminates any reference to force. Once this was done, according to Hertz, “the question as to the nature of force will not have been answered; but our minds, no longer vexed, will cease to ask illegitimate questions.”

(Throws a new light on Wittgenstein’s remark about not wanting to solve problems but to dissolve them, doesn’t it?)

It’s true that once you rebuild the ideas of set union, intersection, difference, etc. as ideas about individuals which can overlap or contain other individuals, and eliminate the word ‘set’, it becomes a lot harder to describe a set which contains as members all sets which are members of themselves, or a set which contains as members all sets which are not members of themselves. The closest you can conveniently get are statements about individuals which overlap themselves (they all do) or which do not overlap themselves (no such individual). Good-bye, Russell’s Paradox!
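A rough sketch may make the last point concrete. It rests on assumptions that are mine, not Goodman's: a tiny universe of three made-up atoms, and individuals modeled as nonempty Python frozensets of atoms. (That does smuggle sets back in, but only as an implementation convenience; the names `ATOMS`, `individuals`, `overlaps` and `discrete` are all hypothetical.) The thing to notice is that there is no null individual, so "discrete" has to do the work that "empty intersection" does for sets.

```python
from itertools import combinations

# Hypothetical atoms, for illustration only.
ATOMS = frozenset({"a", "b", "c"})


def individuals(atoms):
    """Every nonempty combination of atoms; there is no 'null individual'."""
    return [frozenset(combo)
            for size in range(1, len(atoms) + 1)
            for combo in combinations(sorted(atoms), size)]


def overlaps(x, y):
    """x and y overlap if they have some part in common."""
    return bool(x & y)


def discrete(x, y):
    """x and y are discrete if they have no part in common."""
    return not overlaps(x, y)


ALL = individuals(ATOMS)

# The nearest analogues of the two Russell-style descriptions:
self_overlapping = [x for x in ALL if overlaps(x, x)]      # every individual
non_self_overlapping = [x for x in ALL if discrete(x, x)]  # no individual
```

In this toy model the first list is simply all seven individuals and the second is empty, so neither description picks out anything paradoxical.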

And consider the surrealist joke I ran into the other day:

Q. What is red and invisible?
A. No tomatoes.

A user of the calculus of individuals can enjoy this on its own terms, without having to worry about whether it’s a veiled reference to the fact that some typed logics end up with multiple forms of empty set, one for each type in the system. One for integers, if you’re going to reason about integers. One for customer records, if you’re going to reason about customers. And … one for tomatoes?

Q. What is red and invisible?
A. The empty set of tomatoes.

Persistence and dereferenceability

[31 March 2009]

My esteemed former colleague Thomas Roessler has posted a musing on the fragility of the electronic historical record and the difficulties of achieving persistence, when companies go out of existence and coincidentally stop maintaining their Web sites or paying their domain registration fees.

After reading Thomas’s post, my evil twin Enrique came close to throwing a temper tantrum. (Actually, that’s quite unfair. For Enrique, he was remarkably well behaved.)

“The semantic web partisans,” he shouted, “have spent the last ten years or more telling us that URLs are the perfect naming mechanism: a single, integrated space of names with distributed naming authority. Haven’t they?”

“Well,” I said, “strictly speaking, I think they have mostly been talking about URIs, for the last few years at least.” He ignored this.

“They have been telling us we should use URLs for naming absolutely everything. Including everything we care about. Including Aeschylus and Euripides! Homer! Sappho! Including Shelley, and Keats, and Pope!”

I couldn’t help starting to hum ‘Brush up your Shakespeare’ at this, but he ignored me. This in itself was unusual; he is usually a sucker for Cole Porter. I guess he really was kind of worked up.

“And when anyone expressed concern about (a) the fact that the power to mint URLs is tied up with the regular payment of fees, so it’s really not equally accessible to everyone, or (b) the possibility that URLs don’t have the kind of longevity needed for real persistence, they just told us again, louder, that we should be using URLs for everything.”

“Now, don’t bring up URNs!” I told him, in a warning tone. “We don’t want to open those old wounds again, do we?”

“And why the hell not?” he roared. “What do the SemWeb people think they are playing at?!”

“Well,” I said.

“Either they are surprised at this problem, in which case you have to ask: ‘How can they be surprised? What kind of idiots must they be not to have seen this coming?’”

“Well,” I said.

“Or else they aren’t surprised, in which case you have to ask what they are smoking! Is their attention span so short that it has never occurred to them that names sometimes need to last for longer than Netscape, Inc., happens to be in business?”

“Well,” I said. I realized I didn’t really have a good answer.

“And you?!” he snarled, turning on me and grabbing my lapels. “You were there for years — you couldn’t take a moment to point out to them that a naming convention can be used for everything we care about only if it can be used for the monuments of human culture? You couldn’t be bothered to point out that URLs can be suitable for naming parts of our cultural heritage only if they can last for a few hundred, preferably a few tens of thousands, of years? What use are you?!”

“Well,” I said.

“What use are URLs and their much hyped dereferenceability, if they can break this fast?”

“Well,” I said.

Long pause.

I am not sure Enrique’s complaints are entirely fair, but I also didn’t know how to answer them. I fear he is still waiting for an answer.

Managing to disagree

[30 March 2009]

For some reason, lately I’ve found an old remark of Allen Renear’s running through my head.

“We can disagree about many things; but can we disagree about everything?

“Or would that be like positing the existence of an airline so small, it has no nonstop flights?”

[Memory tells me that he said this at a meeting of the Society for Textual Scholarship; Google, aided by Robin Cover’s Cover Pages, tells me that it was in April 1995.]

It’s on my mind, perhaps, because with Claus Huitfeldt and Yves Marcoux I’ve been doing some work on a formal model of transcription, and when we have examined how multiple divergent transcriptions of the same exemplar look in our model, it has proven much harder than I would have thought possible to make the transcriptions actually contradict one another. (More on this in another post, perhaps.)

Consistency checking in prose

[26 March 2009]

Twice in the last couple of days I’ve run into documents with consistency problems. So I’ve been thinking about consistency checking in prose as a challenge.

The web site for a large organization has, in the About Us section of the site, a side bar saying so-and-so many employees in so-and-so many countries. And one of the documents within the About Us section talked about the organization’s efforts to be a good corporate citizen and employer at all of its so-and-so many locations. If you are in n locations in m countries, though, shouldn’t n be greater than or equal to m?
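As a minimal sketch of what a mechanical check for this case might look like, assume the numeric claims have already been pulled out of the prose by hand and recorded with their source. The `Claim` record, the function name, and above all the figures are invented for illustration; the real numbers are not given here.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    source: str    # where in the document set the number appears
    quantity: str  # what is being counted
    value: int


def check_locations_vs_countries(claims):
    """Return each (locations, countries) pair with fewer locations than countries."""
    locations = [c for c in claims if c.quantity == "locations"]
    countries = [c for c in claims if c.quantity == "countries"]
    return [(loc, cty)
            for loc in locations
            for cty in countries
            if loc.value < cty.value]


# Made-up figures, purely to exercise the check.
claims = [
    Claim("About Us sidebar", "countries", 40),
    Claim("corporate citizenship page", "locations", 25),
]

for loc, cty in check_locations_vs_countries(claims):
    print(f"{loc.source} says {loc.value} locations, "
          f"but {cty.source} says {cty.value} countries")
```

The check itself is trivial; the hard part, which this sketch simply assumes away, is getting from the prose to the `Claim` records.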

The other example was documentation for a specialized XML vocabulary which included a summary of gaps in the vocabulary’s coverage and shortcomings in the design. The main gap, said the documentation, was that “the vocabulary offers no solution to the problem of XYZ.” But the vocabulary does offer a solution to that problem: the revision of the vocabulary to deal with problem XYZ is described with some pride two or three sections further up in the document.

One may speculate that in both cases, a perfectly true statement in the document was rendered false by later events, and statements added later to the document, reflecting the later state of affairs, contradict the earlier statements. (There was a gap in the vocabulary, and the documentation mentioned it as a potentially painful one. Eventually it was painful enough to be filled. And the tag-by-tag account of the markup was even revised to document the new constructs. But the description of gaps and shortcomings was not revised. And it’s not hard to believe that an organization may be in n locations at one point, and in a larger number of locations, in m countries, later on.)

In neither of these cases is the contradiction particularly embarrassing or problematic.

[“But I notice you name no names,” said Enrique. “Chicken.” “Hush,” I said. “The names are not relevant.”]

But of course the same problem happens in specs, where inconsistencies may have graver consequences.

[“Ha! I can think of some names under this rubric!” crowed Enrique. “Shut up!” I explained. “I’m trying to describe and understand the problem, not confess my sins.” “Oh, go on! Admitting you have a problem is the first step towards recovery. If you don’t admit that you have the problem, you’ll never — what the ?!” At this point, I managed to get the duct tape over his mouth.]

I think there must be two basic approaches to trying to avoid inconsistencies.

(1) You can try to prevent them arising at all.

(2) You can try to make them easier to detect automatically, so that an automated process can review a set of documents and flag passages that need attention.

Neither of these seems to be easy to do. But for both of them, it’s not hard to think of techniques that can help. And thinking about any kind of information system, whether it’s an XML vocabulary or a database management system or a programming language or a Web site content management system, or a complicated combination of the above, we can usefully ask ourselves:

How could we make it easier to prevent inconsistency in a set of documents?

How could we make it easier to keep a set of documents in synch with each other as they change?

How could we document the information dependencies between documents at a useful level of granularity? (Helpful, perhaps, to say “if document X changes, documents Y and Z, which depend on it, must be checked to see if they need corresponding revisions”, but a lot more helpful if you can say which sections in Y and Z depend on which bits of X.) Could we do it automatically?
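For the section-level version of that last question, a small sketch under stated assumptions: dependencies are recorded by hand as pairs of made-up identifiers of the form "document#section", and a change to one section means that everything depending on it, directly or indirectly, should be looked at again. Nothing here is automatic; it only shows what the recorded information would buy us.

```python
from collections import defaultdict

# Hypothetical section identifiers. Each pair (dependent, source) records
# that the dependent section relies on the source section.
DEPENDS_ON = [
    ("Y#intro", "X#definitions"),
    ("Y#examples", "X#syntax"),
    ("Z#summary", "Y#intro"),
]


def build_reverse_index(pairs):
    """Map each source section to the sections that depend on it directly."""
    index = defaultdict(set)
    for dependent, source in pairs:
        index[source].add(dependent)
    return index


def sections_to_review(changed, pairs):
    """Return all sections depending, directly or indirectly, on `changed`."""
    index = build_reverse_index(pairs)
    to_visit = [changed]
    seen = set()
    while to_visit:
        current = to_visit.pop()
        for dependent in index.get(current, ()):
            if dependent not in seen:  # also guards against cycles
                seen.add(dependent)
                to_visit.append(dependent)
    return seen
```

With these made-up pairs, a change to "X#definitions" flags "Y#intro" and, through it, "Z#summary", while "Y#examples" is left alone.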

It seems plausible that detecting inconsistencies between distant parts of a document would be easier if we could get a list of (some of) the entailments of each bit of a document.

How can we make it easier to document the entailments of a particular passage in a spec?

For making the entailments of a passage explicit (and thus amenable to mechanical consistency checking with other parts of the document set) I think there are several obvious candidates: RDF, Topic Maps, RDFa, the work Sam Hunting has been doing with embedded topic maps (see for example his report at Balisage 2008), colloquial XML structures designed for the specific system in question. Years ago, José Carlos Ramalho and colleagues were talking about semantic validation of document content; they must have had something to say about this too. (In the DTD, I recall, they used processing instructions.) Even standard indexing tools may be relevant.

How do these candidates compare? Are they all essentially equally expressive? Does one or the other make it easier to annotate the document? Is one or the other easier to process?
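Whichever notation is used, the mechanical check might look roughly like the following sketch. It does not use RDF, Topic Maps, or any real library; it assumes only that each passage has somehow been annotated with simple (subject, property, value) statements, and that each property is meant to have a single value. The passage labels and property names are invented.

```python
from collections import defaultdict

# Each entry: (passage, subject, property, value). All names are made up.
ENTAILMENTS = [
    ("section 2.3", "vocabulary", "handles-XYZ", True),
    ("section 5.1", "vocabulary", "handles-XYZ", False),
    ("section 1.0", "vocabulary", "has-namespace", True),
]


def find_conflicts(entailments):
    """Group by (subject, property); keep groups asserting more than one value.

    Assumes every property is single-valued, which is a simplification.
    """
    by_key = defaultdict(lambda: defaultdict(list))
    for passage, subject, prop, value in entailments:
        by_key[(subject, prop)][value].append(passage)
    return {key: dict(values)
            for key, values in by_key.items()
            if len(values) > 1}


conflicts = find_conflicts(ENTAILMENTS)
```

Here `conflicts` holds one entry, for ("vocabulary", "handles-XYZ"), pointing back at the two passages that disagree, which is exactly the kind of flag a reviewer would want.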

[“If you don’t admit that you have the problem, you’ll never be able to fix it. And you keep that roll of duct tape away from me, you hear?”]

Blogging and apothegms

[25 March 2009]

For some time now I’ve been carrying around a little notebook with (among other things) notes on various topics I have thought it would be useful and interesting to make blog posts about. I haven’t had time to work out coherent expositions or arguments on most of the topics, though, so nothing happens. All I’ve got are short fragments in a telegraphic style — just enough (I hope) to remind myself, when I come back to the topic, of the line of thought I wanted to pursue.

Sometimes I think I should post the notes I’ve got, despite their incomplete, inadequate formulations. It might not help you, dear reader (sorry) but it might make this lab notebook more useful for me.

(See also Matt Kirschenbaum’s ruminations from 2005 on the use(s) of blog posts, which is a message in a bottle I’ve just run across.)

And I have begun to wonder if this explains the aphoristic, telegraphic style I associate with the posthumous notebooks and journals of great writers, full of incomprehensibly terse remarks. Are the fragments of (say) the Schlegels nothing but notes for things they would later have worked up into blog posts, if only they had not been born two hundred years too soon?


Or perhaps I should say:

Notebook full of ideas for posts.

Telegraphic — aphoristic — apothegmatic?

Schlegels (Nietzsche?) as bloggers avant la lettre?

Is profundity nothing more than haste to get something — something — a trail of breadcrumbs? — down quick?

Hmm. Breadcrumbs. Guess DanC (all of DIG?) thinks so.

Hmm.