Tagging Parts of Speech

C. M. Sperberg-McQueen

TEI ED W12

May 1990

Abstract

A set of feature structures sufficient to express the analysis underlying the tags of the tagged LOB corpus is provided, together with a set of entity names for use in tagging text. The method of construction, similar to that described in AIW21, "On Lexical Ambiguity," allows simple extension to other languages and other grammatical features. The possible meanings of underspecified analyses are described and discussed.

NOTE:

The paper is not yet complete. It offers a full list of the basic grammatical categories in the LOB scheme and a full specification of the features of verbs and some other categories, but full definitions of the LOB tags in feature-structure notation are given only for the verb tags. The treatment of non-lexical parts of speech, in particular, departs from standard linguistic analysis and should be brought into line with it.

Much work remains to be done in this area; I believe it should proceed as follows:

1. Assume the method described here and in AIW21 for representing complex feature structures as simple entity references, built up out of other entity references for simple feature-value pairs.
2. Develop a set of feature structures (ignoring for the moment the SGML formalism) for “standard average European” as commonly annotated in large corpora: number, gender, case, tense, etc.
3. Ensure that the feature structures so developed are upward compatible with commonly used schemes like LOB, Brown, etc. That is, LOB, Brown, and other common schemes should fall out of the TEI scheme as simplifications or as particular sets of values.
4. If consensus can be achieved, propose a specific set of tags using this standard-average-European feature set, for use in corpus annotation.
5. Using the method assumed in point 1, create the required entity definitions for features and feature structures. Optionally, provide DTD modifications for enforcing the standard-average-European feature set.

1. Introduction

This paper describes one approach to expressing part-of-speech tags using the feature-structure markup proposed by the A&I committee. It takes as a given the part-of-speech classification of the LOB corpus and seeks an equivalent expression of that classification in TEI-conformant SGML.

NOTE:

It is intended that this paper eventually provide a full specification of the LOB tags in feature-structure notation. A partial specification, however, is enough to make clear the direction being suggested and to allow for comment. The paper is thus being distributed in a half-complete state.

The grammatical features now outlined in this paper include those required for a full treatment of the LOB scheme's verb tags, and those required for LOB's treatment of nouns, pronouns, conjunctions, numerals, and determiner-pronouns. Features required for adverbs, determiners and articles, adjectives, qualifiers, and WH-pronouns must still be added; this is a matter of transcription from the “naive” SGML form in which they have already been worked out.

Full feature-structure definitions are given only for the LOB verb tags; similar definitions for the other categories remain to be formulated, a task that should be straightforward.

Further work should attempt to extend these definitions to other classifications.

2. Description of the LOB Tags

We begin with a list of the tags used in the LOB markup. This is taken from The Tagged LOB Corpus: User's Manual, by Stig Johansson in collaboration with Eric Atwell, Roger Garside, and Geoffrey Leech (Bergen: Norwegian Computing Centre for the Humanities, 1986).

NOTE:

List to be added.

3. LOB Verb Tags

For the moment, let's work with just the verb tags:

BE: the verb TO BE
BED: the verb TO BE, past tense
BEDZ: the verb TO BE, past tense, 3rd-person singular
BEG: the participle BEING
BEM: am, 'm
BEN: been
BER: are, 're
BEZ: is, 's
DO: the verb DO as auxiliary
DOD: did
DOZ: does
HV: have
HVD: had (as past tense)
HVG: having
HVN: had (as past participle)
HVZ: has
MD: modal auxiliary verb
VB: lexical verb
VBD: lexical verb in past tense
VBG: lexical verb, present participle
VBN: lexical verb, past participle
VBZ: lexical verb, third-person singular present tense

4. Naive Transcription into SGML

The semantics of these 22 tags can be reduced to a few atomic notions; if we use conventional (traditional?) grammatical terms, we can arrange these tags in a (sparse) matrix along the following axes:

lexical type: lexical verb, BE, DO, HAVE, auxiliary, or modal
number: singular, plural, or unmarked
person: 1st, 2nd, 3rd, unmarked, or not applicable (for participles)
tense: present, past, future
participle: participial or non-participial

Because this is modern English, these axes are not truly orthogonal: the value plural occurs only in BER, and third person correlates strongly with singular number. An analysis with only modern English in mind might therefore collapse these features for economy; I keep them separate because the traditional analysis is clear and commonly understood, and because it extends more readily to historical forms of English and to other Indo-European languages. The full analysis will in any case be required in the pronoun system.

The feature structures for the LOB tags for verbs can be built out of these primitive notions. One straightforward approach would use the major category as a generic identifier and specify feature-value pairs using the attribute-value notation. The element and attribute declarations would look like this:

<!ELEMENT verb - O EMPTY>
<!ATTLIST verb
            n (sg | pl | ind) ind
            -- number:  singular, plural, indefinite (unmarked) --
            p (1 | 2 | 3 | 0) 0
            -- person:  1, 2, 3, or unmarked --
            pt (part | nonpart) nonpart
            -- participle:  participial or non-participial --
            t (pres | past | fut) pres
            -- tense:  present, preterite, future --
            lex (lex | be | do | have | aux | mod) lex
            -- lexical, auxiliary (and which), or modal -- >

So the various LOB verb tags could be specified thus:

BE (be): verb lex=be
BED (were): verb lex=be t=past
BEDZ (was): verb lex=be t=past p=3 n=sg
BEG (being): verb lex=be t=pres pt=part
BEM (am, 'm): verb lex=be t=pres p=1 n=sg
BEN (been): verb lex=be t=past pt=part
BER (are, 're): verb lex=be t=pres n=pl
BEZ (is, 's): verb lex=be t=pres p=3 n=sg
DO (do): verb lex=do
DOD (did): verb lex=do t=past
DOZ (does): verb lex=do t=pres p=3 n=sg
HV (have): verb lex=have
HVD (had, 'd): verb lex=have t=past
HVG (having): verb lex=have t=pres pt=part
HVN (had (pp)): verb lex=have t=past pt=part
HVZ (has, 's): verb lex=have t=pres p=3 n=sg
MD (modal aux): verb lex=mod
VB (base verb): verb
VBD (past tense): verb t=past
VBG (present participle, gerund): verb pt=part t=pres
VBN (past participle): verb pt=part t=past
VBZ (3d pers sg): verb p=3 n=sg
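The naive declaration lends itself to mechanical checking. The following Python sketch is purely illustrative: `DECLARED` and `NAIVE_SPECS` are hypothetical names for hand transcriptions of the attribute list declaration and the 22 specifications above, and `resolve` imitates what a validating SGML parser does when an attribute is omitted, namely supply the declared default.

```python
from typing import Dict

# Declared values and defaults, transcribed by hand from the
# <!ATTLIST verb ...> declaration above (hypothetical helper data).
DECLARED = {
    "n":   ({"sg", "pl", "ind"}, "ind"),
    "p":   ({"1", "2", "3", "0"}, "0"),
    "pt":  ({"part", "nonpart"}, "nonpart"),
    "t":   ({"pres", "past", "fut"}, "pres"),
    "lex": ({"lex", "be", "do", "have", "aux", "mod"}, "lex"),
}

# The 22 naive specifications listed above (attribute part only).
NAIVE_SPECS = {
    "BE": "lex=be", "BED": "lex=be t=past", "BEDZ": "lex=be t=past p=3 n=sg",
    "BEG": "lex=be t=pres pt=part", "BEM": "lex=be t=pres p=1 n=sg",
    "BEN": "lex=be t=past pt=part", "BER": "lex=be t=pres n=pl",
    "BEZ": "lex=be t=pres p=3 n=sg",
    "DO": "lex=do", "DOD": "lex=do t=past", "DOZ": "lex=do t=pres p=3 n=sg",
    "HV": "lex=have", "HVD": "lex=have t=past", "HVG": "lex=have t=pres pt=part",
    "HVN": "lex=have t=past pt=part", "HVZ": "lex=have t=pres p=3 n=sg",
    "MD": "lex=mod",
    "VB": "", "VBD": "t=past", "VBG": "pt=part t=pres", "VBN": "pt=part t=past",
    "VBZ": "p=3 n=sg",
}


def resolve(spec: str) -> Dict[str, str]:
    """Check each name=value pair against DECLARED, then fill in the
    declared default for every attribute the specification omits."""
    given: Dict[str, str] = {}
    for pair in spec.split():
        name, value = pair.split("=")
        allowed, _default = DECLARED[name]  # KeyError: undeclared attribute
        if value not in allowed:
            raise ValueError(f"{name}={value}: not a declared value")
        given[name] = value
    return {name: given.get(name, default)
            for name, (_allowed, default) in DECLARED.items()}


def all_tags_distinct() -> bool:
    """True if no two tags resolve to the same set of attribute values."""
    resolved = [tuple(sorted(resolve(s).items())) for s in NAIVE_SPECS.values()]
    return len(set(resolved)) == len(resolved)
```

Under these assumptions every tag resolves to a distinct set of attribute values, so the naive transcription loses none of the LOB distinctions. Note, though, that `resolve("")` (the specification for VB) comes back with `t` set to `pres`: the default silently assigns present tense to the base form, one reason to be wary of defaults when features are left unspecified.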

In the notes which follow, this direct translation of category names and values into generic identifiers, attribute names, and attribute values will be called the “naive” approach.

5. Transcription into SGML Using Feature Structures

The naive SGML version of the LOB verb tags can be translated directly into the feature-structure notation devised by the A&I committee. The structure for BEZ, for example, might be expressed thus:


<f.struct>
        <f.struct.name>BEZ</f.struct.name>
        <feature><f.name>category    </f.name><f.struct>verb     </f.struct></feature>
        <feature><f.name>lexical type</f.name><f.struct>copula   </f.struct></feature>
        <feature><f.name>number      </f.name><f.struct>singular </f.struct></feature>
        <feature><f.name>person      </f.name><f.struct>3rd      </f.struct></feature>
        <feature><f.name>tense       </f.name><f.struct>present  </f.struct></feature>
</f.struct>

while the feature structure for VBD might be somewhat simpler:


<f.struct>
        <f.struct.name>VBD</f.struct.name>
        <feature><f.name>category    </f.name><f.struct>verb     </f.struct></feature>
        <feature><f.name>lexical type</f.name><f.struct>full verb</f.struct></feature>
        <feature><f.name>tense       </f.name><f.struct>preterite</f.struct></feature>
</f.struct>
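The translation from the naive form is mechanical enough to automate. In the Python sketch below, `to_fstruct`, `FEATURE_NAMES`, and `VALUE_NAMES` are hypothetical; the value names (copula, full verb, preterite) follow the two examples above, and the mapping itself is an assumption rather than part of any A&I proposal. The function writes explicit end tags and emits a `<feature>` only for attributes actually given, so an attribute left out of the input simply does not appear.

```python
from typing import Dict

# Hypothetical mapping from naive attribute names and values to the
# feature names and value names used in the examples above.
FEATURE_NAMES = {"cat": "category", "lex": "lexical type",
                 "n": "number", "p": "person", "t": "tense"}
VALUE_NAMES = {"be": "copula", "lex": "full verb", "sg": "singular",
               "pl": "plural", "3": "3rd", "pres": "present",
               "past": "preterite"}


def to_fstruct(name: str, attrs: Dict[str, str]) -> str:
    """Write one <feature> per attribute actually specified; absent
    attributes produce no <feature> element at all."""
    lines = ["<f.struct>", f"        <f.struct.name>{name}</f.struct.name>"]
    for attr, value in attrs.items():
        fname = FEATURE_NAMES[attr]
        fvalue = VALUE_NAMES.get(value, value)  # unmapped values pass through
        lines.append(f"        <feature><f.name>{fname}</f.name>"
                     f"<f.struct>{fvalue}</f.struct></feature>")
    lines.append("</f.struct>")
    return "\n".join(lines)
```

For example, `to_fstruct("VBD", {"cat": "verb", "lex": "lex", "t": "past"})` reproduces the content of the VBD structure above, with no number or person features.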

6. Interpretation of Missing Features

Here we encounter a minor conundrum. By leaving number and person unspecified, this rendition of VBD could conceivably be claiming any of the following:

1. that number was either singular or plural or unmarked, those being the allowable values
2. that the word in question is not marked for number (so the feature defaults to the value unmarked)
3. that the feature has an unknown value (e.g. the analysis is not complete and may or may not be completed later)
4. that the feature does not apply here (i.e. the analysis is complete without it and cannot ever supply a value for this feature)

It seems better to forbid the second interpretation (2) and to insist that failure to specify a value says nothing about the value: no defaulting mechanism is provided or allowed. The first interpretation (1) can instead be forced explicitly, either by providing an OR of the possible values over which the feature can range or by providing a value like unmarked, which may have a similar effect, as it does here. The final interpretation (4) may be tempting, but no SGML parser could enforce it. Moreover, it would be redundant to express this interpretation, by silence or by any other means, every time a feature was not mentioned; it would suffice to state such information once, in a grammar. I conclude that a grammar is where such claims belong, and that we can therefore eliminate the final interpretation. Inapplicable features will always be passed over in silence, but not all features passed over in silence need be interpreted as inapplicable.[1]
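The third interpretation has a simple operational reading: when two analyses are combined, a missing feature constrains nothing. The Python sketch below assumes flat feature structures whose values are atomic strings (the A&I notation also allows nested structures and alternations, which the sketch ignores); `FS` and `unify` are illustrative names, not part of the proposal.

```python
from typing import Dict, Optional

FS = Dict[str, str]  # flat feature structure: feature name -> atomic value


def unify(a: FS, b: FS) -> Optional[FS]:
    """Combine two flat feature structures, or return None if they clash.

    A feature missing from one structure constrains nothing: it is
    neither given a default nor treated as inapplicable.
    """
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # same feature, different atomic values
        result[feature] = value
    return result


# The underspecified VBD structure from the example above.
VBD = {"category": "verb", "lexical type": "full verb", "tense": "preterite"}
```

With the underspecified VBD, adding a number and a person succeeds, while a conflicting tense yields `None`; no value is ever invented for a feature that neither side mentions.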

More properly, then, the tag VBD ought to be analyzed this way, specifying the value unmarked for both person and number.


<f.struct>
        <f.struct.name>VBD</f.struct.name>
        <feature><f.name>category    </f.name><f.struct>verb     </f.struct></feature>
        <feature><f.name>lexical type</f.name><f.struct>full verb</f.struct></feature>
        <feature><f.name>number      </f.name><f.struct>unmarked </f.struct></feature>
        <feature><f.name>person      </f.name><f.struct>unmarked </f.struct></feature>
        <feature><f.name>tense       </f.name><f.struct>preterite</f.struct></feature>
</f.struct>

Or we can be more explicit about the combinatorial possibilities, banning the value unmarked and restricting the values to 1st, 2nd, or 3rd person and singular or plural. In this case we provide explicit alternations to show the range of possibilities:


<f.struct>
        <f.struct.name>VBD</f.struct.name>
        <feature><f.name>category    </f.name><f.struct>verb     </f.struct></feature>
        <feature><f.name>lexical type</f.name><f.struct>full verb</f.struct></feature>
        <feature><f.name>number</f.name>
                 <f.s.OR><f.struct>singular</f.struct>
                         <f.struct>plural  </f.struct>
                 </f.s.OR>
        </feature>
        <feature><f.name>person</f.name>
                 <f.s.OR><f.struct>1st</f.struct>
                         <f.struct>2nd</f.struct>
                         <f.struct>3rd</f.struct>
                 </f.s.OR>
        </feature>
        <feature><f.name>tense       </f.name><f.struct>preterite</f.struct></feature>
</f.struct>
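An explicit alternation can be read as shorthand for the set of fully specified structures it admits. In the Python sketch below, a tuple stands in for `<f.s.OR>`; that encoding, and the name `expand`, are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple, Union

# A value is either one atomic value or a tuple of alternatives;
# the tuple plays the role of <f.s.OR>.
Value = Union[str, Tuple[str, ...]]


def expand(fs: Dict[str, Value]) -> List[Dict[str, str]]:
    """Return every fully specified structure the alternations allow."""
    names = list(fs)
    choices = [v if isinstance(v, tuple) else (v,) for v in fs.values()]
    return [dict(zip(names, combo)) for combo in product(*choices)]


# The VBD structure with explicit alternations, as above.
VBD = {
    "category": "verb",
    "lexical type": "full verb",
    "number": ("singular", "plural"),
    "person": ("1st", "2nd", "3rd"),
    "tense": "preterite",
}
```

For VBD this yields the six combinations of two numbers and three persons, each a structure with no alternations left in it.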

7. Definitions of Primitive Grammatical Elements

It seems clear that feature structures like those just described may conveniently be expressed by general entity references which occur within f.struct tags, or which themselves contain the f.struct tags. Thus in a running text one might have:

    Wash    <f.struct>&nn;        </f.struct>
    sinks   <f.struct>&vbz;       </f.struct>
    .       <f.struct>&punct.stop;</f.struct>

It also seems clear that such complex entity values are best built up from smaller primitive entity values, each describing one feature. This has the advantage of allowing all analyses which use a grammatical feature (e.g. number) to use the same definitions. In the remainder of this paper I will give the entity definitions required for the LOB tags and give some simple examples of their possible use.
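Building complex entity values out of primitive ones amounts to recursive substitution. The Python sketch below imitates that substitution outside any SGML parser; `expand_entities` and the sample `ENTITIES` table are hypothetical, and the sketch assumes every reference is closed with `;` (SGML also permits omitting the reference close in some contexts, which it does not handle). Unknown references, such as character entities, are left as they are, and a self-referential entity is rejected.

```python
import re
from typing import Dict, Tuple

# &name; where name may contain letters, digits, '.' and '-'.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def expand_entities(text: str, entities: Dict[str, str],
                    _open: Tuple[str, ...] = ()) -> str:
    """Replace each known entity reference with its recursively expanded
    replacement text; unknown references are left untouched."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)
        if name in _open:
            raise ValueError(f"entity &{name}; refers to itself")
        return expand_entities(entities[name], entities, _open + (name,))
    return ENTITY_REF.sub(replace, text)


# Hypothetical sample: a primitive entity reused inside a complex one.
ENTITIES = {
    "v": "<feature><f.name>category</f.name><f.struct>verb</f.struct></feature>",
    "vb.mod": "&v;<feature><f.name>AUX</f.name><plus></feature>"
              "<feature><f.name>MOD</f.name><plus></feature>",
}
```

Expanding `<f.struct>&vb.mod;</f.struct>` yields the category feature from `v` followed by the two binary features, so a change to `v` propagates to every entity that uses it.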

8. Major Categories

The major categories (“parts of speech”) assumed by the LOB tagging can be treated as values of a feature called category.


<!ENTITY v        "<feature><f.name>   category </f.name>
                            <f.struct> verb     </f.struct>
                   </feature>"                                      >
<!ENTITY adv      "<feature><f.name>   category </f.name>
                            <f.struct> adverb   </f.struct>
                   </feature>"                                      >
<!ENTITY n        "<feature><f.name>   category </f.name>
                            <f.struct> noun     </f.struct>
                   </feature>"                                      >
<!ENTITY pron     "<feature><f.name>   category </f.name>
                            <f.struct> pronoun  </f.struct>
                   </feature>"                                      >


<!ENTITY conj     "<feature><f.name>   category </f.name>
                            <f.struct> conjunction</f.struct>
                   </feature>"                                      >
<!ENTITY num      "<feature><f.name>   category </f.name>
                            <f.struct> numeral  </f.struct>
                   </feature>"                                      >
<!-- determiner-pronoun class includes determiners, quantifiers,  -->
<!-- and qualifiers which can act as determiners or pronominally. -->
<!-- AB subclass is pre-qualifiers and pre-quantifiers.           -->
<!ENTITY AB       "<feature><f.name>   category </f.name>
                            <f.struct> determiner-pronoun</f.struct>
                   </feature>
                   <feature><f.name>   position </f.name>
                            <f.struct> pre-posed</f.struct>
                   </feature>"                                  >


<!-- AP subclass is post-determiner/pronoun                       -->
<!ENTITY AP       "<feature><f.name>   category </f.name>
                            <f.struct> determiner-pronoun</f.struct>
                   </feature>
                   <feature><f.name>   position  </f.name>
                            <f.struct> post-posed</f.struct>
                   </feature>"                                      >
<!ENTITY det      "<feature><f.name>   category </f.name>
                            <f.struct> determiner </f.struct>
                   </feature>"                                      >
<!ENTITY article  "<feature><f.name>   category </f.name>
                            <f.struct> article  </f.struct>
                   </feature>"                                      >


<!ENTITY ex       "<feature><f.name>   category </f.name>
                            <f.struct> existential THERE</f.struct>
                   </feature>"                                      >
<!ENTITY prep     "<feature><f.name>   category </f.name>
                            <f.struct> preposition </f.struct>
                   </feature>"                                      >
<!ENTITY adj      "<feature><f.name>   category </f.name>
                            <f.struct> adjective </f.struct>
                   </feature>"                                      >
<!ENTITY qual     "<feature><f.name>   category </f.name>
                            <f.struct> qualifier </f.struct>
                   </feature>"                                      >


<!ENTITY to       "<feature><f.name>   category </f.name>
                            <f.struct> infinitival TO </f.struct>
                   </feature>"                                      >
<!ENTITY uh       "<feature><f.name>   category </f.name>
                            <f.struct> interjection </f.struct>
                   </feature>"                                      >
<!ENTITY wh       "<feature><f.name>   category </f.name>
                            <f.struct> WH-determiner </f.struct>
                   </feature>"                                      >
<!ENTITY not      "<feature><f.name>   category </f.name>
                            <f.struct> NOT      </f.struct>
                   </feature>"                                      >
<!ENTITY letter   "<feature><f.name>   category </f.name>
                            <f.struct> letter   </f.struct>
                   </feature>"                                      >
<!ENTITY punct    "<feature><f.name>   category </f.name>
                            <f.struct> punctuation </f.struct>
                   </feature>"                                      >
<!ENTITY formula  "<feature><f.name>   category </f.name>
                            <f.struct> formula </f.struct>
                   </feature>"                                      >
<!ENTITY foreign  "<feature><f.name>   category </f.name>
                            <f.struct> foreign phrase </f.struct>
                   </feature>"                                      >
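Because every category entity above follows the same layout, the feature-value pairs can be read back out of an entity's replacement text. The regular expression in this Python sketch assumes exactly that layout (a `<feature>` holding an `<f.name>` followed either by one atomic `<f.struct>` or by `<plus>`/`<minus>`); `parse_features` is a hypothetical name, and nested structures and alternations are not handled.

```python
import re
from typing import Dict

# One <feature> as laid out in the declarations: a name followed either
# by an atomic <f.struct> value or by a bare <plus>/<minus>.
FEATURE = re.compile(
    r"<feature><f\.name>\s*(.*?)\s*</f\.name>\s*"
    r"(?:<f\.struct>\s*(.*?)\s*</f\.struct>|<(plus|minus)>)\s*"
    r"</feature>",
    re.DOTALL,
)


def parse_features(text: str) -> Dict[str, str]:
    """Read feature-value pairs from (already expanded) entity text.
    A repeated feature name is reported rather than silently overwritten."""
    result: Dict[str, str] = {}
    for name, value, sign in FEATURE.findall(text):
        if name in result:
            raise ValueError(f"feature {name!r} specified twice")
        result[name] = sign or value
    return result
```

Applied to the replacement text of AB, it returns `{'category': 'determiner-pronoun', 'position': 'pre-posed'}`.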

9. Lexical Subcategorizations

Like most grammatical descriptions, the LOB scheme distinguishes among subgroups of the major categories; these subcategorizations may be expressed in feature-structure notation this way:

NOTE:

This section is not complete for all categories.


<!-- VERBS                                                        -->
<!-- Lexical class of verb:  LOB distinguishes lexical verbs,     -->
<!-- auxiliaries, and modals.  We use +/- AUX, +/- MOD, and       -->
<!-- a LEXITEM feature to make these distinctions.                -->
<!-- An alternative analysis would use a single feature and allow -->
<!-- it the values LEXICAL, MODAL, BE, DO, HAVE.  This would be   -->
<!-- very close to the tag construction of LOB, but seems less    -->
<!-- general.                                                     -->

<!-- Lexical verbs are -AUX -MOD                                  -->
<!ENTITY vb.lex   "&v;
                   <feature><f.name>AUX</f.name><minus></feature>
                   <feature><f.name>MOD</f.name><minus></feature>"  >
<!-- Modal verbs are +AUX +MOD                                    -->
<!ENTITY vb.mod   "&v;
                   <feature><f.name>AUX</f.name><plus></feature>
                   <feature><f.name>MOD</f.name><plus></feature>"   >
<!-- Auxiliary verbs are +AUX -MOD and get a LEXITEM feature      -->
<!ENTITY vb.be    "&v;
                   <feature><f.name>AUX</f.name><plus></feature>
                   <feature><f.name>MOD</f.name><minus></feature>
                   <feature><f.name>   Lexical item</f.name>
                            <f.struct> be          </f.struct>
                   </feature>"                                      >
<!ENTITY vb.do    "&v;
                   <feature><f.name>AUX</f.name><plus></feature>
                   <feature><f.name>MOD</f.name><minus></feature>
                   <feature><f.name>   Lexical item</f.name>
                            <f.struct> do          </f.struct>
                   </feature>"                                      >
<!ENTITY vb.have  "&v;
                   <feature><f.name>AUX</f.name><plus></feature>
                   <feature><f.name>MOD</f.name><minus></feature>
                   <feature><f.name>   Lexical item</f.name>
                            <f.struct> have        </f.struct>
                   </feature>"                                      >
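The comment above notes that a single feature with the values LEXICAL, MODAL, BE, DO, HAVE would be an alternative analysis. For the five verb entities just defined, the two analyses are interconvertible, as this Python sketch shows; it assumes the strings "plus" and "minus" stand for `<plus>` and `<minus>`, and `lob_verb_class` is a hypothetical name.

```python
from typing import Dict


def lob_verb_class(fs: Dict[str, str]) -> str:
    """Collapse +/-AUX, +/-MOD and 'Lexical item' into the single-feature
    alternative (LEXICAL, MODAL, BE, DO, HAVE) described in the comment."""
    aux, mod = fs["AUX"], fs["MOD"]
    if aux == "minus" and mod == "minus":
        return "LEXICAL"                    # vb.lex
    if aux == "plus" and mod == "plus":
        return "MODAL"                      # vb.mod
    if aux == "plus" and mod == "minus":
        return fs["Lexical item"].upper()   # vb.be, vb.do, vb.have
    raise ValueError("-AUX +MOD is not produced by any entity above")
```

The binary analysis also has a cell, -AUX +MOD, that none of these entities uses and that the single-valued feature cannot express; the sketch rejects it rather than guessing.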


<!-- NOUNS                                                        -->
<!-- Lexical class of noun:  LOB distinguishes common and proper  -->
<!-- nouns.  Common nouns may be marked additionally as capped.   -->
<!-- Such nouns are common nouns habitually written with          -->
<!-- uppercase initial, which act syntactically and               -->
<!-- morphologically as common nouns, not proper nouns.           -->
<!-- Examples:  Jew, Englishman, the English, Urdu, a Thatcherite,-->
<!-- an Etonian, Gaullism.                                        -->
<!-- Proper nouns may also be marked as locative or titular.      -->
<!-- The locatives are locative words written with initial cap.   -->
<!-- E.g.  Bay, Bight, Cape, Firth, Hill, Is, Island, Isle,       -->
<!-- Lake, Loch, Mt, Mount, Mountain, Peninsula, Plain, Point,    -->
<!-- Rd, Road, St, Street, Square, Valley, Wood.  Loch_NPL        -->
<!-- Ness_NP, the Firth_NPL of Forth_NP, the Houses_NPLS of       -->
<!-- Parliament_NP.                                               -->
<!-- An alternative analysis would use a single feature and allow -->
<!-- it the values COMMON, PROP, PROPTIT, PROPLOC, CAP.  As for   -->
<!-- verbs, we prefer what appears to be a more general           -->
<!-- construction.                                                -->

<!-- Some special features are also marked by LOB:  nouns of      -->
<!-- measure (UNIT), cited words, and nouns used adverbially.     -->

<!-- Common nouns are -PROPER, -CAP and otherwise unmarked.       -->
<!ENTITY n.com    "&n;
                   <feature><f.name>proper      </f.name><minus>
                   </feature>
                   <feature><f.name>capitalized </f.name><minus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Capitalized nouns are -PROPER, +CAP.                         -->
<!ENTITY n.cap    "&n;
                   <feature><f.name>proper      </f.name><minus>
                   </feature>
                   <feature><f.name>capitalized </f.name><plus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Proper nouns are +PROPER, +CAP.  LOB apparently does not     -->
<!-- recognize -CAP proper nouns.  (Treatment of 'van' and 'de'   -->
<!-- should be checked to make sure this is correct.)             -->
<!ENTITY n.proper "&n;
                   <feature><f.name>proper      </f.name><plus>
                   </feature>
                   <feature><f.name>capitalized </f.name><plus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Locative proper nouns have +LOC -TITLE                       -->
<!ENTITY np.loc   "&n;
                   <feature><f.name>proper      </f.name><plus>
                   </feature>
                   <feature><f.name>capitalized </f.name><plus>
                   </feature>
                   <feature><f.name>locative    </f.name><plus>
                   </feature>
                   <feature><f.name>title       </f.name><minus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Titles have -LOC +TITLE                                      -->
<!ENTITY np.title "&n;
                   <feature><f.name>proper      </f.name><plus>
                   </feature>
                   <feature><f.name>capitalized </f.name><plus>
                   </feature>
                   <feature><f.name>locative    </f.name><minus>
                   </feature>
                   <feature><f.name>title       </f.name><plus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Cited nouns are tagged by LOB as otherwise like common nouns.-->
<!ENTITY n.cited  "&n;
                   <feature><f.name>proper      </f.name><minus>
                   </feature>
                   <feature><f.name>capitalized </f.name><minus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><plus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Unit nouns are otherwise like common nouns.                  -->
<!ENTITY n.unit   "&n;
                   <feature><f.name>proper      </f.name><minus>
                   </feature>
                   <feature><f.name>capitalized </f.name><minus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><plus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><minus>
                   </feature>"                                      >
<!-- Adverbial nouns are otherwise like common nouns.             -->
<!ENTITY n.adverb "&n;
                   <feature><f.name>proper      </f.name><minus>
                   </feature>
                   <feature><f.name>capitalized </f.name><minus>
                   </feature>
                   <feature><f.name>unit noun   </f.name><minus>
                   </feature>
                   <feature><f.name>cited word  </f.name><minus>
                   </feature>
                   <feature><f.name>noun-as-adv </f.name><plus>
                   </feature>"                                      >
<!-- Note that LOB does not define an orthogonal set of tags      -->
<!-- for the various imaginable interactions among these features -->
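The lack of orthogonality can be made concrete. Five binary noun features admit 32 combinations; the Python sketch below, with values transcribed by hand from the entities above (`ENTITY_VALUES` and `uncovered_combinations` are illustrative names), lists the combinations that no entity expresses. The locative and title features are left out, so np.loc and np.title coincide here with n.proper.

```python
from itertools import product
from typing import List, Tuple

FEATURES = ("proper", "capitalized", "unit noun", "cited word", "noun-as-adv")

# Values of the five binary features in each noun entity above
# (True = <plus>, False = <minus>), transcribed by hand.
ENTITY_VALUES = {
    "n.com":    (False, False, False, False, False),
    "n.cap":    (False, True,  False, False, False),
    "n.proper": (True,  True,  False, False, False),
    "np.loc":   (True,  True,  False, False, False),
    "np.title": (True,  True,  False, False, False),
    "n.cited":  (False, False, False, True,  False),
    "n.unit":   (False, False, True,  False, False),
    "n.adverb": (False, False, False, False, True),
}


def uncovered_combinations() -> List[Tuple[bool, ...]]:
    """Feature combinations that no noun entity above expresses."""
    covered = set(ENTITY_VALUES.values())
    return [combo for combo in product((False, True), repeat=len(FEATURES))
            if combo not in covered]
```

Only six distinct combinations are covered, leaving 26 unexpressed; a capitalized cited word, for instance, has no entity of its own.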


<!-- ADVERBS                                                      -->
<!-- LOB distinguishes denominative and prepositional adverbs,   -->
<!-- adverbial particles, and other (unmarked) adverbs.           -->
<!-- adv.nom are nominal adverbs, e.g. here, now, ...             -->
<!ENTITY adv.nom  "&adv;
                   <feature><f.name>   adv.type </f.name>
                            <f.struct> denominative </f.struct>
                   </feature>"                                      >
<!-- adv.prep are adverb homographs of prepositions               -->
<!ENTITY adv.prep "&adv;
                   <feature><f.name>   adv.type </f.name>
                            <f.struct> prepositional </f.struct>
                   </feature>"                                      >
<!-- adv.part are adverbial particles like 'back' ...             -->
<!ENTITY adv.part "&adv;
                   <feature><f.name>   adv.type </f.name>
                            <f.struct> particle </f.struct>
                   </feature>"                                      >
<!ENTITY adv.com  "&adv;
                   <feature><f.name>   adv.type </f.name>
                            <f.struct> unmarked </f.struct>
                   </feature>"                                      >


<!-- PRONOUNS                                                     -->
<!-- Lexical class of pronoun:  LOB distinguishes nominal         -->
<!-- pronouns (anybody, anyone, anything, everybody, ...),        -->
<!-- determiners, personal pronouns, and reflexive pronouns.      -->
<!ENTITY pro.nom  "&pron;
                   <feature><f.name>   pron.type </f.name>
                            <f.struct> nominal   </f.struct>
                   </feature>"                                      >
<!-- Possessive pronominal determiners include "my" etc.          -->
<!ENTITY pro.det  "&pron;
                   <feature><f.name>   pron.type  </f.name>
                            <f.struct> determiner </f.struct>
                   </feature>"                                      >
<!ENTITY pro.pers "&pron;
                   <feature><f.name>   pron.type </f.name>
                            <f.struct> personal  </f.struct>
                   </feature>"                                      >
<!ENTITY pro.refl "&pron;
                   <feature><f.name>   pron.type </f.name>
                            <f.struct> reflexive </f.struct>
                   </feature>"                                      >


<!-- CONJUNCTIONS                                                 -->
<!-- LOB distinguishes coordinating and subordinating conj.       -->
<!ENTITY CC       "&conj;
                   <feature><f.name>subordinating</f.name><minus>
                   </feature>"                                      >
<!ENTITY CS       "&conj;
                   <feature><f.name>subordinating</f.name><plus>
                   </feature>"                                      >


<!-- NUMERALS                                                     -->
<!-- LOB distinguishes cardinals and ordinals.  Other              -->
<!-- distinctions are made, and may be found below.               -->
<!ENTITY num.card "&num;
                   <feature><f.name>ordinal</f.name><minus>
                   </feature>"                                      >
<!ENTITY num.ord  "&num;
                   <feature><f.name>ordinal</f.name><plus>
                   </feature>"                                      >


<!-- PREPOSED DETERMINER-PRONOUN                                  -->
<!-- LOB distinguishes qualifiers and quantifiers                 -->
<!ENTITY AB.qual  "&AB;
                   <feature><f.name>   det.type  </f.name>
                            <f.struct> qualifier </f.struct>
                   </feature>"                                      >
<!ENTITY AB.quant "&AB;
                   <feature><f.name>   det.type   </f.name>
                            <f.struct> quantifier </f.struct>
                   </feature>"                                      >


<!-- ADJECTIVES                                                   -->
<!-- LOB distinguishes attributive-only adjectives from those     -->
<!-- which can be either attributive or predicative.              -->
<!ENTITY jj.attr  "<feature><f.name>attrib-only</f.name><plus>
                   </feature>"                                      >
<!ENTITY jj.pred  "<feature><f.name>attrib-only</f.name><minus>
                   </feature>"                                      >


<!-- WH-pronouns                                                  -->
<!-- LOB distinguishes determiners, pronouns, and relatives.      -->
<!-- The first two can be marked with the CATEGORY feature        -->
<!-- already defined (assuming we don't mind having two values    -->
<!-- for the same feature).  The last requires a RELATIVE feature.-->
<!ENTITY rel.yes  "<feature><f.name>relative</f.name><plus>
                   </feature>"                                      >
<!ENTITY rel.no   "<feature><f.name>relative</f.name><minus>
                   </feature>"                                      >


<!-- PUNCTUATION                                                  -->
<!-- LOB distinguishes !, open and close bracket, open and        -->
<!-- close quote, dash, comma, stop, ellipsis, colon, semicolon,  -->
<!-- and question mark.                                           -->
<!ENTITY p.bang   "<feature><f.name>character</f.name>
                          <f.struct> ! </f.struct>
                   </feature>"                                      >
<!ENTITY p.openbr "<feature><f.name>character</f.name>
                          <f.struct> ( </f.struct>
                   </feature>"                                      >
<!ENTITY p.closbr "<feature><f.name>character</f.name>
                          <f.struct> ) </f.struct>
                   </feature>"                                      >
<!ENTITY p.openq  "<feature><f.name>character</f.name>
                          <f.struct> &ldquo </f.struct>
                   </feature>"                                      >
<!ENTITY p.closq  "<feature><f.name>character</f.name>
                          <f.struct> &rdquo </f.struct>
                   </feature>"                                      >
<!ENTITY p.dash   "<feature><f.name>character</f.name>
                          <f.struct> &dash </f.struct>
                   </feature>"                                      >
<!ENTITY p.comma  "<feature><f.name>character</f.name>
                          <f.struct> , </f.struct>
                   </feature>"                                      >


<!ENTITY p.stop   "<feature><f.name>character</f.name>
                          <f.struct> . </f.struct>
                   </feature>"                                      >
<!ENTITY p.ellips "<feature><f.name>character</f.name>
                          <f.struct> &hellip </f.struct>
                   </feature>"                                      >
<!ENTITY p.colon  "<feature><f.name>character</f.name>
                          <f.struct> : </f.struct>
                   </feature>"                                      >
<!ENTITY p.semi   "<feature><f.name>character</f.name>
                          <f.struct> ; </f.struct>
                   </feature>"                                      >
<!ENTITY p.query  "<feature><f.name>character</f.name>
                          <f.struct> ? </f.struct>
                   </feature>"                                      >
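The twelve punctuation entities above differ only in the value of the character feature, so a set like this could be generated rather than typed by hand. The Python sketch below shows one way to do so. The table `PUNCT` and the function `punct_entity` are hypothetical helpers, not part of the A&I proposal, and the output puts each declaration on one line instead of using the hand-aligned layout above.

```python
# Entity names and character values copied from the declarations above.
# &ldquo, &rdquo, &dash and &hellip are kept as SGML entity references.
PUNCT = {
    "p.bang": "!",
    "p.openbr": "(",
    "p.closbr": ")",
    "p.openq": "&ldquo",
    "p.closq": "&rdquo",
    "p.dash": "&dash",
    "p.comma": ",",
    "p.stop": ".",
    "p.ellips": "&hellip",
    "p.colon": ":",
    "p.semi": ";",
    "p.query": "?",
}


def punct_entity(name: str, char: str) -> str:
    """Return a one-line entity declaration for one punctuation mark."""
    # The value is surrounded by spaces, as in the hand-written declarations;
    # the trailing space also ends references such as &ldquo that lack a ";".
    return (
        f'<!ENTITY {name:<8} "<feature><f.name>character</f.name>'
        f'<f.struct> {char} </f.struct></feature>">'
    )


if __name__ == "__main__":
    for name, char in PUNCT.items():
        print(punct_entity(name, char))
```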

9.1. Number, Person, Case, Gender, and Other Grammatical Features

Features of traditional grammar like number, gender, and case appear in many of the LOB tags.

NOTE:

This section is not complete for all categories.


<!-- Number:  English words marked for number are sing or plur.   -->
<!-- This feature is used for verbs, nouns, pronouns, and         -->
<!-- numerals.                                                    -->
<!ENTITY sing     "<feature><f.name>   number   </f.name>
                            <f.struct> singular </f.struct>
                   </feature>"                                  >
<!ENTITY plur     "<feature><f.name>   number   </f.name>
                            <f.struct> plural   </f.struct>
                   </feature>"                                  >
<!ENTITY num.no   "<feature><f.name>   number   </f.name>
                            <f.struct> unmarked </f.struct>
                   </feature>"                                  >
<!-- We define "unmarked" as a placeholder, so that we can        -->
<!-- specify that a given word is not marked for number, rather   -->
<!-- than either leaving it out or specifying an exhaustive       -->
<!-- list of alternatives.                                        -->


<!-- Person:  English words marked for person are 1st, 2nd, or 3rd.-->
<!-- This feature is used for verbs and pronouns.                 -->
<!-- We distinguish IMPERSONAL as a value for pronouns and        -->
<!-- UNMARKED as a value for verbs which are not marked.          -->
<!-- Participles are not marked for person; they have their own   -->
<!-- binary feature, PARTICIPLE.                                  -->
<!ENTITY p1       "<feature><f.name>   person   </f.name>
                            <f.struct> 1st      </f.struct>
                   </feature>"                                  >
<!ENTITY p2       "<feature><f.name>   person   </f.name>
                            <f.struct> 2nd      </f.struct>
                   </feature>"                                  >
<!ENTITY p3       "<feature><f.name>   person   </f.name>
                            <f.struct> 3rd      </f.struct>
                   </feature>"                                  >
<!ENTITY impers   "<feature><f.name>   person   </f.name>
                            <f.struct> none     </f.struct>
                   </feature>"                                  >
<!-- We might use MINUS here, but PERSON is not binary, so we don't. -->
<!ENTITY per.no   "<feature><f.name>   person   </f.name>
                            <f.struct> unmarked </f.struct>
                   </feature>"                                  >
<!ENTITY partic   "<feature><f.name>   participle </f.name>
                                       <plus>   </feature>"     >
<!ENTITY par.no   "<feature><f.name>   participle </f.name>
                                       <minus>  </feature>"     >


<!-- Tense:  English tenses are present and preterite.            -->
<!-- This feature is used for verbs.                              -->
<!-- This omits the compound tenses because they are analytic in  -->
<!-- English and we are worrying only about word tags.            -->
<!-- To allow for the compound tenses, e.g. for phrase tagging,   -->
<!-- we add a future tense, introduce a +/- PERFECTIVE feature,   -->
<!-- and take the Cartesian product of the two.                   -->
<!ENTITY present  "<feature><f.name>   tense    </f.name>
                            <f.struct> present  </f.struct>
                   </feature>"                                  >
<!ENTITY preterite "<feature><f.name>   tense    </f.name>
                             <f.struct> preterite </f.struct>
                   </feature>"                                  >
<!-- The features above are all that are needed for LOB tags.     -->
<!-- The following features are added proleptically for other     -->
<!-- uses.                                                        -->
<!ENTITY future   "<feature><f.name>   tense    </f.name>
                            <f.struct> future   </f.struct>
                   </feature>"                                  >

<!ENTITY presperf "<feature><f.name>   tense    </f.name>
                            <f.struct> present  </f.struct>
                   </feature>
                   <feature><f.name> perfective </f.name><plus>
                   </feature>"                                  >
<!ENTITY pluperf  "<feature><f.name>   tense    </f.name>
                            <f.struct> preterite</f.struct>
                   </feature>
                   <feature><f.name> perfective </f.name><plus>
                   </feature>"                                  >
<!ENTITY futperf  "<feature><f.name>   tense    </f.name>
                            <f.struct> future   </f.struct>
                   </feature>
                   <feature><f.name> perfective </f.name><plus>
                   </feature>"                                  >


<!-- Degree:  English modifiers are pos, comp, or sup.            -->
<!-- This feature is used for adverbs and adjectives.             -->
<!ENTITY pos      "<feature><f.name>   degree   </f.name>
                            <f.struct> positive </f.struct>
                   </feature>"                                  >
<!ENTITY comp     "<feature><f.name>   degree   </f.name>
                            <f.struct> comparative </f.struct>
                   </feature>"                                  >
<!ENTITY sup      "<feature><f.name>   degree   </f.name>
                            <f.struct> superlative </f.struct>
                   </feature>"                                  >


<!-- Case:  English words marked for case are nom, gen, or acc.   -->
<!-- This feature is used for adverbs (sometimes marked GEN),     -->
<!-- nouns (NOM or GEN), pronouns, numerals (sometimes GEN),      -->
<!-- determiner-pronouns, and determiners.                        -->
<!-- NOM is nominative or "subjective" case.  We use NOM not SUB  -->
<!-- because we hope to generalize to other IE languages later.   -->
<!ENTITY nom      "<feature><f.name>   case     </f.name>
                            <f.struct> nominative </f.struct>
                   </feature>"                                  >
<!ENTITY gen      "<feature><f.name>   case     </f.name>
                            <f.struct> genitive    </f.struct>
                   </feature>"                                  >
<!-- ACC is accusative or "objective" case.  We use ACC not OBJ   -->
<!-- or OBLIQUE to make other IE languages easier later.          -->
<!ENTITY acc      "<feature><f.name>   case     </f.name>
                            <f.struct> accusative  </f.struct>
                   </feature>"                                  >
<!ENTITY case.no  "<feature><f.name>   case     </f.name>
                            <f.struct> unmarked    </f.struct>
                   </feature>"                                  >


<!-- Gender:  English words marked for gender are masculine,      -->
<!-- feminine, neuter, or common.  We add unmarked just in case.  -->
<!-- This feature is used for personal pronouns (3rd person only).-->
<!ENTITY masc     "<feature><f.name>   gender   </f.name>
                            <f.struct> masculine </f.struct>
                   </feature>"                                  >
<!ENTITY fem      "<feature><f.name>   gender   </f.name>
                            <f.struct> feminine  </f.struct>
                   </feature>"                                  >
<!ENTITY neut     "<feature><f.name>   gender   </f.name>
                            <f.struct> neuter    </f.struct>
                   </feature>"                                  >
<!-- Common gender in English is masculine or feminine.  Other    -->
<!-- languages might need to define it as a distinct value.       -->
<!-- Danish, for instance?                                        -->
<!ENTITY common   "<feature><f.name>   gender   </f.name>
                            <f.s.OR>
                                 <f.struct> masculine </f.struct>
                                 <f.struct> feminine  </f.struct>
                            </f.s.OR>
                   </feature>"                                  >
<!ENTITY gend.no  "<feature><f.name>   gender   </f.name>
                            <f.struct> unmarked  </f.struct>
                   </feature>"                                  >
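The &common; entity above expresses common gender as a disjunction of masculine and feminine. One way an application might interpret such a value is sketched below in Python: a disjunction is treated as the set of atomic values it allows, and two values are compatible when they share at least one alternative. The set representation and the functions `compatible` and `unify` are illustrative assumptions, not part of the proposed markup.

```python
from typing import FrozenSet, Union

# A feature value is either an atomic string or a disjunction of atoms,
# modelled here (an assumption) as a frozenset, like <f.s.OR> above.
Value = Union[str, FrozenSet[str]]

MASC = "masculine"
FEM = "feminine"
NEUT = "neuter"
COMMON: Value = frozenset({MASC, FEM})  # &common; = masculine OR feminine


def alternatives(v: Value) -> FrozenSet[str]:
    """Return the set of atomic values a (possibly disjunctive) value allows."""
    return v if isinstance(v, frozenset) else frozenset({v})


def compatible(a: Value, b: Value) -> bool:
    """Two values are compatible if they share at least one alternative."""
    return bool(alternatives(a) & alternatives(b))


def unify(a: Value, b: Value) -> Value:
    """Narrow two values to their shared alternatives; raise if none remain."""
    shared = alternatives(a) & alternatives(b)
    if not shared:
        raise ValueError(f"incompatible values: {a!r} and {b!r}")
    return next(iter(shared)) if len(shared) == 1 else shared


if __name__ == "__main__":
    print(compatible(COMMON, MASC))  # True: common admits masculine
    print(compatible(COMMON, NEUT))  # False
    print(unify(COMMON, FEM))        # feminine
```

Under this reading gend.no (unmarked) is just another atomic value, so `compatible(COMMON, "unmarked")` is False; treating unmarked as a wildcard would be a different, equally possible interpretation.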

9.2. Miscellaneous Features

Some other features are not recognizable as traditional grammatical notions.

NOTE:

This section is not complete for all categories.


<!-- LOB distinguishes cardinals with the value 1 from others.    -->
<!-- Because LOB tokenizes on spaces, hyphenated pairs are also   -->
<!-- distinguished, here with a COUNT feature whose value is the  -->
<!-- number of numerals in the unit being tagged.                 -->
<!ENTITY num.one  "<feature><f.name>unitary value</f.name><plus>
                   </feature>"                                      >
<!ENTITY num.plur "<feature><f.name>unitary value</f.name><minus>
                   </feature>"                                      >
<!ENTITY num.pair "<feature><f.name>   count </f.name>
                            <f.struct> 2     </f.struct>
                   </feature>"                                      >


<!-- LOB distinguishes the word BOTH from other ABNs because it   -->
<!-- can serve as a double conjunction.                           -->
<!-- No distinction is made among uses of BOTH.                   -->
<!-- Among determiners, LOB likewise distinguishes those that can  -->
<!-- serve as double conjunctions (DTX: either, neither).          -->
<!ENTITY conj.dbl "<feature><f.name>double-conj </f.name><plus>
                   </feature>"                                      >
<!ENTITY c.dbl.no "<feature><f.name>double-conj </f.name><minus>
                   </feature>"                                      >


<!-- LOB distinguishes various words which can be pre-posed,      -->
<!-- post-posed, or both.                                         -->
<!ENTITY pre.yes  "<feature><f.name>preposable </f.name><plus>
                   </feature>"                                      >
<!ENTITY pre.no   "<feature><f.name>preposable </f.name><minus>
                   </feature>"                                      >
<!ENTITY post.yes "<feature><f.name>postposable </f.name><plus>
                   </feature>"                                      >
<!ENTITY post.no  "<feature><f.name>postposable </f.name><minus>
                   </feature>"                                      >
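Because LOB keeps a hyphenated pair such as 3-4 together as one token, the number features declared above can be read off a cardinal's spelling. The Python sketch below assigns them for digit strings and a small sample of number words. The lexicon `NUMBER_WORDS` and both functions are hypothetical, and genitive or plural cardinals (CD$, CD1$, CD1S, CDS) are not handled.

```python
# Hypothetical sample lexicon of number words; a real tagger would need a
# complete one.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3,
    "dozen": 12, "hundred": 100, "thousand": 1000,
}


def numeral_value(part: str) -> int:
    """Value of one numeral written in digits or as a listed word."""
    if part.isdigit():
        return int(part)
    return NUMBER_WORDS[part.lower()]  # KeyError for unlisted words


def cardinal_features(token: str) -> dict:
    """Assign the num.one / num.plur / num.pair features to a cardinal token."""
    values = [numeral_value(p) for p in token.split("-")]
    if len(values) == 1:
        # CD1 gets &num.one; (unitary value +), CD gets &num.plur; (-)
        return {"unitary value": values[0] == 1}
    if len(values) == 2:
        # CD-CD: a hyphenated pair such as 3-4 gets &num.pair; (count 2)
        return {"count": 2}
    raise ValueError(f"unexpected cardinal token: {token!r}")
```

The hyphen rule is deliberately naive: if twenty were in the lexicon, a compound number word such as twenty-one would be read as a pair, so a real tagger would need to consult its lexicon before splitting.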

10. Combinations of Primitives

We can define the verbal tags of the LOB scheme fully as follows:


<!-- Simple verbs:  VB, VBD, VBG, VBN, VBZ                        -->
<!ENTITY VB       "&vb.lex; &num.no; &per.no; &par.no; &present;"   >
<!ENTITY VBD      "&vb.lex; &num.no; &per.no; &par.no; &preterite;" >
<!ENTITY VBG      "&vb.lex; &num.no;          &partic; &present;"   >
<!ENTITY VBN      "&vb.lex; &num.no;          &partic; &preterite;" >
<!ENTITY VBZ      "&vb.lex; &sing;   &p3;     &par.no; &present;"   >


<!-- Modal verbs:  MD                                             -->
<!ENTITY MD       "&vb.mod; &num.no; &per.no; &par.no; &present;"   >

<!-- BE Auxiliaries:  BE, BED, BEDZ, BEG, BEM, BEN, BER, BEZ      -->
<!-- Some might wish for a +/- INFINITE feature to distinguish    -->
<!-- infinitives; except for BE, however, the English infinitive  -->
<!-- is always the same as the form unmarked for person and num.  -->
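<!-- And BE's finite present forms (am, is, are) all differ from   -->
<!-- the base form be, which the BE tag alone therefore picks out. -->
<!-- So we don't need INFIN.                                       -->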
<!ENTITY BE       "&vb.be; &num.no; &per.no; &par.no; &present;"    >
<!ENTITY BED      "&vb.be; &num.no; &per.no; &par.no; &preterite;"  >
<!ENTITY BEDZ     "&vb.be; &sing;   &p3;     &par.no; &preterite;"  >
<!ENTITY BEG      "&vb.be; &num.no; &per.no; &partic; &present;"    >
<!ENTITY BEM      "&vb.be; &sing;   &p1;     &par.no; &present;"    >
<!ENTITY BEN      "&vb.be; &num.no; &per.no; &partic; &preterite;"  >
<!ENTITY BER      "&vb.be; &plur;   &per.no; &par.no; &present;"    >
<!ENTITY BEZ      "&vb.be; &sing;   &p3;     &par.no; &present;"    >


<!-- DO Auxiliaries:  DO, DOD, DOZ                                -->
<!ENTITY DO       "&vb.do; &num.no; &per.no; &par.no; &present;"    >
<!ENTITY DOD      "&vb.do; &num.no; &per.no; &par.no; &preterite;"  >
<!ENTITY DOZ      "&vb.do; &sing;   &p3;     &par.no; &present;"    >
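Since each tag entity is built only from references to primitive entities, a tag's full feature structure can be recovered by expanding the references recursively, much as an SGML parser substitutes entity text. The Python sketch below imitates that expansion for three of the tags above, writing each primitive as feature-value pairs rather than SGML markup. The vb.lex and vb.do entities are defined earlier in the paper; the values given for them here are placeholders, and the dictionaries and the `expand` function are illustrative assumptions.

```python
import re

# Primitive entities as feature-value pairs (a simplification of the SGML
# <feature> markup). The vb.* values are placeholders: their real
# definitions appear earlier in the paper and may differ.
PRIMITIVES = {
    "vb.lex": {"category": "verb"},
    "vb.do": {"category": "verb", "lexitem": "do"},
    "num.no": {"number": "unmarked"},
    "sing": {"number": "singular"},
    "per.no": {"person": "unmarked"},
    "p3": {"person": "3rd"},
    "par.no": {"participle": False},
    "partic": {"participle": True},
    "present": {"tense": "present"},
    "preterite": {"tense": "preterite"},
}

# Replacement texts copied from the tag declarations above.
TAGS = {
    "VBZ": "&vb.lex; &sing;   &p3;     &par.no; &present;",
    "VBG": "&vb.lex; &num.no;          &partic; &present;",
    "DOD": "&vb.do; &num.no; &per.no; &par.no; &preterite;",
}

ENTITY_REF = re.compile(r"&([A-Za-z0-9.]+);")


def expand(name: str) -> dict:
    """Recursively expand an entity into a flat feature structure."""
    if name in PRIMITIVES:
        return dict(PRIMITIVES[name])
    features = {}
    for ref in ENTITY_REF.findall(TAGS[name]):
        for feat, value in expand(ref).items():
            # Two primitives giving one feature different values is an error.
            if feat in features and features[feat] != value:
                raise ValueError(f"{name}: conflicting values for {feat}")
            features[feat] = value
    return features


if __name__ == "__main__":
    for tag in TAGS:
        print(tag, expand(tag))
```

Expanding VBG, for example, yields no person feature, since its declaration does not include &per.no;. The conflict check is there because a tag built from two incompatible primitives (say &sing; and &plur;) would otherwise go unnoticed.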

Note that in VBD, DOD, and BED, the string “num.no per.no” says, correctly, that the verbs in question are not marked for person and number. In the case of VBD and DOD, however, this means that person can be 1st, 2nd, or 3rd, and number singular or plural, in any combination; in the case of BED (“were”), it means that person and number can be any combination except 1st- and 3rd-person singular, which take “was” in the indicative. This is a simple fact of English grammar. Our choice of expression, modeled on the choices made in the LOB tag scheme, places the burden of handling this fact on the grammar and the application program; one could instead change the definitions of these entities to make the restriction explicit here. This facility allows us to specify, in our entity declarations, exactly what we mean by a given part-of-speech classification, and thus represents an advantage over the naive approach presented earlier.
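The difference between the two readings of “num.no per.no” can be made concrete by listing the person/number combinations each tag admits. In the Python sketch below the sets are written out by hand from English grammar, indicative mood only; the names are illustrative. Declaring BED explicitly in this way corresponds to the option mentioned above of changing the entity definitions rather than leaving the restriction to the grammar.

```python
from itertools import product

PERSONS = ("1st", "2nd", "3rd")
NUMBERS = ("singular", "plural")
ALL_COMBOS = frozenset(product(PERSONS, NUMBERS))

# DOD ("did"): unmarked really means any person/number combination.
DOD_ALLOWED = ALL_COMBOS

# BED ("were"), indicative: every combination except those that take
# "was" (I was, he was), i.e. 1st- and 3rd-person singular.
BED_ALLOWED = ALL_COMBOS - {("1st", "singular"), ("3rd", "singular")}


def admits(allowed: frozenset, person: str, number: str) -> bool:
    """True if the tag with this allowed set can have this person/number."""
    return (person, number) in allowed


if __name__ == "__main__":
    for person, number in sorted(ALL_COMBOS - BED_ALLOWED):
        print("BED excludes", person, number)
```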

All the other LOB tags can be similarly defined; completion of the definition is for now left to the reader as an exercise.

A. Definition of all LOB tags in feature-structure notation

Binary Features

+/-AUX  auxiliary      /* verb */
+/-MOD  modal          /* verb */
+/-PROP proper         /* noun */
+/-CAP  capitalized    /* noun, adjective */
+/-SUB  subordinating  /* conjunction */
+/-ORD  ordinal        /* number */
+/-PERF perfective     /* tensed verbs */
+/-PART participle     /* verbs */

+/-LOC  locative term  /* proper nouns */
+/-TITL title          /* proper nouns */
+/-UNIT unit-term      /* noun */
+/-CITE cited-word     /* noun */
+/-ATTR attributive    /* adjectives */
+/-PRED predicative    /* adjectives -- redundant? */
+/-DBLC double-conj    /* determiner/pronouns, and determiners */
+/-PRE  preposable     /* ? may precede its head */
+/-POST postposable    /* ? may follow its head */
+/-PTCL particle       /* ? adverb  ?==? inverse of +/-takes-complement? */

+/-REL  relative       /* pronouns -- alternative to pron.type */
+/-PERS personal       /* pronouns -- alternative to pron.type */
+/-REFL reflexive      /* pronouns -- alternative to pron.type */
+/-WH   WH-word        /* pronouns, adverbs */
/* cross-category usages:  */
+/-pseudo-adverb  /* i.e. can appear in adverbial positions -- noun */
+/-pseudo-noun    /* i.e. can appear in noun positions -- adverb */
+/-also-prep      /* i.e. is also a preposition -- adverb */
+/-DET            /* i.e. is a determiner -- pronoun */
+/-exnoun         /* formed from a noun -- pronoun (anybody ...) */

N-way Features

/* Base categories */
CAT  category  = verb | adverb | noun | pronoun | conjunction | number |
        determiner | article | THERE | preposition | adjective |
        qualifier | TO | interjection | [WH] | NOT | letter |
        punctuation | formula | foreign
 
/* Sub-categorization */
LEX  lexitem   = (string)            /* verbs */
CHAR character = (string)            /* punctuation */
CNT  count     = (integer)           /* numbers -- for pairs, ranges */
[ATYP adv.type = nominal | preposition | particle | unmarked ]
        [prefer binary +/-pseudo-noun +/-also.prep +/-ptcl ]
[PTYP pron.type = nominal | determiner | personal | reflexive ]
        [prefer binary +/-exnoun +/-det +/-pers +/-refl +/-wh +/-rel ]
DTYP det.type  = qualifier | quantifier
 
/* Categories of Traditional Grammar */
NUM  number    = singular | plural | unmarked
PER  person    = 1st | 2nd | 3rd | none | unmarked
TEN  tense     = present | preterite | future
DEG  degree    = positive | comparative | superlative
CASE case      = nominative | genitive | accusative | unmarked
GEN  gender    = masculine | feminine | neuter | unmarked [ | common ]
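The value lists above are precise enough to check tag definitions mechanically. The Python sketch below copies the six N-way features of traditional grammar, plus a sample of the binary features, and reports any feature or value a proposed definition uses that is not declared. The dictionaries and the function `check` are illustrative assumptions; the optional common value of GEN and the string- and integer-valued features LEX, CHAR and CNT are left out.

```python
# N-way features of traditional grammar and their values, from Appendix A.
NWAY = {
    "number": {"singular", "plural", "unmarked"},
    "person": {"1st", "2nd", "3rd", "none", "unmarked"},
    "tense": {"present", "preterite", "future"},
    "degree": {"positive", "comparative", "superlative"},
    "case": {"nominative", "genitive", "accusative", "unmarked"},
    "gender": {"masculine", "feminine", "neuter", "unmarked"},
}

# A sample of the binary features; values are True (+) or False (-).
BINARY = {"auxiliary", "modal", "participle", "perfective", "proper"}


def check(fs: dict) -> list:
    """Return a list of problems found in a feature structure (dict)."""
    problems = []
    for feat, value in fs.items():
        if feat in NWAY:
            if value not in NWAY[feat]:
                problems.append(f"{feat}={value!r} is not a declared value")
        elif feat in BINARY:
            if not isinstance(value, bool):
                problems.append(f"{feat} is binary but has value {value!r}")
        else:
            problems.append(f"unknown feature {feat!r}")
    return problems
```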

Cross-Category Groupings in LOB

      noun-but-can-serve-as-adverb
      adverb-but-can-serve-as-noun (as prepositional object)
      adverb-or-preposition (RI)
      adverb-or-preposition-without-object (RP)

B. Definitions of LOB, Brown, and Lancaster Tags

LOB tags

Summary of tags (with spaces inserted for clarity):

    1   A B L   pre-qualifier (quite, rather, such) 7.12
                CAT=(DET|PRON), DTYP=QUALIFIER, +PRE
    2   A B N   pre-quantifier (all, half) 7.12
                CAT=(DET|PRON), DTYP=QUANTIFIER, +PRE
    3   A B X   pre-quantifier/pronoun/double conjunction (both)
                CAT=(DET|PRON), DTYP=QUANTIFIER, +PRE, +DBLC
    4   A P     post-determiner/pronoun.
                CAT=(DET|PRON), +POST
    5   A P $   other's
                CAT=(DET|PRON), +POST, CASE=GEN
    6   A P S   others
                CAT=(DET|PRON), +POST, NUM=PLURAL
    7   A P S $ others'
                CAT=(DET|PRON), +POST, CASE=GEN, NUM=PLURAL
    8   A T     article, singular (a, an, every) 7.12
                CAT=ARTICLE, NUM=SINGULAR
    9   A T I   article, sing or plural (the, no) 7.12
                CAT=ARTICLE, NUM=UNMARKED
 
    10  BE      be
                CAT=VERB +AUX -MOD -PART TEN=PRES LEX=BE
    11  BE D    were
                CAT=VERB +AUX -MOD -PART TEN=PRET LEX=BE
    12  BE D Z  was
                CAT=VERB +AUX -MOD -PART TEN=PRET NUM=SING PER=3 LEX=BE
    13  BE G    being
                CAT=VERB +AUX -MOD +PART TEN=PRES LEX=BE
    14  BE M    am, 'm
                CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING PER=1 LEX=BE
    15  BE N    been
                CAT=VERB +AUX -MOD +PART TEN=PRET LEX=BE
    16  BE R    are, 're
                CAT=VERB +AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD LEX=BE
    17  BE   Z  is, 's
                CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING PER=3 LEX=BE
 
    18  CC coordinating conjunction (and, and/or, but, nor, only, or, yet)
                CAT=CONJ -SUB
 
    19  CD       2, 3, two, three, hundred, thousand, dozen, zero - 7.17
    20  CD     $ cardinal + genitive
    21  CD -CD   hyphenated pair of cardinals 7.17
    22  CD 1     one, 1 7.17
    23  CD 1   $ one's
    24  CD 1 S   ones
    25  CD   S   cardinal + plural (tens, millions, dozens, etc.)
 
    26  CS    subordinating conjunction (after, although, etc.) 7.14-15
                CAT=CONJ +SUB
 
    27  DO    do 7.5
              CAT=VERB +AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD LEX=DO
    28  DO D  did
              CAT=VERB +AUX -MOD -PART TEN=PRET NUM=UNMKD PER=UNMKD LEX=DO
    29  DO Z  does
              CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING  PER=3 LEX=DO
 
    30  DT    singular determiner (another, each, that, this) 7.12
    31  DT $  singular determiner + genitive (another's)
    32  DT I  singular or plural determiner (any, enough, some)
    33  DT S  plural determiner (those, these)
    34  DT X  determiner/double conjunction (either, neither) 7.12
 
    35  EX  existential 'there'
 
    36  HV    have 7.5
              CAT=VERB +AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD LEX=HAVE
    37  HV D  had, 'd
              CAT=VERB +AUX -MOD -PART TEN=PRET NUM=UNMKD PER=UNMKD LEX=HAVE
    38  HV G  having
              CAT=VERB +AUX -MOD +PART TEN=PRES LEX=HAVE
    39  HV N  had (past participle)
              CAT=VERB +AUX -MOD +PART TEN=PRET LEX=HAVE
    40  HV Z  has, 's
              CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING  PER=3 LEX=HAVE
 
    41  IN    preposition (about, above, etc.) 7.13, 7.15
 
    42  JJ    adjective 7.3-4, 7.8-9, 7.11
    43  JJ B  attributive-only adjective (chief, main, entire, etc.)
    44  JJ R  comparative adjective 7.9, 7.11
    45  JJ T  superlative adjective 7.9, 7.11
    46  J  NP adj with word-initial capital (English, German, etc.)
 
    47  MD    modal auxiliary
                CAT=VERB +AUX +MOD TEN=PRES NUM=UNMKD PER=UNMKD
 
    48  N C        cited word 7.23
        NC         CAT=NOUN N=SING CASE=NOM -PROP -CAP -UNIT +CITE
    49  N N        noun, sg, common 7.4, 7.6, 7.7
        NN         CAT=NOUN N=SING CASE=NOM -PROP -CAP -UNIT -CITE
    50  N N     $  noun, sg, common, + genitive 7.6
        NN$        CAT=NOUN N=SING CASE=GEN -PROP -CAP -UNIT -CITE
    51  N N P      noun, sg, common, with word-initial capital 7.7
        NNP        CAT=NOUN N=SING CASE=NOM -PROP +CAP -UNIT -CITE
    52  N N P   $  noun, sg, common, with word-init cap and genitive
        NNP$       CAT=NOUN N=SING CASE=GEN -PROP +CAP -UNIT -CITE
    53  N N P S    noun, pl,   common, with word-init cap
        NNPS       CAT=NOUN N=PLUR CASE=NOM -PROP +CAP -UNIT -CITE
    54  N N P S $  noun, pl,  common, with word-init cap and genitive
        NNPS$      CAT=NOUN N=PLUR CASE=GEN -PROP +CAP -UNIT -CITE
    55  N N   S    noun, pl,   common 7.6, 7.7
        NNS        CAT=NOUN N=PLUR CASE=NOM -PROP -CAP -UNIT -CITE
    56  N N   S $  noun, pl,   common, + genitive
        NNS$       CAT=NOUN N=PLUR CASE=GEN -PROP -CAP -UNIT -CITE
    57  N N U      noun, abbrev unit of measurement (hr., lb., etc.)
        NNU        CAT=NOUN N=SING CASE=NOM -PROP -CAP +UNIT -CITE
    58  N N U S    noun, abbrev unit of measurement, pl (gns, yds, etc.)
        NNUS       CAT=NOUN N=PLUR CASE=NOM -PROP -CAP +UNIT -CITE
    59  N P        noun, sg, proper 7.7
        NP         CAT=NOUN N=SING CASE=NOM +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
    60  N P     $  noun, sg, proper, + genitive
        NP$        CAT=NOUN N=SING CASE=GEN +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
    61  N P L      noun, sg, locative with word-initial cap (Abbey,
        NPL        CAT=NOUN N=SING CASE=NOM +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
    62  N P L   $  ditto + genitive
        NPL$       CAT=NOUN N=SING CASE=GEN +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
    63  N P L S    noun, pl, locative with word-initial cap
        NPLS       CAT=NOUN N=PLUR CASE=NOM +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
    64  N P L S $  ditto + genitive
        NPLS$      CAT=NOUN N=PLUR CASE=GEN +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
    65  N P   S    noun, pl, proper 7.7
        NPS        CAT=NOUN N=PLUR CASE=NOM +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
    66  N P   S $  noun, pl, proper, + genitive
        NPS$       CAT=NOUN N=PLUR CASE=GEN +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
    67  N P T      noun, sg, titular with word-initial cap
        NPT        CAT=NOUN N=SING CASE=NOM +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
    68  N P T   $  noun, sg, titular, cap, + genitive
        NPT$       CAT=NOUN N=SING CASE=GEN +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
    69  N P T S    noun, pl, titular, cap
        NPTS       CAT=NOUN N=PLUR CASE=NOM +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
    70  N P T S $  noun, pl, titular, cap, + genitive
        NPTS$      CAT=NOUN N=PLUR CASE=GEN +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
    71  N R        noun, sg, adverbial (Jan, Feb, east, today,
        NR         CAT=NOUN N=SING CASE=NOM -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
    72  N R     $  noun, sg, adverbial + genitive
        NR$        CAT=NOUN N=SING CASE=GEN -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
    73  N R   S    noun, pl,   adverbial
        NRS        CAT=NOUN N=PLUR CASE=NOM -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
    74  N R   S $  noun, pl,   adverbial + genitive
        NRS$       CAT=NOUN N=PLUR CASE=GEN -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
 
    75  OD    ordinal (1st, 2nd, first, ...) 7.17
    76  OD $  ordinal + genitive
 
    77  P N    nominal pron (anybody, anyone, anything; everybody,
    78  P N $  nominal pron + genitive
    79  P P $  poss determiner (my, your, etc.) 7.12
    80  P P $$ poss pron (mine, yours, etc.)
    81  P P 1 A   pers pron, 1st pers sing nom (I)
    82  P P 1 A S pers pron, 1st pers plur nom (we)
    83  P P 1 O   pers pron, 1st pers sing acc (me)
    84  P P 1 O S pers pron, 1st pers plur acc (us)
    85  P P 2     pers pron, 2nd pers (you, thou, thee, ye)
    86  P P 3     pers pron, 3rd pers sing nom + acc (it)
    87  P P 3 A   pers pron, 3rd pers sing nom (he, she)
    88  P P 3 A S pers pron, 3rd pers plur nom (they)
    89  P P 3 O   pers pron, 3rd pers sing acc (him, her)
    90  P P 3 O S pers pron, 3rd pers plur acc (them, 'em)
    91  P P L     refl pron, sg
    92  P P L  S  refl pron, pl; reciprocal pron
 
    93  QL   qualifier (as, awfully, less, more, so, too, very, ...)
    94  QL P post-qualifier (enough, indeed)
 
    95  R B    adverb 7.10-7.11
               CAT=ADV DEG=POS  CASE=UNMKD -PSEUDO.NOUN -ALSO.PREP -WH
    96  R B $  adverb + genitive (else's)
               CAT=ADV          CASE=GEN   -PSEUDO.NOUN -ALSO.PREP -WH
    97  R B R  comparative adverb
               CAT=ADV DEG=COMP CASE=UNMKD -PSEUDO.NOUN -ALSO.PREP -WH
    98  R B T  superlative adverb
               CAT=ADV DEG=SUP  CASE=UNMKD -PSEUDO.NOUN -ALSO.PREP -WH
    99  R I    adverb (homograph of preposition:  below, near, ...)
               CAT=ADV DEG=POS  CASE=UNMKD -PSEUDO.NOUN +ALSO.PREP -PTCL -WH
    100 R N    nominal adverb (here, now, there, then) 7.10
               CAT=ADV DEG=POS CASE=UNMKD  +PSEUDO.NOUN -ALSO.PREP -WH
    101 R P    adverbial particle (back, down, off, ...) 7.10, 7.13
               CAT=ADV DEG=POS CASE=UNMKD  -PSEUDO.NOUN +ALSO.PREP +PTCL -WH
 
    102 TO  infinitival 'to'
                CAT=TO
 
    103 UH  interjection
                CAT=INTERJECTION
 
    104 VB    base form of verb (uninflected present tense, imper)
                CAT=VERB -AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD
    105 VB D  past tense of verb 7.3
                CAT=VERB -AUX -MOD -PART TEN=PRET
    106 VB G  present participle, gerund 7.4
                CAT=VERB -AUX -MOD +PART TEN=PRES
    107 VB N  past participle 7.3
                CAT=VERB -AUX -MOD +PART TEN=PRET
    108 VB Z  3rd person sg
                CAT=VERB -AUX -MOD -PART TEN=PRES NUM=SING PER=3
 
    109 W DT     WH-determiner (what, whatever, interrogative
    110 W DT R   WH-determiner, relative (which) 7.16
    111 W P      WH-pron, interrogative, nom+acc (who, whoever)
    112 W P  $   WH-pron, interrogative, gen (whose)
    113 W P  $ R WH-pron, relative, gen (whose)
    114 W P  A   WH-pron, nom (whosoever)
    115 W P  O   WH-pron, interrogative, acc (whom, whomsoever)
    116 W P  O R WH-pron, relative, acc (whom)
    117 W P    R WH-pron, relative, nom+acc (that, relative who) 7.14,
    118 W RB     WH-adverb (how, when, ...) 7.16
 
    119 XNOT 'not'
 
    120 ZZ  letter
 
    121 !   exclamation mark
    122 &FO formula 7.22
    123 &FW foreign word 7.21
    124 (   left bracket (round or square)
    125 )   right bracket (round or square)
    126 *'  begin quote (single or double) 2.6
    127 **' end quote (single or double) 2.6
    128 *-  dash 7.24
    129 ,   comma 7.24
    130 .   full stop 7.24
    131 ... ellipsis
    132 :   colon 7.24
    133 ;   semicolon 7.24
    134 ?   question mark
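The feature definitions in the table follow a regular notation: NAME=VALUE pairs, signed binary features such as +AUX or -PART, and parenthesized alternatives such as CAT=(DET|PRON). The Python sketch below parses that notation into a dictionary. The function name `parse_lob_def` and the representation (booleans for binary features, tuples for alternatives) are illustrative assumptions.

```python
import re


def parse_lob_def(text: str) -> dict:
    """Parse a definition such as 'CAT=VERB +AUX -MOD TEN=PRES' into a dict."""
    features = {}
    # Tokens are separated by whitespace and/or commas.
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if token[0] in "+-":
            # Signed binary feature: +AUX -> AUX: True, -MOD -> MOD: False.
            features[token[1:]] = token[0] == "+"
        elif "=" in token:
            name, value = token.split("=", 1)
            if value.startswith("(") and value.endswith(")"):
                # Parenthesized disjunction such as (DET|PRON).
                features[name] = tuple(value[1:-1].split("|"))
            else:
                features[name] = value
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return features
```

With definitions in this form, related tags can be compared directly, for instance to confirm that a genitive tag differs from its base tag only in CASE.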
