Article 695 of comp.compression.research:
Xref: mv comp.compression.research:695
Path: mv!news.sprintlink.net!news.uoregon.edu!waikato!auckland.ac.nz!news
From: Daniel Sleator
Newsgroups: comp.compression.research
Subject: Link Grammar Parser, version 2
Date: 15 Oct 1995 12:28:21 GMT
Organization: CMU
Lines: 114
Sender: compression@cs.aukuni.ac.nz (comp.compression.research moderator)
Approved: compression@cs.aukuni.ac.nz
Message-ID: <45qup5$oiq@net.auckland.ac.nz>
Reply-To: Daniel Sleator
NNTP-Posting-Host: cs26.cs.auckland.ac.nz
X-Newsreader: NN version 6.5.0 #3 (NOV)

In Spring 1992, we released version 1 of our "link grammar parser". This is a syntactic parser for English, based on an original theory of syntax related to dependency grammar. Several hundred people took copies of the parser, and a number of people reported that they were using it or were planning to use it for various projects. However, the parser had a number of weaknesses, and its coverage was not sufficient for it to be of much use to people.

With the help of Dennis Grinberg and John Lafferty, we have now released version 2, which is significantly better than version 1. Some of the advantages are described below. We have also created a web site about the system:

    http://bobo.link.cs.cmu.edu/grammar/html/intro.html

This site contains a lot of information about the parser, and allows you to experiment with it. The parser and its documentation are available via anonymous ftp:

    /afs/cs/user/sleator/public/link-grammar on host ftp.cs.cmu.edu

We think the parser could now be useful for a variety of applications that involve recovering the syntactic structure of text. These might include speech recognition, speech generation, grammar checking, machine translation, and language understanding systems.

    Davy Temperley
    Daniel Sleator

.......................................................................
Daniel Sleator                    Office: 412-268-7563
Professor of Computer Science     Fax:    412-268-5576
Carnegie Mellon University        Home:   412-362-8675
Pittsburgh, PA 15213              sleator@cs.cmu.edu

IMPROVEMENTS

1. The new version is "robust". The old version could not assign any syntactic structure to a sentence unless it could completely interpret the sentence. The new version is able to skip over portions of the sentence that it cannot understand, and assign some structure to the rest of the sentence.

2. Quite apart from the "robustness" feature, the parser's coverage is vastly improved. The old system could only fully parse about 30% of sentences in a typical Wall Street Journal article. The new version can find complete parses for 70-75% of such sentences.

3. The new version has a much larger dictionary. The old version had about 25000 words; the new version has about 59000. (Here we count individual forms of verbs and nouns: e.g., "chase", "chases", "chased", and "chasing" are counted as separate words. The number of "stem" words is probably about 30000.)

4. The new version has an "unknown word" feature. It has a general syntactic category which it assigns to any word which it does not recognize. (In the process, it labels the unknown word as a noun, verb, adjective, or adverb.)

5. The parser has a "two-stage" system. At the first stage, it considers common syntactic constructions; the "stage one" coverage is roughly comparable to the coverage of the earlier version. In the second stage, it considers many less common constructions. Here are a few examples of "stage-two" constructions:

    Plural nouns acting as noun modifiers ("He was booked on a weapons violations charge")

    Adjectival nouns preceding adjectives ("City clerical workers went on strike today")

    Prepositional phrases modifying verbs, but preceding the direct object ("She sold for five dollars the ring her mother gave her")
    Manner adverbs modifying adjectives ("The delicately quiet tone of the cello blended well with the fiercely percussive piano chords")

    Unusual cases of subject-verb inversion ("Also invited to the meeting were several prominent scientists")

    Auxiliaries without main verbs ("If you don't want to do it, you should find someone who will")

    Unusual uses of gerunds ("We have to talk about this sleeping in class and girl chasing")

    Noun-phrases introducing proper names ("The actress Whoopi Goldberg and the singer Michael Jackson attended the ceremony")

    Hyphenated expressions as noun-phrases ("The buy-out caused a free-for-all in the mid-afternoon")

6. The post-processing system released with the earlier version has been improved. There is now a "wild-card" character for post-processing, allowing rules to be expressed much more parsimoniously.

7. The new version has greatly improved documentation. We have compiled a "guide-to-links", describing every connector type and every syntactic construction covered by the parser. The guide also contains a complete description of post-processing. We also provide a general introduction to the system (in a file called "manual"), describing the general logic of link grammars and of the post-processing system, the notation we use for expressing them, and a number of special features of the parser. We hope this will allow people to modify the system substantially if they wish, or to design their own versions (e.g., dictionaries for other languages).

8. The dictionary now uses a different (and much more logical) link naming scheme.
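
Items 1 and 5 above describe a parser that tries common constructions first, then less common ones, and can still assign partial structure when no complete parse exists. The following is only a rough sketch of that kind of staged fallback, not the parser's actual code. The function names, the toy word sets, and in particular the ordering "stage one, then stage two, then robust skipping" are assumptions made for illustration; membership in a word set stands in for real grammatical checking.

```python
from typing import Callable, List, Optional, Tuple

# A strategy takes the words of a sentence and returns a result, or None on failure.
Strategy = Callable[[List[str]], Optional[str]]

def parse_with_fallback(words: List[str],
                        strategies: List[Tuple[str, Strategy]]
                        ) -> Tuple[str, Optional[str]]:
    """Return (strategy name, result) for the first strategy that succeeds."""
    for name, strategy in strategies:
        result = strategy(words)
        if result is not None:
            return name, result
    return "unparsed", None

# Hypothetical toy vocabularies; they are NOT taken from the real dictionary.
COMMON = {"the", "dog", "cat", "chased", "a"}
RARE = {"buy-out", "free-for-all"}

def stage_one(words: List[str]) -> Optional[str]:
    # Toy rule: succeed only if every word is covered by the "common" set.
    return "full parse" if all(w in COMMON for w in words) else None

def stage_two(words: List[str]) -> Optional[str]:
    # Toy rule: also accept the "less common" set.
    return "full parse" if all(w in COMMON | RARE for w in words) else None

def robust(words: List[str]) -> Optional[str]:
    # Toy rule: skip what cannot be handled and keep the rest.
    kept = [w for w in words if w in COMMON | RARE]
    return "partial parse over: " + " ".join(kept) if kept else None

# Assumed ordering: cheapest and most restrictive strategy first.
STRATEGIES = [("stage one", stage_one), ("stage two", stage_two), ("robust", robust)]
```

Calling parse_with_fallback("the dog xyzzy a cat".split(), STRATEGIES) falls through both full-parse stages and returns the "robust" result, with the unrecognized word skipped.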