goldberg@russell.STANFORD.EDU (Jeffrey Goldberg) (10/21/87)
It is possible to do a keyword search on a bibliography of more than 1700
entries covering work in computational linguistics published in the 1980s.
Here is how:

        Computational Linguistics & Natural Language Processing
                         Bibliography by Mail

There is a large (> 1700 items) bibliography of 1980s natural language
processing and computational linguistics sitting on a Sun called Russell
at CSLI.  Anyone with a computer account can now search this bibliography
and get a listing of the result by using electronic mail.

INSTRUCTIONS

The keywords used for the lookup are to be given in the subject line of
your mail message addressed to clbib@russell.stanford.edu (36.9.0.9).
The body of your message will be thrown away.  Here is an example:

    % mail clbib@Russell.Stanford.EDU
    Subject: Woods ATN 1980
    .
    EOT
    Null message body; hope that's okay
    %

Or more compactly:

    % Mail -s "woods atn 1980" clbib@Russell.Stanford.EDU < /dev/null

And here is what you would receive in return:

    >>> Date: Wed, 11 Jul 87 12:03:35 PST
    >>> To: yourname
    >>> Subject: CLBIB search: Woods ATN ...

    %A T.P. Kehler
    %A R.C. Woods
    %T ATN grammar modeling in applied linguistics
    %D 1980
    %P 123-126
    %J ACL Proceedings, 18th Annual Meeting

    %A William A. Woods
    %T Cascaded ATN grammars
    %D 1980
    %V 6
    %N 1
    %P 1-12
    %J American Journal of Computational Linguistics

This example shows a mailing from a Unix machine, but you can mail CLBIB
from any machine and get a result, provided you remember to put your
search keys in the "Subject:" field of the message.  The entries you get
are in standard Unix 'refer' format (see the man page).

You may put between one and eight keywords in the mail "Subject:" field,
and each keyword can be any string of characters (name, date, topic, etc.)
that you think likely to be found in the items of interest (case is
ignored).
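
For readers who want to post-process a reply, here is a minimal Python
sketch.  It is not the CLBIB scripts (those are c-shell scripts built on
refer(1)); it only models what these instructions describe: records in
refer format separated by blank lines, and keywords that ignore case, must
all match, and count only in their first six characters.  The names
parse_refer, key and matches are hypothetical, and matching on whole words
of any field is an assumption about how the lookup behaves.

```python
import re

SIGNIFICANT = 6  # the instructions say only the first six characters count

SAMPLE = """\
%A T.P. Kehler
%A R.C. Woods
%T ATN grammar modeling in applied linguistics
%D 1980
%P 123-126
%J ACL Proceedings, 18th Annual Meeting

%A William A. Woods
%T Cascaded ATN grammars
%D 1980
%V 6
%N 1
%P 1-12
%J American Journal of Computational Linguistics
"""


def parse_refer(text):
    """Split refer-format text into records.

    Each record is a dict mapping a field letter ('A', 'T', 'D', ...) to
    the list of values given for it.  Lines that do not start with '%'
    (mail headers, continuation lines) are ignored in this sketch.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("%") and len(line) > 2:
            # "%A William A. Woods" -> field 'A', value 'William A. Woods'
            current.setdefault(line[1], []).append(line[2:].strip())
        elif not line and current:
            # A blank line closes the current record.
            records.append(current)
            current = {}
    if current:
        records.append(current)
    return records


def key(word):
    """Reduce a word to its lookup key: lowercase, first six characters."""
    return word.lower()[:SIGNIFICANT]


def matches(record, subject):
    """Return True if every keyword in the subject line matches the record.

    Assumption: a keyword matches when its key equals the key of some
    word in any field of the record.
    """
    keywords = subject.split()
    if not 1 <= len(keywords) <= 8:
        raise ValueError("expected between one and eight keywords")
    record_keys = {
        key(word)
        for values in record.values()
        for value in values
        for word in re.findall(r"[A-Za-z0-9]+", value)
    }
    return all(key(k) in record_keys for k in keywords)


if __name__ == "__main__":
    for rec in parse_refer(SAMPLE):
        print(rec["T"][0], matches(rec, "woods atn 1980"))
```

Under these assumptions both sample entries match "woods atn 1980", and
key("generation") equals key("generalized"), which is why such keywords
cannot be told apart.
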

The list of keywords is interpreted conjunctively: "Woods" gets you
everything published by anyone called "Woods" in the 1980s, whereas
"Woods 1983" narrows that down to just the 1983 papers (or papers whose
first or last page number is "1983") by persons named "Woods" (or whose
title refers to "woods"), and, of course, there may be no such items (so
the reply would contain nothing).

Only the first six characters in a keyword are significant, so
"generation" is indistinguishable from "generalized", and "Anderson" is
indistinguishable from "Andersson".  You should bear this in mind when
you consider the relevance of what you receive to your intended request.

To take up less CPU at this end, please use as your first keyword the one
that will narrow selections down the most.  The first key may not be a
year.

If the first key is "help", you will be sent this file.

BUGS

The system is no better than the mail connections.

This system is worse than the mail connections.

The return address is determined only from information in the "From"
field.  "Reply-To:" should be checked but it is not.

The return parsing is stupid and doesn't know all there is to know about
RFC822 mail headers.  The "From" and "Subject" fields must have exactly
the "F" and the "S" in uppercase.

It is impossible to search for only the item "help".  (You get this file
if the first key on a subject line is "help".)

It is impossible to get all of the entries for one year.  [This is not a
bug.  If you want the entire list you can follow the instructions about
such things below.]

The mail handling scripts were written by linguists, not by programmers.

The scripts are fragile and the system may be taken down without notice
at any time.

THE BIBLIOGRAPHY

Some sense of the scope of the bibliography can be gathered from the
following summary information.  Here are the authors who find themselves
with a dozen or more of their 1980s publications included:

    25  Aravind K. Joshi
    19  Bonnie Lynn Webber
    18  Robert C. Berwick
    18  Jaime G. Carbonell
    17  David D. McDonald
    15  Philip J. Hayes
    15  Wendy G. Lehnert
    15  Fernando C.N. Pereira
    14  Kathleen R. McKeown
    14  Karen Sparck-Jones
    13  Eugene Charniak
    13  Barbara J. Grosz
    13  Jerry R. Hobbs
    13  Martin Kay
    13  Stuart M. Shieber
    12  Douglas E. Appelt
    12  Philip R. Cohen
    12  C. Raymond Perrault
    12  Graeme D. Ritchie
    12  Ralph M. Weischedel
    12  Yorick A. Wilks

And the papers included distribute across the years like this:

    1980: 207
    1981: 138
    1982: 211
    1983: 240
    1984: 219
    1985: 247
    1986: 353
    1987: 117

The 1987 figure includes the contents of this year's ACL Proceedings, and
the relevant papers in AAAI-87, but not those from the upcoming IJCAI
meeting in August nor the as-yet-unpublished 1987 European ACL
Proceedings.

Machine-readable copies of the entire bibliography are available on
standard MS-DOS 360K DS/DD disks.  Write to Ms Sheila Lee, CSRP Series,
School of Cognitive Sciences, University of Sussex, BRIGHTON BN1 9QN, UK,
asking for a copy of the CL-NLP8X.BIB bibliography disk, and enclose a
check for $16.00 to cover media, handling, packing and postage costs.

A hardcopy version of the entire bibliography with a permuted index of
titles and an index to nonprimary authors is to be published by
CSLI/Chicago University Press in November 1987 - details below:

    %A Gerald Gazdar
    %A Alex Franz
    %A Karen Osborne
    %A Roger Evans
    %D 1987 - in press
    %T Natural Language Processing in the 1980's - A Bibliography
    %C Stanford
    %S CSLI Lecture Notes
    %I Chicago University Press

If there is a problem with this program please send a note to:

    clbib-request@Russell.stanford.edu

But only questions about the mailing system can be dealt with.  Problems
with the content of the bibliography (typos, omissions, etc.) are not
something that we are capable of coping with here.

SEE ALSO

    refer(1), Mail(1), tib(local)

AUTHORS & ACKNOWLEDGEMENTS

The bibliography was compiled at the University of Sussex under the
direction of Gerald Gazdar by Gerald Gazdar, Alex Franz, Karen Osborne,
and Roger Evans.

Initial c-shell scripts were written by Evans and Gazdar at Sussex.  They
were overhauled by Jeff Goldberg at CSLI.

In addition to more standard Unix tools (awk(1), sed(1), Mail(1), etc.),
refer(1) (available on most Unix distributions) and Tib (available on the
Unix TeX distribution) are employed.

Unix is a trademark of AT&T.

SUMMARY

To search the bibliography, send mail to clbib@Russell.stanford.edu with
the keywords for the search as your Subject line.

To get a help file, send mail to clbib@Russell.stanford.edu with "help"
as the first keyword in your subject line.

To get in touch with real people, send mail to
clbib-request@Russell.stanford.edu

Information about getting a hardcopy of the bibliography with indices
will be forthcoming any day now.

--
Jeff Goldberg
ARPA  goldberg@russell.stanford.edu
UUCP  ...!ucbvax!russell.stanford.edu!goldberg