[sci.lang] Computational Linguistics Bibliography by E-Mail

goldberg@russell.STANFORD.EDU (Jeffrey Goldberg) (10/21/87)

You can run a keyword search, by electronic mail, on a bibliography
of more than 1700 entries covering work in computational
linguistics published in the 1980s.  Here is how:


Computational Linguistics & Natural Language Processing Bibliography by Mail

There is a large (> 1700 items) bibliography of 1980s natural
language processing and computational linguistics sitting on a
Sun called Russell at CSLI.  Anyone with a computer account can
now search this bibliography and get a listing of the result by
using electronic mail.


INSTRUCTIONS

The keywords used for the lookup are to be given in the subject line of
your mail message addressed to clbib@russell.stanford.edu  (36.9.0.9).
The body of your message will be thrown away.

Here is an example: 

	% mail clbib@Russell.Stanford.EDU
	Subject: Woods ATN 1980
	.
	EOT
	Null message body; hope that's okay
	%

Or more compactly:

        % Mail -s "woods atn 1980" clbib@Russell.Stanford.EDU < /dev/null

And here is what you would receive in return:

>>>	Date: Wed, 11 Jul 87 12:03:35 PST
>>>	To: yourname
>>>	Subject: CLBIB search: Woods ATN ...

	%A T.P. Kehler
	%A R.C. Woods
	%T ATN grammar modeling in applied linguistics
	%D 1980
	%P 123-126
	%J ACL Proceedings, 18th Annual Meeting

	%A William A. Woods
	%T Cascaded ATN grammars
	%D 1980
	%V 6
	%N 1
	%P 1-12
	%J American Journal of Computational Linguistics

This example shows one mailing from a Unix machine, but you can
mail CLBIB from any machine and get a result, provided you
remember to put your search keys in the "Subject:" field of the
message.
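
If you want to post-process a reply like the one above, here is a
minimal sketch in Python.  It assumes only what the sample shows:
entries are separated by blank lines, and each field is a line
starting with "%" and a one-letter tag.  The function name parse_refer
is made up for this illustration, and lines that do not start with "%"
(mail headers, continuation lines) are simply skipped.

```python
from collections import defaultdict


def parse_refer(text):
    """Split refer-style text into records mapping a tag letter to its values.

    Hypothetical helper for illustration; not part of the CLBIB system.
    """
    records = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the record collected so far, if any.
            if current:
                records.append(dict(current))
                current = defaultdict(list)
            continue
        if line.startswith("%") and len(line) >= 2:
            # "%A William A. Woods" -> tag "A", value "William A. Woods"
            current[line[1]].append(line[2:].strip())
        # Anything else is ignored in this sketch.
    if current:
        records.append(dict(current))
    return records


SAMPLE = """\
%A T.P. Kehler
%A R.C. Woods
%T ATN grammar modeling in applied linguistics
%D 1980
%P 123-126
%J ACL Proceedings, 18th Annual Meeting

%A William A. Woods
%T Cascaded ATN grammars
%D 1980
%V 6
%N 1
%P 1-12
%J American Journal of Computational Linguistics
"""

records = parse_refer(SAMPLE)
for rec in records:
    print("; ".join(rec["A"]), "-", rec["T"][0])
```

Each record keeps a list per tag because a tag such as %A can occur
more than once in one entry.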

The entries you get are in standard Unix 'refer' format (see the
refer(1) man page).  You may put between one and eight keywords in the
mail "Subject:" field, and each keyword can be any string of characters
(name, date, topic, etc.) that you think likely to be found in the items
of interest (case is ignored).  The list of keywords is interpreted
conjunctively: an entry is returned only if it matches every keyword.
"Woods" gets you everything published by anyone called "Woods" in the
1980s (or whose title refers to "woods"), whereas "Woods 1983" narrows
that down to just the 1983 items (or items whose first or last page
number is "1983").  Of course, there may be no such items, in which
case the reply would contain nothing.  Only the first six characters
in a keyword are significant, so "generation" is indistinguishable from
"generalized", and "Anderson" is indistinguishable from "Andersson".
You should bear this in mind when you consider the relevance of what
you receive to your intended request.
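
The matching rules just described (conjunctive, case ignored, six
significant characters) can be sketched roughly as follows.  This is
not the code running on Russell; the function names are invented, and
the way an entry is split into words (runs of letters and digits) is
an assumption made for the sketch.

```python
import re

KEY_LENGTH = 6  # only the first six characters of a keyword are significant


def key(word):
    """Reduce a word to its search key: lower case, first six characters."""
    return word.lower()[:KEY_LENGTH]


def entry_keys(entry_text):
    """Collect the keys of all words in an entry.

    Assumption: a word is any run of letters or digits, so the page
    range "1983-1990" yields the words "1983" and "1990".
    """
    return {key(w) for w in re.findall(r"[A-Za-z0-9]+", entry_text)}


def matches(entry_text, keywords):
    """True if every keyword is found in the entry (conjunctive search)."""
    keys = entry_keys(entry_text)
    return all(key(k) in keys for k in keywords)


entry = "%A William A. Woods\n%T Cascaded ATN grammars\n%D 1980\n%P 1-12"
print(matches(entry, ["woods", "atn", "1980"]))  # True
print(matches(entry, ["Woods", "1983"]))         # False
print(key("generation") == key("generalized"))   # True
```

Under these assumptions a page number such as "1983" matches the
keyword "1983" just as a publication year would, which is the side
effect mentioned above.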

To take up less CPU at this end, please use as your first keyword the
one that will narrow selections down the most.  The first key may not be
a year.
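
For anyone sending requests from a program rather than by hand, here
is a minimal sketch that builds such a message with Python's standard
email package and checks the rules stated so far (one to eight
keywords, first keyword not a year).  The function name build_query,
the sender address, and the test for "a year" (exactly four digits)
are assumptions made for the sketch.  Sending the message is left out.

```python
import re
from email.message import EmailMessage

CLBIB_ADDRESS = "clbib@russell.stanford.edu"  # address given in the text


def build_query(keywords, sender):
    """Build a CLBIB request; the keywords go in the Subject header.

    Hypothetical helper for illustration only.
    """
    if not 1 <= len(keywords) <= 8:
        raise ValueError("give between one and eight keywords")
    # Assumption: "a year" means a keyword of exactly four digits.
    if re.fullmatch(r"\d{4}", keywords[0]):
        raise ValueError("the first keyword may not be a year")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = CLBIB_ADDRESS
    msg["Subject"] = " ".join(keywords)
    # No body is set: the body of the message is thrown away anyway.
    return msg


query = build_query(["Woods", "ATN", "1980"], "yourname@example.org")
print(query["Subject"])
```

Putting the most selective keyword first remains up to the sender; the
sketch only rejects a leading year.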

If the first key is "help", you will be sent this file.


BUGS

The system is no better than the mail connections.

This system is worse than the mail connections.

The return address is determined only from information in the "From" field.
"Reply-To:" should be checked but it is not.

The return-address parsing is stupid and doesn't know all there is to
know about RFC822 mail headers.

The "From" and "Subject" field names must be capitalized exactly as
shown, with only the "F" and the "S" in uppercase.

It is impossible to search for only the item "help".  (You get this file if
the first key on a subject line is "help".)

It is impossible to get all of the entries for one year.  [This is not a
bug.  If you want the entire list you can follow the instructions about
such things below.]

The mail handling scripts were written by linguists, not by
programmers.  The scripts are fragile and the system may be
taken down without notice at any time.



THE BIBLIOGRAPHY

Some sense of the scope of the bibliography can be gathered from
the following summary information.  Here are the authors who find
themselves with a dozen or more of their 1980s publications included:

	25 Aravind K. Joshi
	19 Bonnie Lynn Webber
	18 Robert C. Berwick
	18 Jaime G. Carbonell
	17 David D. McDonald
	15 Philip J. Hayes
	15 Wendy G. Lehnert  
	15 Fernando C.N. Pereira
	14 Kathleen R. McKeown
	14 Karen Sparck-Jones
	13 Eugene Charniak
	13 Barbara J. Grosz
	13 Jerry R. Hobbs
	13 Martin Kay
	13 Stuart M. Shieber
	12 Douglas E. Appelt
	12 Philip R. Cohen
	12 C. Raymond Perrault
	12 Graeme D. Ritchie
	12 Ralph M. Weischedel
	12 Yorick A. Wilks

And the papers included distribute across the years like this:

	1980:	     207
	1981:	     138
	1982:	     211
	1983:	     240
	1984:	     219
	1985:	     247
	1986:	     353
	1987:	     117

The 1987 figure includes the contents of this year's ACL Proceedings,
and the relevant papers in AAAI-87, but not those from the upcoming
IJCAI meeting in August nor the as-yet-unpublished 1987 European ACL
Proceedings.
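
As a quick consistency check, the yearly figures above can be added
up.  The numbers are copied from the table; only the variable names
are new.

```python
# Papers per year, copied from the table above.
papers_per_year = {
    1980: 207,
    1981: 138,
    1982: 211,
    1983: 240,
    1984: 219,
    1985: 247,
    1986: 353,
    1987: 117,
}

total = sum(papers_per_year.values())
print(total)  # 1732
```

The total of 1732 agrees with the "> 1700 items" quoted at the top.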

Machine-readable copies of the entire bibliography are available on
standard MS-DOS 360K DS/DD disks.  Write to Ms Sheila Lee, CSRP Series,
School of Cognitive Sciences, University of Sussex, BRIGHTON BN1 9QN, UK,
asking for a copy of the CL-NLP8X.BIB bibliography disk, and enclose a
check for $16.00 to cover media, handling, packing and postage costs.


A hardcopy version of the entire bibliography, with a permuted index of
titles and an index to nonprimary authors, is to be published by
CSLI/Chicago University Press in November 1987.  Details below:

	%A  Gerald Gazdar
	%A  Alex Franz
	%A  Karen Osborne
	%A  Roger Evans
	%D  1987 - in press
	%T  Natural Language Processing in the 1980's - A Bibliography
	%C  Stanford
	%S  CSLI Lecture Notes
	%I  Chicago University Press

If there is a problem with this program please send a note to:

clbib-request@Russell.stanford.edu

Only questions about the mailing system can be dealt with at this
address.  We cannot handle problems with the content of the bibliography
(typos, omissions, etc.) here.

SEE ALSO

refer(1) Mail(1) tib(local)

AUTHORS & ACKNOWLEDGEMENTS

The bibliography was compiled at the University of Sussex under the
direction of Gerald Gazdar by Gerald Gazdar, Alex Franz, Karen Osborne,
and Roger Evans.  Initial c-shell scripts were written by Evans and
Gazdar at Sussex.  They were overhauled by Jeff Goldberg at CSLI.

In addition to more standard Unix tools (awk(1), sed(1), Mail(1), etc),
refer(1) (available on most Unix distributions) and Tib (available on the
Unix TeX distribution) are employed.

Unix is a trade mark of AT&T.

SUMMARY

To search the bibliography, send mail to clbib@Russell.stanford.edu with
the keywords for the search as your Subject line.

To get a help file, send mail to clbib@Russell.stanford.edu with "help" as
the first keyword in your Subject line.

To get in touch with real people, send mail to
clbib-request@Russell.stanford.edu.

Information about getting a hardcopy of the bibliography with indices will
be forthcoming any day now.
-- 
Jeff Goldberg 
ARPA   goldberg@russell.stanford.edu
UUCP   ...!ucbvax!russell.stanford.edu!goldberg