[comp.ai] understanding bibliographic references

finin@antares.PRC.Unisys.COM (Tim Finin) (05/20/88)

We have a need to process bibliographic references, extracting the
relevant information encoded in them.  That is, to take a reference like:

	J. W. Wallis and Edward H. Shortliffe.  Customizing
	explanations using causal knowledge. In Bruce G. Buchanan and
	Edward H. Shortliffe, editors, Rule Based Expert Systems,
	Addison-Wesley, Reading, MA, 1984.

and to produce a data structure something like:

	((type bookChapter)
	 (author "J. W. Wallis and Edward H. Shortliffe")
	 (title "Customizing Explanations Using Causal Knowledge")
	 (book (title "Rule-Based Expert Systems")
	       (publisher "Addison-Wesley")
	       (editor "Bruce G. Buchanan and Edward H. Shortliffe")
	       (year "1984")
	       (address "Reading, MA")))
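
As a rough illustration of the mapping above, here is a minimal Python
sketch that handles only this one citation style (a chapter in an
edited book).  The pattern, the function name parse_book_chapter and
the dictionary layout are assumptions made for the example, not an
existing tool, and a hand-written pattern like this is exactly the
brittle approach a knowledge-based system would aim to replace.

```python
import re

# Hypothetical pattern for ONE style only: a chapter in an edited book.
CHAPTER_PATTERN = re.compile(
    # Authors end at a period that follows a full word, not an initial.
    r"^(?P<author>.+?[a-z]{2,})\.\s+"
    r"(?P<title>.+?)\.\s+In\s+"
    r"(?P<editor>.+?),\s+editors?,\s+"
    r"(?P<booktitle>.+?),\s+"
    r"(?P<publisher>[^,]+),\s+"
    r"(?P<address>.+),\s+"
    r"(?P<year>\d{4})\.$"
)


def parse_book_chapter(reference):
    """Return a nested dict for a book-chapter reference, or None."""
    # Collapse line breaks and tabs so the pattern sees one line.
    text = " ".join(reference.split())
    match = CHAPTER_PATTERN.match(text)
    if match is None:
        return None
    f = match.groupdict()
    return {
        "type": "bookChapter",
        "author": f["author"],
        "title": f["title"],
        "book": {
            "title": f["booktitle"],
            "publisher": f["publisher"],
            "editor": f["editor"],
            "year": f["year"],
            "address": f["address"],
        },
    }


EXAMPLE = (
    "J. W. Wallis and Edward H. Shortliffe.  Customizing\n"
    "explanations using causal knowledge. In Bruce G. Buchanan and\n"
    "Edward H. Shortliffe, editors, Rule Based Expert Systems,\n"
    "Addison-Wesley, Reading, MA, 1984."
)

result = parse_book_chapter(EXAMPLE)
```

The sketch copies field text as it appears, so it does not normalize
capitalization or hyphenation in titles, and it returns None for any
reference that is not in this exact style.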

Put simply, we want to develop a system that does what BibTeX does,
but in reverse.  It should work for references to a variety of
document types (e.g. journal articles, books, technical reports,
theses) and bibliographic styles.  It should cleanly separate
domain-independent knowledge (e.g. "Edward" is a given name, MA can
be an abbreviation for Massachusetts, which is the name of a state,
1984 is a plausible year of publication) from domain-dependent
knowledge (e.g. what IJCAI means, that BBN is a company which has a
technical report series).  This separation would ease porting the
system from one domain (e.g. AI) to another (e.g. fluid dynamics).
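
One way to picture the two kinds of knowledge is as separate tables,
with only the domain table swapped when porting.  The following Python
sketch is an assumption-laden illustration: the names DomainKnowledge,
classify_token and is_plausible_year, the tiny tables and the year
bounds are all hypothetical, chosen only to mirror the examples in the
paragraph above.

```python
from dataclasses import dataclass, field

# Domain-independent knowledge: shared by every subject area.
GIVEN_NAMES = {"Edward", "Bruce"}
STATE_ABBREVIATIONS = {"MA": "Massachusetts", "PA": "Pennsylvania"}


def is_plausible_year(token, earliest=1450, latest=2100):
    """Four digits within assumed (hypothetical) publication bounds."""
    return token.isdigit() and len(token) == 4 and earliest <= int(token) <= latest


@dataclass
class DomainKnowledge:
    """Domain-dependent knowledge: replaced when porting to a new field."""
    name: str
    venues: dict = field(default_factory=dict)
    report_publishers: set = field(default_factory=set)


AI_DOMAIN = DomainKnowledge(
    name="AI",
    venues={"IJCAI": "International Joint Conference on Artificial Intelligence"},
    report_publishers={"BBN"},
)


def classify_token(token, domain):
    """Label one token, consulting the domain table before the shared ones."""
    token = token.strip(".,;")
    if token in domain.venues:
        return ("venue", domain.venues[token])
    if token in domain.report_publishers:
        return ("reportPublisher", token)
    if token in STATE_ABBREVIATIONS:
        return ("state", STATE_ABBREVIATIONS[token])
    if token in GIVEN_NAMES:
        return ("givenName", token)
    if is_plausible_year(token):
        return ("year", token)
    return ("unknown", token)


# Porting to another field means supplying a different DomainKnowledge.
FLUID_DOMAIN = DomainKnowledge(name="fluid dynamics")
```

With these tables, classify_token("IJCAI", AI_DOMAIN) is recognized as
a venue, while the same token under FLUID_DOMAIN comes back unknown;
the shared tables for names, states and years apply in both cases.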

Such a system would probably be an interesting application drawing on
aspects of computational linguistics (e.g. parsing, sub-language
theory, proper name recognition) and knowledge-based expert systems
(e.g. expectation-driven parsing, domain modeling).  I'm interested in
getting pointers to any research on systems like this.  I can't recall
hearing of any.

Tim

Tim Finin			finin@prc.unisys.com
Paoli Research Center		..!{psuvax1,sdcrdcf,cbmvax,bpa}!burdvax!finin
Unisys Corporation		215-648-7446 (o)  
PO Box 517, Paoli PA 19301	215-386-1749 (h)