[news.groups] What'll be the format of comp.bibliography?

dupuy@cs.columbia.edu (Alexander Dupuy) (08/08/90)

I don't see much problem with duplicate bibliographies - people can decide for
themselves which is more useful/accurate/whatever - but more of a problem would
be the question of the database formats.

The first issue would be whether any format would be used at all.  Given the
experience of comp.archives, I would expect that trying to define a new format
would almost certainly be a failure, and even using an existing one, at least
half the postings wouldn't use it (or any other standard format).

There are currently three major bibliography formats out there (that I know of,
anyhow) not counting library software systems.  One is Unix refer(1) format,
documented in addbib(1), and the other two are Scribe and BibTeX.  BibTeX
format is pretty much a subset of Scribe's with one or two minor exceptions.
Both refer and Scribe/BibTeX format have their own advantages and
disadvantages.

Unix refer format is more fixed in structure, and thus more amenable to
database-style operations (e.g. sortbib, indxbib, lookbib).  It has the
advantage that it comes pretty much standard with Unix.  Although the defined
fields are somewhat more regular than Scribe/BibTeX format, they aren't quite
as extensive.

Scribe/BibTeX format is more freeform, but requires classification of the
document type (i.e. article, book, proceedings, unpublished, etc.).  It has the
advantage that Scribe and BibTeX can both understand a common subset format,
and both provide support for generating bibliographies and references in a
number of styles (e.g. CACM, IEEE, etc.).

A sample refer format bibliography entry might look like this:

%K Miscellaneous
%A David P. Anderson
%A Robert Wahbe
%T A Framework for Multimedia Communication in a General-Purpose Distributed System
%R Technical Report 89/498
%I UC Berkeley CS Division
%D March 1989
%X \fBAbstract:\fP
Motivates the design given in TR 88/462, gives some comparisons,
and discusses implications for protocol and local system design.
Description of channel parameters supersedes TR 88/462.


The same bibliography entry in Scribe/BibTeX common subset might look like:

@TechReport(UCBTR-89-498,
	Author = "David P. Anderson and Robert Wahbe",
	Title = "A Framework for Multimedia Communication in a General-Purpose
 Distributed System",
	Institution = "UC Berkeley CS Division",
	Number = "89/498", Month = "March", Year = "1989",
	Abstract = {
Motivates the design given in TR 88/462, gives some comparisons,
and discusses implications for protocol and local system design.
Description of channel parameters supersedes TR 88/462.} )

It is more or less feasible to convert from one format to the other (easier, I
think, when going from refer to Scribe/BibTeX, which is why I prefer refer).

I'll follow this article with a posting describing each format in more detail,
and some notes I've made on conversions between them.

@alex
-- 
inet: dupuy@cs.columbia.edu
uucp: ...!rutgers!cs.columbia.edu!dupuy

dupuy@cs.columbia.edu (Alexander Dupuy) (08/08/90)

The following are some notes I've made on the various bibliography formats - at
the end is an outline of a heuristic method for determining a Scribe/BibTeX
classification for refer style bibliography entries.

@alex
_______________________________________________________________________________

References are kept in Unix refer format, described below:

  Bibliography Key Letters

     The most common key-letters and their meanings are given below.

          %A   Author's name

          %B   Book (or Proceedings) containing article referenced

          %C   City (place of publication)

          %D   Date of publication

          %E   Editor of book containing article referenced

          %F   Footnote number or label (supplied by refer)

          %G   Government order number

          %H   Header commentary, printed before reference

          %I   Issuer (publisher)

          %J   Journal containing article referenced

          %K   Keywords to use in locating reference

          %L   Label field used by -k option of refer

          %M   Bell Labs Memorandum (undefined)

          %N   Number within volume

          %O   Other commentary, printed at end of reference

          %P   Page number(s)

          %Q   Corporate or Foreign Author (unreversed)

          %R   Report, paper, or thesis (unpublished)

          %S   Series title

          %T   Title of article or book

          %V   Volume number

          %X   Abstract - used by roffbib, not by refer

          %Y,Z Ignored by refer
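A minimal Python sketch of reading records in this format. It assumes, as in the sample entry earlier, that entries are separated by blank lines and that a line not starting with `%` continues the previous field (the way the %X abstract does). The name `parse_refer` is hypothetical; it is not part of refer or its companion tools.

```python
from collections import defaultdict


def parse_refer(text):
    """Split refer-format text into records (one dict per entry).

    Assumes records are separated by blank lines and that a line not
    starting with '%' continues the value of the previous field.
    Each dict maps a key letter (e.g. 'A') to a list of values, since
    fields such as %A may repeat.
    """
    records = []
    current = defaultdict(list)
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(dict(current))
            current, last_key = defaultdict(list), None
        elif line.startswith("%") and len(line) > 1:
            last_key = line[1]
            current[last_key].append(line[2:].strip())
        elif last_key is not None:
            # Continuation line, e.g. the body of a multi-line %X abstract.
            current[last_key][-1] += "\n" + line
    if current:
        records.append(dict(current))
    return records
```

Keeping each field as a list matters because %A (and %E) appear once per author or editor.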


In practice, some of these conventions are ignored:

 %J is often used for Conference Proceedings, where %B is correct

 %J and %N are used for Technical Reports, where %R (or %M?) is correct

 %A is used in cases where %Q should be used for organizational authors


In order to encode certain information needed by Scribe and BibTeX, we will use
the %Y and %Z fields:

          %Y   Classification category (see below for list)

          %Z   Additional fields: FIELD = "val", FIELD = "val" ...

Valid classification categories are:

        ARTICLE
        BOOK
	BOOKLET
        CONFERENCE
        MANUAL
        MASTERSTHESIS
        MISC
        PHDTHESIS
        PROCEEDINGS
        TECHREPORT
        UNPUBLISHED
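The %Y and %Z conventions are simple enough to check mechanically. A Python sketch follows; it assumes %Z values never contain embedded double quotes and that %Y is compared case-insensitively. `check_y` and `parse_z` are hypothetical helper names, and the category set is copied from the list above.

```python
import re

# Categories allowed in %Y, copied from the list above.
VALID_CATEGORIES = {
    "ARTICLE", "BOOK", "BOOKLET", "CONFERENCE", "MANUAL", "MASTERSTHESIS",
    "MISC", "PHDTHESIS", "PROCEEDINGS", "TECHREPORT", "UNPUBLISHED",
}

# One FIELD = "val" pair followed by a comma or the end of the string.
# Assumes values never contain a double quote.
_PAIR = re.compile(r'\s*([A-Za-z]+)\s*=\s*"([^"]*)"\s*(?:,|$)')


def check_y(value):
    """Normalize a %Y value and reject unknown categories."""
    category = value.strip().upper()
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown %Y category: {value!r}")
    return category


def parse_z(value):
    """Parse a %Z value like 'EDITION = "Second", CHAPTER = "3"'."""
    value = value.strip()
    fields, pos = {}, 0
    while pos < len(value):
        m = _PAIR.match(value, pos)
        if m is None:
            raise ValueError(f"cannot parse %Z at {value[pos:]!r}")
        fields[m.group(1).upper()] = m.group(2)
        pos = m.end()
    return fields
```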
_______________________________________________________________________________


Another format is BibTeX format, described below:


Bibliography File

     The bibliography file (.bib) format is just about a subset
     of that allowed in Scribe bibliographies.  Only the delimiter
     pairs {...} and "..." are allowed inside entries.  Entries
     themselves can be delimited by (...) also.  The = sign
     between field names and field values is not optional.

     There are a number of conventions that should be followed
     when writing .bib files.  These are not requirements of
     bibtex, but standard bibliography style files will typically
     expect these conventions to be followed.

     References should be categorized as in Scribe into one of
     the categories: article, book, booklet, inbook, incollection,
     inproceedings, manual, mastersthesis, misc, phdthesis,
     proceedings, techreport, and unpublished.  See the Scribe
     manual for the fields that must/can appear in each type of
     reference.

     The title field should be entered in uppers-and-lowers
     format, where everything is capitalized except articles and
     unstressed conjunctions and prepositions, and even those are
     capitalized if they are the first word or the first word
     after a colon.  Some style files will convert all words
     except the first to all lowercase.  This is a mistake for
     things like proper nouns, so you have to tell bibtex not to
     touch such capital letters by enclosing them in braces, as
     in "Dogs of {A}merica".  It is unlikely that any style file
     would attempt to convert book titles to lowercase, so
     perhaps you can omit braces in such titles.
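To see why the braces matter, here is a minimal Python sketch of the lowercasing the paragraph describes (every word after the first is lowercased, text inside braces is left alone). It is an approximation for illustration, not BibTeX's own `change.case$` built-in, which differs in details; `lowercase_title` is a hypothetical name.

```python
def lowercase_title(title):
    """Lowercase every word after the first, leaving braced text alone.

    Simplified sketch of the style-file behavior described above.
    """
    out, depth, past_first_word = [], 0, False
    for ch in title:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            if ch.isspace():
                past_first_word = True
            elif past_first_word:
                ch = ch.lower()
        out.append(ch)
    return "".join(out)
```

With this sketch, "Dogs of America" comes out as "Dogs of america", while "Dogs of {A}merica" keeps its capital A.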

     The author and editor fields should conform to a  particular
     format, so that the style file can parse them into parts.  A
     name can have four parts: first, von, last, junior, each  of
     which can consist of more than one word.  For example, "John
     Paul von Braun, Jr." has "John  Paul"  as  the  first  part,
     "von"  as  the von part, "Braun" as the last part, and "Jr."
     as the junior part.  Use one of these formats for a name:
          First von Last
          von Last, First
          von Last, Junior, First
     The last part is assumed to be one word, or  all  the  words
     after the von part.  Bibtex will treat anything in braces as
     one word, so use braces to surround last names that  contain
     more  than  one word.  The von part is recognized by looking
     for words that begin with lowercase letters.  When possible,
     enter the full first name(s);  style files may abbreviate by
     taking the first letter.  Actually, the rules for  isolating
     the  name  parts  are a bit more complicated, so they do the
     right thing for names  like  "de  la  Grand  Round,  Chuck".
     There is no need for a field like Scribe's fullauthor field.

     If there are multiple authors or editors, they should all be
     separated  by  the  word and.  Scribe's editors field should
     not be used, since bibtex style files  can  count  how  many
     names are in an editor field.
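A simplified Python sketch of the name handling just described: split a field on the word `and`, then sort one name into first, von, last, and junior parts. It applies only the basic rules stated above (von words begin with a lowercase letter, anything in braces is one word); as noted, BibTeX's real rules are a bit more complicated. All function names here are hypothetical.

```python
def _words(text):
    """Split on whitespace outside braces; a braced group stays one word."""
    words, buf, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if buf:
                words.append("".join(buf))
                buf = []
        else:
            buf.append(ch)
    if buf:
        words.append("".join(buf))
    return words


def _commas(text):
    """Split on commas outside braces."""
    parts, buf, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def split_names(field):
    """Split an author or editor field on the word 'and' (outside braces)."""
    names, cur = [], []
    for w in _words(field):
        if w == "and":
            names.append(" ".join(cur))
            cur = []
        else:
            cur.append(w)
    if cur:
        names.append(" ".join(cur))
    return names


def _von_last(words):
    """von runs through the last lowercase word; the final word is always last."""
    end = 0
    for i, w in enumerate(words[:-1]):
        if w[:1].islower():
            end = i + 1
    return words[:end], words[end:]


def parse_name(name):
    """Split one name into first, von, last, and junior parts."""
    parts = _commas(name)
    if len(parts) == 1:  # First von Last
        words = _words(parts[0])
        start = next((i for i, w in enumerate(words[:-1]) if w[:1].islower()),
                     max(len(words) - 1, 0))
        first, (von, last), junior = words[:start], _von_last(words[start:]), []
    else:  # von Last, First   or   von Last, Junior, First
        first = _words(parts[-1])
        von, last = _von_last(_words(parts[0]))
        junior = _words(parts[1]) if len(parts) == 3 else []
    return {"first": " ".join(first), "von": " ".join(von),
            "last": " ".join(last), "junior": " ".join(junior)}
```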


_______________________________________________________________________________


And from the Scribe manual page:


BIBLIOGRAPHIES

     Scribe contains a mechanism for automatically assembling a
     Bibliography for a document by selecting entries from a
     larger bibliographic database.  Scribe expects to find the
     information for Bibliography entries in a bibliography
     database file (.BIB) in a specific data format.  Each entry in
     a .BIB file must have the structure:

               @classification(codeword, list-of-fields)

     The list of valid "classifications" appears in the "Classes"
     subtopic.  The list of valid "fields" appears in the "Fields"
     subtopic.  The formatting of the cited bibliographic reference
     is controlled by the reference format chosen via the @Style
     command.  Available formats can be found in the
     "Reference_Formats" subtopic.
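As a rough illustration of the @classification(codeword, list-of-fields) structure, here is a minimal Python reader for a single entry. It is a simplified sketch, not Scribe's or BibTeX's own parser: it accepts (...) or {...} around the entry and "..." or {...} around values, keeps bare values (numbers, abbreviation codes) as-is, and does not handle abbreviation definitions, string concatenation, or comments. `parse_entry` is a hypothetical name.

```python
import re

_HEAD = re.compile(r'\s*@(\w+)\s*([({])')
_FIELD = re.compile(r'(\w+)\s*=\s*')


def _read_value(text, pos, close):
    """Return (value, next_pos) for a value starting at text[pos]."""
    if pos >= len(text):
        raise ValueError("missing field value")
    opener = text[pos]
    if opener in '{"':
        depth = 0
        for i in range(pos + 1, len(text)):
            c = text[i]
            if c == "{":
                depth += 1
            elif c == "}":
                if depth == 0 and opener == "{":
                    return text[pos + 1:i], i + 1
                depth -= 1
            elif c == '"' and opener == '"' and depth == 0:
                return text[pos + 1:i], i + 1
        raise ValueError("unterminated field value")
    # Bare value (a number or abbreviation code): up to ',' or the entry's close.
    i = pos
    while i < len(text) and text[i] not in "," + close:
        i += 1
    return text[pos:i].strip(), i


def parse_entry(text):
    """Parse one '@Class(codeword, FIELD = value, ...)' entry.

    Returns (classification, codeword, fields) with names uppercased.
    """
    m = _HEAD.match(text)
    if m is None:
        raise ValueError("entry must start with @classification( or {")
    kind = m.group(1).upper()
    close = ")" if m.group(2) == "(" else "}"
    comma = text.index(",", m.end())
    key = text[m.end():comma].strip()
    pos, fields = comma + 1, {}
    while True:
        # Skip separators between fields.
        while pos < len(text) and text[pos] in ", \t\r\n":
            pos += 1
        if pos >= len(text) or text[pos] == close:
            return kind, key, fields
        fm = _FIELD.match(text, pos)
        if fm is None:
            raise ValueError(f"expected FIELD = value at offset {pos}")
        value, pos = _read_value(text, fm.end(), close)
        fields[fm.group(1).upper()] = value
```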


 CLASSES

     Scribe's bibliography classifications are listed below.  Not
     all classifications are used with each reference format.  Refer
     to the "Scribe User Manual" or the "Scribe Advanced User Manual"
     for details on which classifications are required and optional
     for each reference format.

      ARTICLE An article from an academic journal or a magazine.

      BOOK Something published on its own, usually by a publishing
     house that is not the same as the author.

      BOOKLET Something published  and bound, but  having neither
     an explicitly-named publisher nor a sponsoring institution.

      CONFERENCE The conference name.

      INBOOK For a reference to a part of a book rather  than  to
     the entire book.

      INCOLLECTION Something composed of papers or chapters
     previously published elsewhere.

      INPROCEEDINGS A  reference  to  a  paper  in  a  conference
     proceedings or the like.

      MANUAL An instruction manual or piece of technical
     documentation.

      MASTERSTHESIS A Master's thesis.

      MISC Any category not mentioned in this list.

      PHDTHESIS A Ph.D. thesis.

      PROCEEDINGS The proceedings of a conference or some similar
     document.  The identifying characteristics of the
     classification are that its publisher and author are
     identical, and often no editor's name appears.

      TECHREPORT A technical report.  Similar to a  book,  except
     that it is published by a research institution instead of by
     a publisher and that it  usually  has  an  assigned  "report
     number".

      UNPUBLISHED Some paper that is in preparation or  that  has
     been printed but not published.


 FIELDS

     The following field names are used in defining bibliography
     database entries.  All take a delimited string or an
     abbreviation code as a value.  Not all field names apply to
     each classification; some are required while others are not.
     Check the "Scribe User Manual" and "Scribe Advanced User
     Manual" for details.

      ADDRESS The address of the publisher or printer or
     organization.

      AUTHOR The name(s) of the author or authors, in the  format
     in which they should be printed.

      ANNOTE Any annotation text.  Not actually printed  in  most
     bibliography formats.

      BOOKTITLE The title of a book or proceedings of which  this
     reference is a chapter or paper or article.

      CHAPTER If a reference is being made to part of a book  and
     not the entire book, specify either chapter or pages.

      DATE  Can  be  used instead  of  MONTH   and YEAR  in  some
     reference formats

      EDITION Manuals often have an edition name or  number  that
     is not part of the actual title of the manual.

      EDITOR The name of the editor.  If more than one, use
     Editors.

      EDITORS The name of the editors.  If only one, use Editor.

      FULLAUTHOR The full name of the author or authors,  written
     out without commas.

      FULLORGANIZATION The  "full" name of  the  organization for
     mailing purposes.

      HOWPUBLISHED For unusual manuscripts; how it came into your
     possession ("personal note", etc.).

      INSTITUTION The  organization  or  institution  backing  or
     publishing a technical report or a proceedings.

      JOURNAL The title of the journal.

      KEY The sort key.  This field is used for alphabetization.

      MEETING Used with the value of the SOCIETY field name.

      MONTH January, February, etc.

      NOTE Any comment, usually used to clarify the reference  or
     to  suggest  alternate sources.  Differs from Annote in that
     Note will always be printed, but Annote will be printed only
     in those bibliography types that specify annotation.

      NUMBER Issue number of a journal or series number in a book
     series or serial number of a technical report.

      ORGANIZATION The name of the organization holding a
     conference that published a proceedings.

      PAGES The page numbers within a  journal,  proceedings,  or
     book  that  contain the material actually cited.

      PUBLISHER The name of the publishing company.

      SCHOOL For theses, the name  of  the  school  granting  the
     degree.

      SERIES When books are published in a series, the series has
     a name.

      TITLE The title of the book, article, thesis, or other
     document that is being cited.  Do not italicize or underline;
     that detail will be handled by the selected reference format.

      TYPE Some technical reports are called by other names, such
     as  "Research  Report",  etc.   If  this is not a "Technical
     Report", put its true name in this field.

      VOLUME The volume number of a journal or a series book.

      YEAR The year of publication; four digits:  1979.


 REFERENCE_FORMATS

     Bibliography format definitions in the Database are used to
     control the style and sequencing of the list of references and
     the citations.  Select one with the References @Style parameter.

      1APA Similar to the APA format except that it contains an
     Annote field that is treated as a Comment.

      1APADRAFT Similar to the 1APA format except that it is
     double-spaced.

      ANNAPA Similar to the 1APA format except that the Annote
     field is treated as text.

      ANNAPADRAFT Similar to the 1APADraft format except that
     the Annote field is treated as text.

      ANNOTEDSTDALPHABETIC Same as  StdAlphabetic,  but  includes
     annotations and has filled lines.

      ANNOTEDSTDIDENTIFIER Similar to the STDIdentifier format
     except it includes annotations and has filled lines.

      ANNOTEDSTDNUMERIC Same as STDNumeric, but includes annotations
     (i.e. the contents of the Annote field) in the Bibliography and
     has filled lines.

      ANNSTDALPHABETIC Similar to the STDAlphabetic format
     except it includes annotations and has unfilled lines.

      ANNSTDNUMERIC Similar to the STDNumeric format except it
     includes annotations and has unfilled lines.

      APA (American Psychological Association).  Spelled-out
     citations (Knuth, 1978), outdented closed reference list,
     alphabetical ordering of references.

      APADRAFT Draft version of APA format.  Same as regular
     version, but triple-spaces the Bibliography.

      CACM Numeric citations  [5],  closed  format,  alphabetical
     ordering of references.

      CLOSEDALPHABETIC Similar to the STDAlphabetic format.

      CLOSEDNUMERIC Similar to the STDNumeric format.

      IEEE Superscripted numeric citations, closed format, citation
     sequence ordering of references.

      IPL (Information Processing Letters).  The format  required
     by  IPL.   This  format  is incomplete; it does not have all
     standard Scribe types yet (April 1984) and is being included
     for convenience only.

      NEWAPA (American Psychological Association).  The  new  APA
     format  with  the  Year  following  the Author.  Spelled-out
     citations (Knuth, 1978), outdented  closed  reference  list,
     alphabetical ordering of references.

      SIAM (Society for Industrial and Applied Mathematics).  The
     format required by SIAM journals.  This format is incomplete;
     it does not have all standard Scribe types yet (April 1984)
     and is being included for convenience only.

      STDALPHABETIC Alphabetic citations [Knuth 78], open format,
     alphabetical ordering of references.

      STDIDENTIFIER Open format, reference identifier for citations
     rather than a generated label.

      STDNUMERIC Numeric citations [5], open format, alphabetical
     ordering of references.
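To make the numeric, alphabetic, and APA distinctions concrete, here is a hypothetical Python helper that produces citation labels for a few of the formats listed, given each reference's codeword, author surname, and year. It only models label text and reference ordering as described above; multiple authors, disambiguating duplicate labels, open vs. closed layout, and actual superscripting for IEEE are all left out.

```python
def citation_labels(refs, style):
    """Label references the way a few of the formats above describe.

    refs: (codeword, author_last_name, year) tuples in the order they are
    first cited in the text.  Returns {codeword: label}.
    """
    if style == "IEEE":
        ordered = list(refs)  # citation sequence ordering
    else:
        # alphabetical ordering of references
        ordered = sorted(refs, key=lambda r: (r[1].lower(), r[2]))
    labels = {}
    for n, (key, last, year) in enumerate(ordered, start=1):
        if style in ("STDNUMERIC", "CACM"):
            labels[key] = f"[{n}]"
        elif style == "IEEE":
            labels[key] = str(n)  # would be printed as a superscript
        elif style == "STDALPHABETIC":
            labels[key] = f"[{last} {str(year)[-2:]}]"
        elif style == "APA":
            labels[key] = f"({last}, {year})"
        else:
            raise ValueError(f"style not covered by this sketch: {style}")
    return labels
```

The same two references get different numbers under STDNUMERIC (alphabetical) and IEEE (order of first citation), which is the main practical difference between those formats.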


COMMANDS

      BIBFORM Defines a Bibliography classification, such as
     "Book", for a particular Bibliography reference format.  May
     only be used in .REF files (see the subtopic in the
     "Bibliographies" entry for available bibliography
     classifications).

            Format:

              @Bibform(Classification=delimited-definition-string)

EXAMPLES
           1. @BibForm(UnPublished=<
                   @begin(BibEntry)
                   @parm(tag).@@parm(Author), @~
                   "@parm(Title)"@~
                   @Imbed(Note,def ', @Parm(Note)', undef '.')
                   @end(BibEntry)
                     >)

              (Note: Taken from the IEEE.Ref database file.)

           2. @BibForm(Misc=<
                  @begin(BibEntry)
                  @l1{[@parm(tag)]@@imbed(Author,def '@parm(Author).')}
                  @imbed(Title,def '@l2{@parm(Title).}')
                  @imbed(HowPublished,def '@l2{@parm(HowPublished).}')
                  @imbed(Year, def '@l2{@imbed"Month, def {@Parm(Month), }"@~
                          @parm(Year)}')
                  @imbed(Note,def '@l2{@parm(Note).}')
                  @end(BibEntry)
                  >)

              (Note: Taken from the Standa.Lib database file.)


_______________________________________________________________________________


And finally, a mapping from refer keywords to Scribe/BibTeX fields:

        %A      AUTHOR (use last word before comma for KEY)

        %B      BOOKTITLE

        %C      ADDRESS

        %D      [MONTH] YEAR (or DATE, if more than two words)

        %E      EDITOR (or EDITORS, for Scribe)

        %F      ignored

        %G      NUMBER

        %H      NOTE (see also %O)

        %I      INSTITUTION     (TECHREPORT)
                PUBLISHER       ([IN]BOOK, BOOKLET, INCOLLECTION)
                ORGANIZATION    (CONFERENCE, [IN]PROCEEDINGS, MANUAL)
                SCHOOL          (MASTERSTHESIS or PHDTHESIS)

        %J      JOURNAL

        %K      ignored

        %L      KEY (also citation name of reference)

        %M      NUMBER, TYPE="Bell Labs Memorandum"

	%N	NUMBER

	%O	NOTE (see also %H)

	%P	PAGES

	%Q	AUTHOR (use first word for KEY)

	%R	parse into TYPE and NUMBER, very messy

	%S	SERIES

	%T	TITLE

	%V	VOLUME

	%X	ANNOTE


The following Scribe/BibTeX fields have to be encoded in %Z:

	CHAPTER
	EDITION
	FULLAUTHOR
	FULLORGANIZATION
	HOWPUBLISHED
	MEETING
	PUBLISHER	(INPROCEEDINGS, PROCEEDINGS)
	SOCIETY
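The mapping table translates fairly directly into code. Below is a Python sketch, assuming the refer record has already been read into a dict of key letter to list of values (one per refer line) and that the classification is already known. Several choices are assumptions rather than part of the table: %I falls back to PUBLISHER for types the table does not list, %R is split naively (last word as NUMBER, the rest as TYPE), and when both %G and %N are present %N wins. `refer_to_bibtex` and `issuer_field` are hypothetical names.

```python
import re

# One-to-one mappings from the table above.  %G and %N both map to
# NUMBER; if both are present, %N (applied later) wins in this sketch.
SIMPLE = {"B": "BOOKTITLE", "C": "ADDRESS", "E": "EDITOR", "G": "NUMBER",
          "J": "JOURNAL", "L": "KEY", "N": "NUMBER", "P": "PAGES",
          "S": "SERIES", "T": "TITLE", "V": "VOLUME", "X": "ANNOTE"}


def issuer_field(category):
    """Field that %I becomes, depending on the classification."""
    if category == "TECHREPORT":
        return "INSTITUTION"
    if category in ("CONFERENCE", "PROCEEDINGS", "INPROCEEDINGS", "MANUAL"):
        return "ORGANIZATION"
    if category in ("MASTERSTHESIS", "PHDTHESIS"):
        return "SCHOOL"
    # BOOK, INBOOK, BOOKLET, INCOLLECTION per the table; other types are
    # not covered there, so PUBLISHER is an assumption for them.
    return "PUBLISHER"


def refer_to_bibtex(record, category):
    """Map a refer record (key letter -> list of values) to BibTeX fields."""
    def get(letter):
        return record.get(letter, [])

    out = {}
    authors = get("A") + get("Q")
    if authors:
        out["AUTHOR"] = " and ".join(authors)
    for letter, field in SIMPLE.items():
        if get(letter):
            out[field] = (" and " if letter == "E" else " ").join(get(letter))
    notes = get("H") + get("O")  # %H is printed before, %O after
    if notes:
        out["NOTE"] = " ".join(notes)
    if get("D"):
        words = get("D")[0].split()
        if len(words) == 2:
            out["MONTH"], out["YEAR"] = words
        elif len(words) == 1:
            out["YEAR"] = words[0]
        elif words:
            out["DATE"] = " ".join(words)
    if get("I"):
        out[issuer_field(category)] = " ".join(get("I"))
    if get("M"):
        out["NUMBER"], out["TYPE"] = get("M")[0], "Bell Labs Memorandum"
    if get("R"):
        # Naive split of the "very messy" %R: last word is the number.
        words = get("R")[0].split()
        if words:
            out["NUMBER"] = words[-1]
            if len(words) > 1:
                out["TYPE"] = " ".join(words[:-1])
    for name, value in re.findall(r'([A-Za-z]+)\s*=\s*"([^"]*)"', " ".join(get("Z"))):
        out[name.upper()] = value
    if "KEY" not in out:
        a_words = get("A")[0].split(",")[0].split() if get("A") else []
        q_words = get("Q")[0].split() if get("Q") else []
        if a_words:
            out["KEY"] = a_words[-1]  # last word before any comma
        elif q_words:
            out["KEY"] = q_words[0]   # first word of a corporate author
    return out
```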


A heuristic for determining the classification type from the refer data:
(evaluate from top to bottom, observing nested conditionals)

%J present

  %N present and %J contains the string "report"

	TECHREPORT	(and convert %J into TYPE)

  %I present and %I contains any of the strings
	  "univ.", "university", "dept.", "department", "labs", "laboratory",
	  "center", "institut", "division"

	TECHREPORT	(and convert %J into TYPE)

  %J contains any of the strings
	  "proc.", "proceedings", "conf.", "conference", "symp.", "symposium",
	  "congress", "intl.", "workshop"

	INPROCEEDINGS

  else

	ARTICLE


%A missing

  %E present

	BOOK

  (%B or (%I and %T)) and %D present

	PROCEEDINGS

  %T and %D present

	MANUAL

  else

	MISC


%T missing

	MISC


%R or %M present

  %R present and %R contains the string "thesis"

    %R contains the string "masters"

	MASTERSTHESIS

    else

	PHDTHESIS

  else

	TECHREPORT


%B present

  if %E missing or %B contains any of the strings
	  "proc.", "proceedings", "conf.", "conference", "symp.", "symposium",
	  "congress", "intl.", "workshop"

	INPROCEEDINGS

  else

	INCOLLECTION


%E present

  if %T contains any of the strings
	  "proc.", "proceedings", "conf.", "conference", "symp.", "symposium",
	  "congress", "intl.", "workshop"

    if %P present

	INPROCEEDINGS

    else

	PROCEEDINGS

  else

    if %P present

	INCOLLECTION

    else

	BOOK


%P present

	INBOOK


%I present and %I does not contain the string "press"

  %I contains any of the strings
	  "univ.", "university", "dept.", "department", "institut"

	PHDTHESIS


%I or %C present

	BOOK


%D present

	BOOKLET

else

	MISC
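The outline above can be transcribed fairly literally into Python. This is a sketch under two assumptions the outline does not state: string matches are case-insensitive substring tests, and the record is a dict of refer key letter to list of values. It returns the category plus any extra fields, so the two "(convert %J into TYPE)" cases can hand back a TYPE. `classify` is a hypothetical name.

```python
CONF_WORDS = ("proc.", "proceedings", "conf.", "conference", "symp.",
              "symposium", "congress", "intl.", "workshop")
INSTITUTION_WORDS = ("univ.", "university", "dept.", "department", "labs",
                     "laboratory", "center", "institut", "division")
SCHOOL_WORDS = ("univ.", "university", "dept.", "department", "institut")


def classify(record):
    """Return (category, extra_fields) following the heuristic above."""
    def has(letter, words=None):
        """Field present, and (if words given) contains any of them."""
        if letter not in record:
            return False
        text = " ".join(record[letter]).lower()
        return words is None or any(w in text for w in words)

    if has("J"):
        if (has("N") and has("J", ("report",))) or has("I", INSTITUTION_WORDS):
            return "TECHREPORT", {"TYPE": " ".join(record["J"])}
        if has("J", CONF_WORDS):
            return "INPROCEEDINGS", {}
        return "ARTICLE", {}
    if not has("A"):
        if has("E"):
            return "BOOK", {}
        if (has("B") or (has("I") and has("T"))) and has("D"):
            return "PROCEEDINGS", {}
        if has("T") and has("D"):
            return "MANUAL", {}
        return "MISC", {}
    if not has("T"):
        return "MISC", {}
    if has("R") or has("M"):
        if has("R", ("thesis",)):
            return ("MASTERSTHESIS" if has("R", ("masters",)) else "PHDTHESIS"), {}
        return "TECHREPORT", {}
    if has("B"):
        if not has("E") or has("B", CONF_WORDS):
            return "INPROCEEDINGS", {}
        return "INCOLLECTION", {}
    if has("E"):
        if has("T", CONF_WORDS):
            return ("INPROCEEDINGS" if has("P") else "PROCEEDINGS"), {}
        return ("INCOLLECTION" if has("P") else "BOOK"), {}
    if has("P"):
        return "INBOOK", {}
    # %I without "press" that names a university-like body: a thesis.
    if has("I") and not has("I", ("press",)) and has("I", SCHOOL_WORDS):
        return "PHDTHESIS", {}
    if has("I") or has("C"):
        return "BOOK", {}
    if has("D"):
        return "BOOKLET", {}
    return "MISC", {}
```

Applied to the Anderson/Wahbe refer entry from the first post (which has %A, %T, and %R but no %J), it falls through to the %R branch and yields TECHREPORT.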

_______________________________________________________________________________

emv@math.lsa.umich.edu (Edward Vielmetti) (08/09/90)

In article <DUPUY.90Aug7212148@hudson.cs.columbia.edu> dupuy@cs.columbia.edu (Alexander Dupuy) writes:

   There are currently three major bibliography formats out there (that I know of,
   anyhow) not counting library software systems.  One is Unix refer(1) format,
   documented in addbib(1), and the other two are Scribe and BibTeX.  BibTeX
   format is pretty much a subset of Scribe's with one or two minor exceptions.
   Both refer and Scribe/BibTeX format have their own advantages and
   disadvantages.

There's also "tib" format, which is a slight mutation of refer(1)
format but usable with TeX.  And the 10th edition unix manuals have a
further bibliography format (don't recall the name) that uses
refer-ish style except the tags are multicharacter (%title instead of
%t).

Here's a cite (from sgml.math.lsa.umich.edu:/pub/sgml/bibliography) on
tools that go from the SGML format to BibTeX; I haven't seen this thing
yet.

Cover, Robin; Duncan, Nicholas; Barnard, David.  "A Bibliography
    on Structured Text."  Technical Report, 1990.  This is the
    preliminary print version of a bibliographic and information
    database (compiled by Robin Cover), structured in SGML-database
    and formatted with SGML ->> BibTeX utilities developed at Queen's
    University by Nick Duncan and David Barnard.  Contact: Department
    of Computing and Information Science; Queen's University;
    Kingston, Ontario, Canada K7L 3N6; Tel: (613) 545-6056.


I think there's also an ANSI bibliographic standard, though I don't know
how it addresses storage representation vis-a-vis appearance on the page.

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
comp.text.sgml	ISO 8879 SGML, structured documents, markup languages
			yes votes to sgml-yes@math.lsa.umich.edu
			 no votes to  sgml-no@math.lsa.umich.edu