dupuy@cs.columbia.edu (Alexander Dupuy) (08/08/90)
I don't see much problem with duplicate bibliographies - people can decide for
themselves which is more useful/accurate/whatever - but more of a problem would
be the question of the database formats.
The first issue would be whether any format would be used at all. Given the
experience of comp.archives, I would expect that trying to define a new format
would almost certainly be a failure, and even using an existing one, at least
half the postings wouldn't use it (or any other standard format).
There are currently three major bibliography formats out there (that I know of,
anyhow) not counting library software systems. One is Unix refer(1) format,
documented in addbib(1), and the other two are Scribe and BibTeX. BibTeX
format is pretty much a subset of Scribe's with one or two minor exceptions.
Both refer and Scribe/BibTeX format have their own advantages and
disadvantages.
Unix refer format is more fixed in structure, and thus more amenable to
database-style operations (e.g. sortbib, indxbib, lookbib). It has the
advantage that it comes pretty much standard with Unix. Although the defined
fields are somewhat more regular than Scribe/BibTeX format, they aren't quite
as extensive.
Scribe/BibTeX format is more freeform, but requires classification of the
document type (e.g. article, book, proceedings, unpublished, etc.). It has the
advantage that Scribe and BibTeX can both understand a common subset format,
and both provide support for generating bibliographies and references in a
number of styles (e.g. CACM, IEEE, etc.).
A sample refer format bibliography entry might look like this:
%K Miscellaneous
%A David P. Anderson
%A Robert Wahbe
%T A Framework for Multimedia Communication in a General-Purpose Distributed System
%R Technical Report 89/498
%I UC Berkeley CS Division
%D March 1989
%X \fBAbstract:\fP
Motivates the design given in TR 88/462, gives some comparisons,
and discusses implications for protocol and local system design.
Description of channel parameters supersedes TR 88/462.
The same bibliography entry in Scribe/BibTeX common subset might look like:
@TechReport(UCBTR-89-498,
Author = "David P. Anderson and Robert Wahbe",
Title = "A Framework for Multimedia Communication in a General-Purpose
Distributed System",
Institution = "UC Berkeley CS Division",
Number = "89/498", Month = "March", Year = "1989",
Abstract = {
Motivates the design given in TR 88/462, gives some comparisons,
and discusses implications for protocol and local system design.
Description of channel parameters supersedes TR 88/462.} )
It is more or less feasible to convert from one format to the other (easier, I
think, when going from refer to Scribe/BibTeX, which is why I prefer refer).
I'll follow this article with a posting describing each format in more detail,
and some notes I've made on conversions between them.
@alex
--
inet: dupuy@cs.columbia.edu
uucp: ...!rutgers!cs.columbia.edu!dupuy
dupuy@cs.columbia.edu (Alexander Dupuy) (08/08/90)
The following are some notes I've made on the various bibliography formats - at
the end is an outline of a heuristic method for determining a Scribe/BibTeX
classification for refer style bibliography entries.
@alex
_______________________________________________________________________________
References are kept in Unix refer format, described below:
Bibliography Key Letters
The most common key-letters and their meanings are given below.
%A Author's name
%B Book (or Proceedings) containing article referenced
%C City (place of publication)
%D Date of publication
%E Editor of book containing article referenced
%F Footnote number or label (supplied by refer)
%G Government order number
%H Header commentary, printed before reference
%I Issuer (publisher)
%J Journal containing article referenced
%K Keywords to use in locating reference
%L Label field used by -k option of refer
%M Bell Labs Memorandum (undefined)
%N Number within volume
%O Other commentary, printed at end of reference
%P Page number(s)
%Q Corporate or Foreign Author (unreversed)
%R Report, paper, or thesis (unpublished)
%S Series title
%T Title of article or book
%V Volume number
%X Abstract - used by roffbib, not by refer
%Y,Z Ignored by refer
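Since each field is just a key letter at the start of a line, a refer database is easy to read mechanically. Here is a minimal Python sketch (the name parse_refer is made up for illustration; it is not part of refer or its tools). It assumes entries are separated by blank lines and that any line not starting with % continues the previous field, as a long %X abstract does. Values are kept as lists because fields like %A repeat.

```python
from collections import defaultdict


def parse_refer(text):
    """Parse refer(1)-style text into a list of records.

    Each record maps a key letter ("A", "T", ...) to a list of values,
    because fields such as %A may repeat within one entry.
    """
    records = []
    current = defaultdict(list)
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            last_key = None
        elif line.startswith("%") and len(line) >= 2:
            # "%T Some title" -> key "T", value "Some title".
            last_key = line[1]
            current[last_key].append(line[2:].strip())
        elif last_key is not None:
            # Continuation line: extend the most recent field value.
            current[last_key][-1] += "\n" + line
    if current:
        records.append(dict(current))
    return records
```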
In practice, some of these conventions are ignored:
%J is often used for Conference Proceedings, where %B is correct
%J and %N are used for Technical Reports, where %R (or %M?) is correct
%A is used in cases where %Q should be used for organizational authors
In order to encode certain information needed by Scribe and BibTeX, we will use
the %Y and %Z fields:
%Y Classification category (see below for list)
%Z Additional fields: FIELD = "val", FIELD = "val" ...
Valid classification categories are:
ARTICLE
BOOK
BOOKLET
CONFERENCE
MANUAL
MASTERSTHESIS
MISC
PHDTHESIS
PROCEEDINGS
TECHREPORT
UNPUBLISHED
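A sketch of reading the proposed %Z convention back into separate fields, in Python. The name parse_z_field is hypothetical, and it assumes values are always double-quoted and never contain an embedded double quote:

```python
import re

# One FIELD = "value" pair; values are assumed to contain no '"'.
_PAIR = re.compile(r'([A-Za-z]+)\s*=\s*"([^"]*)"')


def parse_z_field(value):
    """Split a %Z value such as 'EDITION = "Second", CHAPTER = "3"'
    into {"EDITION": "Second", "CHAPTER": "3"}."""
    return {name.upper(): val for name, val in _PAIR.findall(value)}
```

Field names are upper-cased so they line up with the Scribe/BibTeX names regardless of how they were typed.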
_______________________________________________________________________________
Another format is BibTeX format, described below:
Bibliography File
The bibliography file (.bib) format is just about a subset
of that allowed in Scribe bibliographies. Only the delimiter
pairs {...} and "..." are allowed inside entries.
Entries themselves can be delimited by (...) also. The =
sign between field names and field values is not optional.
There are a number of conventions that should be followed
when writing .bib files. These are not requirements of bibtex,
but standard bibliography style files will typically
expect these conventions to be followed.
References should be categorized as in Scribe into one of
the categories: article, book, booklet, inbook, incollection,
inproceedings, manual, mastersthesis, misc, phdthesis,
proceedings, techreport, and unpublished. See the Scribe
manual for the fields that must/can appear in each type of
reference.
The title field should be entered in uppers-and-lowers format,
where everything is capitalized except articles and
unstressed conjunctions and prepositions, and even those are
capitalized if they are the first word or the first word
after a colon. Some style files will convert all words
except the first to all lowercase. This is a mistake for
things like proper nouns, so you have to tell bibtex not to
touch such capital letters by enclosing them in braces, as
in "Dogs of {A}merica". It is unlikely that any style file
would attempt to convert book titles to lowercase, so
perhaps you can omit braces in such titles.
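Deciding which words are proper nouns can't really be automated, so this Python sketch assumes the caller supplies the list. It braces each capital letter in those words, which is the "Dogs of {A}merica" form described above; protect_capitals is a made-up name:

```python
import re


def protect_capitals(title, proper_nouns):
    """Brace the capital letters of the given whole words so that a
    style file which lowercases titles leaves them alone."""
    def brace_caps(match):
        word = match.group(0)
        return "".join("{%s}" % ch if ch.isupper() else ch for ch in word)

    for noun in proper_nouns:
        # \b keeps "America" from matching inside "Americans".
        title = re.sub(r"\b%s\b" % re.escape(noun), brace_caps, title)
    return title
```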
The author and editor fields should conform to a particular
format, so that the style file can parse them into parts. A
name can have four parts: first, von, last, junior, each of
which can consist of more than one word. For example, "John
Paul von Braun, Jr." has "John Paul" as the first part,
"von" as the von part, "Braun" as the last part, and "Jr."
as the junior part. Use one of these formats for a name:
First von Last
von Last, First
von Last, Junior, First
The last part is assumed to be one word, or all the words
after the von part. Bibtex will treat anything in braces as
one word, so use braces to surround last names that contain
more than one word. The von part is recognized by looking
for words that begin with lowercase letters. When possible,
enter the full first name(s); style files may abbreviate by
taking the first letter. Actually, the rules for isolating
the name parts are a bit more complicated, so they do the
right thing for names like "de la Grand Round, Chuck".
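A simplified Python sketch of the name splitting described above. It handles only the three listed forms, treats a braced group as one word, and uses the lowercase-first-letter test for the von part; the real BibTeX rules are more involved (as noted), and commas inside braces are not handled. All function names are hypothetical:

```python
def split_words(s):
    """Split on whitespace, keeping a {braced group} inside one word."""
    words, buf, depth = [], "", 0
    for ch in s:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if buf:
                words.append(buf)
            buf = ""
        else:
            buf += ch
    if buf:
        words.append(buf)
    return words


def is_von(word):
    # Simplification: a "von" word starts with a lowercase letter.
    return word[:1].islower()


def split_name(name):
    """Return (first, von, last, junior) as strings."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 1:
        # First von Last: the final word always belongs to Last.
        words = split_words(parts[0])
        i = 0
        while i < len(words) - 1 and not is_von(words[i]):
            i += 1
        j = i
        while j < len(words) - 1 and is_von(words[j]):
            j += 1
        first, von, last, junior = words[:i], words[i:j], words[j:], []
    else:
        # von Last, First   or   von Last, Junior, First
        words = split_words(parts[0])
        k = 0
        while k < len(words) - 1 and is_von(words[k]):
            k += 1
        von, last = words[:k], words[k:]
        junior = split_words(parts[1]) if len(parts) == 3 else []
        first = split_words(parts[-1])
    return tuple(" ".join(w) for w in (first, von, last, junior))
```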
There is no need for a field like Scribe's fullauthor field.
If there are multiple authors or editors, they should all be
separated by the word "and". Scribe's editors field should
not be used, since bibtex style files can count how many
names are in an editor field.
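Splitting an author or editor field on the word "and" has to skip any "and" inside braces, or a braced corporate name such as {Barnes and Noble} (a made-up example) would be broken in two. A minimal Python sketch, with split_authors a hypothetical name, assuming brace nesting can be tracked per whitespace-separated word:

```python
def split_authors(field):
    """Split a BibTeX author/editor field into individual names,
    ignoring any "and" that appears inside braces."""
    names, current, depth = [], [], 0
    for word in field.split():
        if word == "and" and depth == 0:
            names.append(" ".join(current))
            current = []
            continue
        current.append(word)
        depth += word.count("{") - word.count("}")
    if current:
        names.append(" ".join(current))
    return names
```

Counting editors, as a style file would, is then just len(split_authors(editor_field)).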
_______________________________________________________________________________
And from the Scribe manual page:
BIBLIOGRAPHIES
Scribe contains a mechanism for automatically assembling a
Bibliography for a document by selecting entries from a
larger bibliographic database. Scribe expects to find the
information for Bibliography entries in a bibliography database
file (.BIB) in a specific data format. Each entry in
a .BIB file must have the structure:
@classification(codeword, list-of-fields)
The list of valid "classifications" appears in the "Classes"
subtopic. The list of valid "fields" appears in the
"Fields" subtopic. The formatting of the cited bibliographic
reference is controlled by the reference format
chosen via the @Style command. Available formats can be
found in the "Reference_Formats" subtopic.
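Going the other way, an entry can be written out in the common @classification(codeword, list-of-fields) form. This Python sketch (format_entry is a hypothetical name) uses {...} delimiters for every value, which both Scribe and BibTeX accept per the notes above; it assumes values contain only balanced braces:

```python
def format_entry(classification, codeword, fields):
    """Render @classification(codeword, FIELD = {value}, ...)."""
    lines = ["@%s(%s," % (classification, codeword)]
    items = list(fields.items())
    for i, (name, value) in enumerate(items):
        # Comma after every field except the last one.
        sep = "," if i < len(items) - 1 else ""
        # Braces rather than quotes, so a value may contain '"'.
        lines.append("  %s = {%s}%s" % (name, value, sep))
    lines.append(")")
    return "\n".join(lines)
```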
CLASSES Scribe's bibliography classifications are listed below.
Not all classifications are used with each reference format.
Refer to the "Scribe User Manual" or the "Scribe Advanced User
Manual" for details on which classifications are required and
optional for each reference format.
ARTICLE An article from an academic journal or a magazine.
BOOK Something published on its own, usually by a publishing
house that is not the same as the author.
BOOKLET Something published and bound, but having neither
an explicitly-named publisher nor a sponsoring institution.
CONFERENCE The conference name.
INBOOK For a reference to a part of a book rather than to
the entire book.
INCOLLECTION Something composed of papers or chapters previously
published elsewhere.
INPROCEEDINGS A reference to a paper in a conference
proceedings or the like.
MANUAL An instruction manual or piece of technical documentation.
MASTERSTHESIS A master's thesis.
MISC Any category not mentioned in this list.
PHDTHESIS A Ph.D. thesis.
PROCEEDINGS The proceedings of a conference or some similar
document. The identifying characteristics of the classification
are that its publisher and author are identical, and
often no editor's name appears.
TECHREPORT A technical report. Similar to a book, except
that it is published by a research institution instead of by
a publisher and that it usually has an assigned "report
number".
UNPUBLISHED Some paper that is in preparation or that has
been printed but not published.
FIELDS The following field names are used in defining
bibliography database entries. All take a delimited string or an
abbreviation code as a value. Not all field names apply to each
classification; some are required while others are not. Check
the "Scribe User Manual" and "Scribe Advanced User Manual" for
details.
ADDRESS The address of the publisher or printer or organization.
AUTHOR The name(s) of the author or authors, in the format
in which they should be printed.
ANNOTE Any annotation text. Not actually printed in most
bibliography formats.
BOOKTITLE The title of a book or proceedings of which this
reference is a chapter or paper or article.
CHAPTER If a reference is being made to part of a book and
not the entire book, specify either chapter or pages.
DATE Can be used instead of MONTH and YEAR in some
reference formats.
EDITION Manuals often have an edition name or number that
is not part of the actual title of the manual.
EDITOR The name of the editor. If more than one, use Editors.
EDITORS The name of the editors. If only one, use Editor.
FULLAUTHOR The full name of the author or authors, written
out without commas.
FULLORGANIZATION The "full" name of the organization for
mailing purposes.
HOWPUBLISHED For unusual manuscripts; how it came into your
possession ("personal note", etc.).
INSTITUTION The organization or institution backing or
publishing a technical report or a proceedings.
JOURNAL The title of the journal.
KEY The sort key. This field is used for alphabetization.
MEETING Used with the value of the SOCIETY field name.
MONTH January, February, etc.
NOTE Any comment, usually used to clarify the reference or
to suggest alternate sources. Differs from Annote in that
Note will always be printed, but Annote will be printed only
in those bibliography types that specify annotation.
NUMBER Issue number of a journal or series number in a book
series or serial number of a technical report.
ORGANIZATION The name of the organization holding a conference
that published a proceedings.
PAGES The page numbers within a journal, proceedings, or
book that contain the material actually cited.
PUBLISHER The name of the publishing company.
SCHOOL For theses, the name of the school granting the
degree.
SERIES When books are published in a series, the series has
a name.
TITLE The title of the book, article, thesis, or other
document that is being cited. Do not italicize or underline;
that detail will be handled by the selected reference
format.
TYPE Some technical reports are called by other names, such
as "Research Report", etc. If this is not a "Technical
Report", put its true name in this field.
VOLUME The volume number of a journal or a series book.
YEAR The year of publication; four digits: 1979.
REFERENCE_FORMATS Bibliography format definitions in the
Database are used to control the style and sequencing of the list
of references and the citations. Select one with the References
@Style parameter.
1APA Similar to the APA format except that it contains an
Annote field that is treated as a Comment.
1APADRAFT Similar to the 1APA format except that it is
double-spaced.
ANNAPA Similar to the 1APA format except that the Annote
field is treated as text.
ANNAPADRAFT Similar to the 1APADraft format except that
the Annote field is treated as text.
ANNOTEDSTDALPHABETIC Same as StdAlphabetic, but includes
annotations and has filled lines.
ANNOTEDSTDIDENTIFIER Similar to the STDIdentifier format
except it includes annotations and has filled lines.
ANNOTEDSTDNUMERIC Same as STDNumeric, but includes annotations
(i.e. the contents of the Annote field) in the Bibliography and
has filled lines.
ANNSTDALPHABETIC Similar to the STDAlphabetic format
except it includes annotations and has unfilled lines.
ANNSTDNUMERIC Similar to the STDNumeric format except it
includes annotations and has unfilled lines.
APA (American Psychological Association). Spelled-out
citations (Knuth, 1978), outdented closed reference list,
alphabetical ordering of references.
APADRAFT Draft version of APA format. Same as regular version,
but triple-spaces the Bibliography.
CACM Numeric citations [5], closed format, alphabetical
ordering of references.
CLOSEDALPHABETIC Similar to the STDAlphabetic format.
CLOSEDNUMERIC Similar to the STDNumeric format.
IEEE Superscripted numeric citations, closed format, citation
sequence ordering of references.
IPL (Information Processing Letters). The format required
by IPL. This format is incomplete; it does not have all
standard Scribe types yet (April 1984) and is being included
for convenience only.
NEWAPA (American Psychological Association). The new APA
format with the Year following the Author. Spelled-out
citations (Knuth, 1978), outdented closed reference list,
alphabetical ordering of references.
SIAM (Society for Industrial and Applied Mathematics). The
format required by SIAM journals. This format is incomplete;
it does not have all standard Scribe types yet (April
1984) and is being included for convenience only.
STDALPHABETIC Alphabetic citations [Knuth 78], open format,
alphabetical ordering of references.
STDIDENTIFIER Open format, reference identifier for citations
rather than a generated label.
STDNUMERIC Numeric citations [5], open format, alphabetical
ordering of references.
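To make the citation styles above concrete, here is a toy Python sketch of the three label shapes mentioned: numeric [5], alphabetic [Knuth 78], and spelled-out APA (Knuth, 1978). The function names are made up, and real Scribe formats also deal with multiple authors, duplicate labels and ordering, which this ignores:

```python
def numeric_label(position):
    """STDNUMERIC/CACM style: position in the reference list, e.g. [5]."""
    return "[%d]" % position


def alphabetic_label(last_name, year):
    """STDALPHABETIC style, e.g. [Knuth 78]: name plus two-digit year."""
    return "[%s %02d]" % (last_name, year % 100)


def apa_label(last_name, year):
    """APA style spelled-out citation, e.g. (Knuth, 1978)."""
    return "(%s, %d)" % (last_name, year)
```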
COMMANDS
BIBFORM Defines a Bibliography classification, such as
"Book", for a particular Bibliography reference format. May
only be used in .REF files. (See the "Classes" subtopic in the
"Bibliographies" entry for available bibliography classifications.)
Format:
@Bibform(Classification=delimited-definition-string)
EXAMPLES
1. @BibForm(UnPublished=<
@begin(BibEntry)
@parm(tag).@@parm(Author), @~
"@parm(Title)"@~
@Imbed(Note,def ', @Parm(Note)', undef '.')
@end(BibEntry)
>)
(Note: Taken from the IEEE.Ref database file.)
2. @BibForm(Misc=<
@begin(BibEntry)
@l1{[@parm(tag)]@@imbed(Author,def '@parm(Author).')}
@imbed(Title,def '@l2{@parm(Title).}')
@imbed(HowPublished,def '@l2{@parm(HowPublished).}')
@imbed(Year, def '@l2{@imbed"Month, def {@Parm(Month), }"@~
@parm(Year)}')
@imbed(Note,def '@l2{@parm(Note).}')
@end(BibEntry)
>)
(Note: Taken from the Standa.Lib database file.)
_______________________________________________________________________________
And finally, a mapping from refer keywords to Scribe/BibTeX fields:
%A AUTHOR (use last word before comma for KEY)
%B BOOKTITLE
%C ADDRESS
%D [MONTH] YEAR (or DATE, if more than two words)
%E EDITOR (or EDITORS, for Scribe)
%F ignored
%G NUMBER
%H NOTE (see also %O)
%I INSTITUTION (TECHREPORT)
PUBLISHER ([IN]BOOK, BOOKLET, INCOLLECTION)
ORGANIZATION (CONFERENCE, [IN]PROCEEDINGS, MANUAL)
SCHOOL (MASTERSTHESIS or PHDTHESIS)
%J JOURNAL
%K ignored
%L KEY (also citation name of reference)
%M NUMBER, TYPE="Bell Labs Memorandum"
%N NUMBER
%O NOTE (see also %H)
%P PAGES
%Q AUTHOR (use first word for KEY)
%R parse into TYPE and NUMBER, very messy
%S SERIES
%T TITLE
%V VOLUME
%X ANNOTE
The following Scribe/BibTeX fields have to be encoded in %Z:
CHAPTER
EDITION
FULLAUTHOR
FULLORGANIZATION
HOWPUBLISHED
MEETING
PUBLISHER (INPROCEEDINGS, PROCEEDINGS)
SOCIETY
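A partial Python sketch of the refer-to-Scribe/BibTeX mapping above, assuming a record is a dict from key letter to a list of values. It handles the one-to-one fields, joins %A (and braced %Q) names with "and", picks the %I target from the classification, and splits %D; %R, %M, the KEY derivation and the %Z extras are left out. The names SIMPLE_FIELDS, issuer_field and refer_to_fields are hypothetical:

```python
# One-to-one key letters from the table above. %A/%Q, %D and %I are
# handled separately; %F, %K, %M and %R are not handled in this sketch.
SIMPLE_FIELDS = {
    "B": "BOOKTITLE", "C": "ADDRESS", "G": "NUMBER", "H": "NOTE",
    "J": "JOURNAL", "L": "KEY", "N": "NUMBER", "O": "NOTE",
    "P": "PAGES", "S": "SERIES", "T": "TITLE", "V": "VOLUME",
    "X": "ANNOTE",
}


def issuer_field(classification):
    """Choose the field that %I maps to for a given classification."""
    c = classification.upper()
    if c == "TECHREPORT":
        return "INSTITUTION"
    if c in ("MASTERSTHESIS", "PHDTHESIS"):
        return "SCHOOL"
    if c in ("CONFERENCE", "PROCEEDINGS", "INPROCEEDINGS", "MANUAL"):
        return "ORGANIZATION"
    return "PUBLISHER"  # BOOK, INBOOK, BOOKLET, INCOLLECTION and the rest


def refer_to_fields(record, classification):
    """Map {key letter: [values]} to {FIELD: value}."""
    fields = {}
    for key, values in record.items():
        if key in SIMPLE_FIELDS:
            name, value = SIMPLE_FIELDS[key], " ".join(values)
            # %G/%N and %H/%O share a target field, so keep both values.
            fields[name] = fields[name] + "; " + value if name in fields else value
    # Corporate (%Q) authors are braced so BibTeX treats each as one word.
    authors = record.get("A", []) + ["{%s}" % q for q in record.get("Q", [])]
    if authors:
        fields["AUTHOR"] = " and ".join(authors)
    if record.get("E"):
        fields["EDITOR"] = " and ".join(record["E"])
    if record.get("I"):
        fields[issuer_field(classification)] = record["I"][0]
    if record.get("D"):
        words = record["D"][0].split()
        if len(words) == 1:
            fields["YEAR"] = words[0]
        elif len(words) == 2:
            fields["MONTH"], fields["YEAR"] = words
        elif len(words) > 2:
            fields["DATE"] = record["D"][0]
    return fields
```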
A heuristic for determining the classification type from the refer data
(evaluate from top to bottom, observing the nested conditionals):
%J present
    %N present and %J contains the string "report"
        TECHREPORT (and convert %J into TYPE)
    %I present and %I contains any of the strings
      "univ.", "university", "dept.", "department", "labs", "laboratory",
      "center", "institut", "division"
        TECHREPORT (and convert %J into TYPE)
    %J contains any of the strings
      "proc.", "proceedings", "conf.", "conference", "symp.", "symposium",
      "congress", "intl.", "workshop"
        INPROCEEDINGS
    else
        ARTICLE
%A missing
    %E present
        BOOK
    (%B or (%I and %T)) and %D present
        PROCEEDINGS
    %T and %D present
        MANUAL
    else
        MISC
%T missing
    MISC
%R or %M present
    %R present and %R contains the string "thesis"
        %R contains the string "masters"
            MASTERSTHESIS
        else
            PHDTHESIS
    else
        TECHREPORT
%B present
    if %E missing or %B contains any of the strings
      "proc.", "proceedings", "conf.", "conference", "symp.", "symposium",
      "congress", "intl.", "workshop"
        INPROCEEDINGS
    else
        INCOLLECTION
%E present
    if %T contains any of the strings
      "proc.", "proceedings", "conf.", "conference", "symp.", "symposium",
      "congress", "intl.", "workshop"
        if %P present
            INPROCEEDINGS
        else
            PROCEEDINGS
    else
        if %P present
            INCOLLECTION
        else
            BOOK
%P present
    INBOOK
%I present and %I does not contain the string "press"
    %I contains any of the strings
      "univ.", "university", "dept.", "department", "institut"
        PHDTHESIS
%I or %C present
    BOOK
%D present
    BOOKLET
else
    MISC
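The heuristic translates fairly directly into code. This Python sketch assumes a record is a dict from key letter to a list of values, reads the outline's nesting as each indented condition belonging to the rule above it (a rule with no matching branch, such as the %I/"press" one, falls through to the next rule), and matches the strings case-insensitively; those are assumptions, as are all the names used:

```python
PROC_WORDS = ("proc.", "proceedings", "conf.", "conference", "symp.",
              "symposium", "congress", "intl.", "workshop")
INST_WORDS = ("univ.", "university", "dept.", "department", "labs",
              "laboratory", "center", "institut", "division")
SCHOOL_WORDS = ("univ.", "university", "dept.", "department", "institut")


def has(r, key):
    return bool(r.get(key))


def text(r, key):
    # Lowercased so the string tests are case-insensitive (an assumption).
    return " ".join(r.get(key, [])).lower()


def contains_any(s, words):
    return any(w in s for w in words)


def classify(r):
    """Guess a Scribe/BibTeX classification for a refer record r."""
    if has(r, "J"):
        if has(r, "N") and "report" in text(r, "J"):
            return "TECHREPORT"  # caller should also move %J into TYPE
        if has(r, "I") and contains_any(text(r, "I"), INST_WORDS):
            return "TECHREPORT"  # caller should also move %J into TYPE
        if contains_any(text(r, "J"), PROC_WORDS):
            return "INPROCEEDINGS"
        return "ARTICLE"
    if not has(r, "A"):
        if has(r, "E"):
            return "BOOK"
        if (has(r, "B") or (has(r, "I") and has(r, "T"))) and has(r, "D"):
            return "PROCEEDINGS"
        if has(r, "T") and has(r, "D"):
            return "MANUAL"
        return "MISC"
    if not has(r, "T"):
        return "MISC"
    if has(r, "R") or has(r, "M"):
        if "thesis" in text(r, "R"):
            return "MASTERSTHESIS" if "masters" in text(r, "R") else "PHDTHESIS"
        return "TECHREPORT"
    if has(r, "B"):
        if not has(r, "E") or contains_any(text(r, "B"), PROC_WORDS):
            return "INPROCEEDINGS"
        return "INCOLLECTION"
    if has(r, "E"):
        if contains_any(text(r, "T"), PROC_WORDS):
            return "INPROCEEDINGS" if has(r, "P") else "PROCEEDINGS"
        return "INCOLLECTION" if has(r, "P") else "BOOK"
    if has(r, "P"):
        return "INBOOK"
    if has(r, "I") and "press" not in text(r, "I"):
        if contains_any(text(r, "I"), SCHOOL_WORDS):
            return "PHDTHESIS"
    if has(r, "I") or has(r, "C"):
        return "BOOK"
    if has(r, "D"):
        return "BOOKLET"
    return "MISC"
```

Applied to the Anderson/Wahbe sample entry earlier (which has %A, %T and %R but no %J), this returns TECHREPORT.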
_______________________________________________________________________________
emv@math.lsa.umich.edu (Edward Vielmetti) (08/09/90)
In article <DUPUY.90Aug7212148@hudson.cs.columbia.edu> dupuy@cs.columbia.edu (Alexander Dupuy) writes:
> There are currently three major bibliography formats out there (that I know of,
> anyhow) not counting library software systems. One is Unix refer(1) format,
> documented in addbib(1), and the other two are Scribe and BibTeX. BibTeX
> format is pretty much a subset of Scribe's with one or two minor exceptions.
> Both refer and Scribe/BibTeX format have their own advantages and
> disadvantages.
There's also "tib" format, which is a slight mutation of refer(1)
format but usable with TeX. And the 10th edition unix manuals have a
further bibliography format (don't recall the name) that uses
refer-ish style except the tags are multicharacter (%title instead of
%t).
Here's a cite (from sgml.math.lsa.umich.edu:/pub/sgml/bibliography) on
tools that go from the SGML format to BibTeX; I haven't seen this thing
yet.
Cover, Robin; Duncan, Nicholas; Barnard, David. "A Bibliography
on Structured Text." Technical Report, 1990. This is the
preliminary print version of a bibliographic and information
database (compiled by Robin Cover), structured in SGML-database
and formatted with SGML ->> BibTeX utilities developed at Queen's
University by Nick Duncan and David Barnard. Contact: Department
of Computing and Information Science; Queen's University;
Kingston, Ontario, Canada K7L 3N6; Tel: (613) 545-6056.
I think there's also an ANSI bibliographic standard, though I don't know
how it addresses storage representation vis-a-vis appearance on the page.
--Ed
Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
comp.text.sgml ISO 8879 SGML, structured documents, markup languages
yes votes to sgml-yes@math.lsa.umich.edu
no votes to sgml-no@math.lsa.umich.edu