[comp.text.tex] sorting in bibtex

opbibtex@Neon.Stanford.EDU (Oren Patashnik) (02/08/91)

(Would somebody who knows Harmen van den Berg please mail him this
response?  He asked for e-mail replies, but all my attempts at
reaching him via e-mail failed.  The path in his article header ended
with `uunet!mcsun!hp4nl!utrcu1!uttwkt'.  Thanks.)

Your comp.text.tex article says:

> When I use BibTeX to make my list of references, I encounter the
> following problem concerning the alphabetical order in which the
> references appear. That order is not alphabetical. I have the following
> two references:
>   @phdthesis{vr78,
>     author = "P. F. de Vries Robbe",
>     title = "title",
>     school = "school",
>     year = "year" }
>   @phdthesis{v87,
>     author = "P. H. de Vries",
>     title = "title",
>     school = "school",
>     year = "year" }
> These references appear in non-alphabetical order, namely [Vries Robbe]
> and [Vries], instead of [Vries] and [Vries Robbe]. Does anybody know a
> solution for this problem? (I tried using {} around Vries Robbe, but
> that didn't help.)

BibTeX's `plain' standard style gives the order you want (de Vries
followed by de Vries Robbe) so I suspect it's the bibliography style
(.bst file) that's giving you the problem.  It's hard for me to say
where the problem lies without knowing the details of the particular
style you're using, but I have several guesses.  First, if your style
uses alphabetic (rather than numeric) labels, it may be sorting first
on the label, then on the name---after all, it's the label that tells
you which item to look for in the reference list.  (Incidentally, I am
not a fan of alphabetic labels, or author-date styles, of which many
styles that use alphabetic labels are examples; if you'd like to see a
100-or-so-line spiel explaining why, let me know and I'll send you a
copy.)  Second, your style may simply have a bug in how it processes
names.  I know that in Dutch you don't alphabetize using the `de',
whereas in English we often do (because it's not nearly so common an
occurrence), so I know your style must be doing something different
from the standard styles---it may simply have done the `something
different' incorrectly.  In either case, if the style is not
appropriate for you, you should probably try to get it changed.  But
in the second case, if it's a bug that occurs only rarely, it may not
be worth the bother to fix up the style; in that case, there are easy
fixes to get the entry to come out where you want it in the reference
list.  The \noopsort macro in the xampl.bib file that's distributed
with BibTeX gives an example of how you'd do that.  If you can't
figure out what the problem is, let me know and I'm sure I can track
it down.  Hope this helps.
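
A minimal sketch of that workaround follows. It is an illustration
only, not a verified fix for the style in question, since how that
style builds its sort key is unknown.  xampl.bib defines \noopsort in
a preamble as a macro that discards its argument, so TeX typesets
nothing for it; the standard styles, when they build the sort key,
drop the control-sequence name but keep the letters of the argument.
The string `Vries zzz' below is an arbitrary, made-up key, chosen only
to push that entry after plain `Vries'.

```bibtex
The definition below matches the one in xampl.bib: the macro
swallows its argument, so nothing extra is typeset.

@preamble{ "\newcommand{\noopsort}[1]{} " }

@phdthesis{v87,
  author = "P. H. de Vries",
  title  = "title",
  school = "school",
  year   = "year" }

Hypothetical sort hint: the last name now contributes
"Vries zzzVries Robbe" to the sort key but still prints
as "Vries Robbe".

@phdthesis{vr78,
  author = "P. F. de {\noopsort{Vries zzz}}Vries Robbe",
  title  = "title",
  school = "school",
  year   = "year" }
```

Where the hint has to go, and what it has to say, depends on which
field the style sorts on.  If the style also derives its labels from
the author's name, check the printed label afterwards, since the hint
may change it.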

(By the way, it would be nice if when you post an article asking for
e-mail you would give an electronic address---say, after your name and
school affiliation---that works.  All seven addresses that I tried,
based on the header information in your article, failed.  Thanks.)

	--Oren Patashnik (opbibtex@neon.stanford.edu)

opbibtex@Neon.Stanford.EDU (Oren Patashnik) (02/10/91)

I got lots of requests to post a spiel I mentioned in a previous
article, so here it is.
               ---------------------------------------

First I'll flame a bit about bibliography styles in general.  I'll
argue that author-date styles, whose citations in the text look like
[Jones 76] or (Jones, 1976) or [Jon76], are based on outdated
technology and are bad.  Then I'll give some .bst-file information.

BEGIN FLAME

I understand that there's often little choice in choosing a
bibliography style.  Journal X says you must use style Y and that's
that.  If you have a choice, however, I strongly recommend that you
choose something like BibTeX's `plain' (numbers-in-brackets) standard
style.  Such a style, van Leunen argues convincingly in her "Handbook
for Scholars", encourages better writing than the alternatives---more
concrete, more vivid.

By contrast, author-date styles encourage flabby writing.  For
instance many such styles almost require the passive voice---"It has
been shown [Knuth 76] that ..."; the `plain' style avoids the passive
voice---"Knuth [13] shows that ..."

But the passive-voice problem isn't the worst of it.  The author-date
style forces you to include in your sentence author and date
information, even when some or all of that information is more a
distraction than a help to the reader.  Furthermore, author-date makes
it awkward to include other information that might be helpful.

The `plain' style, on the other hand, allows you to include exactly
the information in the sentence that belongs---sometimes the author,
sometimes the year, sometimes neither, but sometimes other stuff---
while minimally interrupting the flow of the sentence.  For example if
the year information is crucial you simply write something like:
"Knuth's seminal 1976 paper on mumble [13] shows ..."

Another example, in `plain':
	The field blossomed in 1976, starting with Knuth's tantalizing
	theory [13].  Others tore the theory to shreds [7], [8], [12],
	[41], [43], but that theory sparked . . .
This is reasonably clean and crisp.

Here's how it might come out in `author-date':
	The field blossomed one year, starting with a prominent
	computer scientist's tantalizing theory [Knuth 76]; others
	tore the theory to shreds [Aho and Ullman 76], [Aho et al. 76],
	[Hopcroft and Ullman 76], [Ullman and Yannakakis 76],
	[Yannakakis 76], but that theory sparked . . .
The passage, in conforming to the rigidity of the author-date style,
has become a metastasized mess.

"But The Chicago Manual of Style prefers an author-date style,"
some people point out.  True, but for anachronistic reasons:
	The chief disadvantage of [a style like `plain'] is that
	additions or deletions cannot be made after the manuscript
	is typed without changing numbers in both text references
	and list. (page 401, thirteenth edition)
With computer-typesetting systems like LaTeX, however, this
disadvantage obviously evaporates.

"But I hate it when someone writes a sentence like `The mumble theorem
was proved in [13]', forcing me to flip to the reference list to see
who did the proving," others complain.  I hate that too; but in my
view the fault lies with the *writer* of that sentence, not with the
number-in-brackets style itself.  Nothing in that flexible style
prevents a writer from saying `Knuth [13] proved the mumble theorem,'
or from giving additional information that's useful to the reader
(or from omitting even the author's name in the rare circumstances
that call for it).  To overstate the argument a bit: Just as we don't
blame a typeface for the poor writing of those who use that typeface,
we shouldn't blame the numbers-in-brackets style for the sloppiness
of those who use that style.

And it's not just the text that suffers from an author-date style; the
reference list has logical deficiencies too, as anyone who's written a
thorough program for such a style can attest.  *All* author-date
styles have these deficiencies, but which deficiencies arise depends
on the specific author-date style.  For example a style that uses
labels like [ABC86] (for a 1986 paper by Fred Aza, Joe Bloe, and Bill
Collier) must sort first by label, then by author (otherwise---if it
sorted first by author---a reader might have to search three pages of
`A' listings in a large bibliography to find the reference, because he
won't necessarily know that the `A' stands for `Aza' and he must
therefore look through all the `A' listings before finding [ABC86] at
the end).  But this kind of sorting (label first, then author) gives
an unnatural order: The [ABC86] paper, for example, would come near
the beginning of the `A' listings, rather than in its natural spot
near the end.  [Note: Since I originally wrote this spiel, I've
changed my mind---I now think that the problems with label-first
sorting are worse than the problems with author-first sorting; hence
the next version of the `alpha' standard style will probably use
author-first sorting; but having to decide which type of sorting
is worse merely underscores the author-date deficiencies.]

Alternatively, a style that uses labels like [Aza et al. 86] will
produce entries that tend not to be as far from their natural spot
as with the [ABC86] style (although the alternative style can still
produce an unnatural order), but the longer, more cumbersome labels
are a nuisance.

Worse yet are styles that don't use labels in the reference list at all.
They have the advantage of being in natural order, but they might
separate the entries for [Smith et al. 83a] and [Smith et al. 83b] by a
full page in a large reference list.  Not only that, but these nonlabel
styles might require the reader to search two pages of `Smith' listings
to find the [Smith et al. 84] entry.

Any one of these problems is fixable, but only at the expense of
introducing new, worse problems.  What a mess.  There are other
problems with certain author-date styles, but that's enough for now.
(If forced to choose among these author-date styles, I'd choose,
probably kicking and screaming, the one that produces labels like
[ABC86] and [Knu76], which is BibTeX's `alpha' style.)

The `plain' style has none of these reference-list deficiencies---
it produces the natural order, it has short labels, and it has the
simplest and quickest scheme for finding a reference in the list---
all while providing the most flexible in-text citation scheme.

END OF FLAME.

If you don't buy my arguments, or if you're saddled with an
unenlightened editor, there are author-date options.  BibTeX's
standard style `alpha' uses labels like [ABC86] for multiple authors
and [Knu76] for single authors.  There is also an `apalike' style that
has no labels in the reference list and that produces citations in the
text like (Aho, 1983) or (Aho and Hopcroft, 1983) or (Aho et al., 1983).
This style resides in the Clarkson style collection.  (In addition to
the file apalike.bst, you'll need apalike.sty (so that you can give
`apalike' as an optional argument to the \documentstyle command) if
you're using BibTeX with LaTeX, or apalike.tex if you're using BibTeX
with TeX.)  Both these styles are for BibTeX version 0.99 or later.
If you need a variation on these styles, it's best to (1) start with
`alpha' if your style uses labels in the reference list, or with
`apalike' if your style doesn't have labels, and (2) then modify
(but change the name when you're finished modifying).  The Clarkson
style collection may have other author-date styles as well.
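
For the LaTeX case, the apalike setup described above might look like
the following sketch.  The bibliography file name `refs' and the cite
key `aho83' are made up for illustration; apalike.sty and apalike.bst
are assumed to be installed where LaTeX and BibTeX can find them.

```latex
\documentstyle[apalike]{article}   % optional argument loads apalike.sty
\begin{document}
Parsing is well understood \cite{aho83}.   % hypothetical cite key
\bibliographystyle{apalike}        % selects apalike.bst
\bibliography{refs}                % hypothetical database refs.bib
\end{document}
```

Switching to `alpha' would mean dropping the optional argument and
naming `alpha' in \bibliographystyle instead.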

	--Oren Patashnik (opbibtex@neon.stanford.edu)

spqr@ecs.soton.ac.uk (Sebastian Rahtz) (02/11/91)

In article <1991Feb9.203132.6226@Neon.Stanford.EDU> opbibtex@Neon.Stanford.EDU (Oren Patashnik) writes:

   By contrast, author-date styles encourage flabby writing.  For
   instance many such styles almost require the passive voice---"It has
   been shown [Knuth 76] that ..."; the `plain' style avoids the passive
   voice---"Knuth [13] shows that ..."

The usage `it has been shown (Knuth 1987)' is often found in BibTeX
usage because of the difficulty of getting \cite{Knuth/87} to produce
(1987), as in `Knuth has shown (1987)'; yes, I know about \shortcite,
but that's not distributed as standard with BibTeX. I find it much
easier to follow the argument in `Knuth has so often shown (1983,
1984, 1990)' than in `Knuth has so often shown (13, 67, 89)', because
the year references provide an intermediate clue about whether I want
to bother following-up the reference. I like very much to have instant
feedback on the date of a reference, and I also like the fact that, in
a field with which I am familiar, I recognize `Knuth 1986' as "oh yes,
that paper, I've read that", without having to bother flick through
the bibliography.

Sebastian
--
Sebastian Rahtz                        S.Rahtz@uk.ac.soton.ecs (JANET)
Computer Science                       S.Rahtz@ecs.soton.ac.uk (Bitnet)
Southampton S09 5NH, UK                S.Rahtz@sot-ecs.uucp    (uucp)

douglis@cs.vu.nl (Fred Douglis) (02/12/91)

Along these lines... I've seen people produce papers that look like
they came from LaTeX, with citations of the form [Knuth 1984] rather than
[Knu84] (the alpha style).  I assume there's a .bst file floating around
that produces this format, but don't know which it is.  If it's standard,
I'd appreciate a pointer to it, and if it's not, I'd appreciate it if
someone could post a copy or refer me to an FTPable copy.  

Thanks.

--
=============================================================================
     Fred Douglis, Vrije Universiteit, douglis@cs.vu.nl +31 20 548-5777
=============================================================================

opbibtex@Neon.Stanford.EDU (Oren Patashnik) (02/13/91)

[This article is fairly long, because I address not only the specific
issues that Sebastian raises, but also some related issues that others
have raised in private e-mail discussions with me, or that people have
asked me to post.]

In article <SPQR.91Feb11120111@caxton.ecs.soton.ac.uk> spqr@ecs.soton.ac.uk (Sebastian Rahtz) writes:

> In article [. . .] opbibtex@Neon.Stanford.EDU (Oren Patashnik) writes:
>
>    By contrast, author-date styles encourage flabby writing.  For
>    instance many such styles almost require the passive voice---"It has
>    been shown [Knuth 76] that ..."; the `plain' style avoids the passive
>    voice---"Knuth [13] shows that ..."
>
> The usage `it has been shown (Knuth 1987)' is often found in BibTeX
> usage because of the difficulty of getting \cite{Knuth/87} to produce
> (1987), as in `Knuth has shown (1987)'; yes, I know about \shortcite,
> but that's not distributed as standard with BibTeX.

You've raised, directly or implicitly, several points in that
sentence.  What I perceive to be your main point---that people who use
BibTeX tend to use the form `(name year)' more often than the form
`(year)' because the `(year)' form doesn't come standard with BibTeX
---has a germ of truth, but is inaccurate on two counts.  First,
you've incorrectly implied that the `(name year)' form *does* come
standard with BibTeX.  There are only four standard styles (plain,
abbrv, unsrt, alpha) and none of them uses such a form.  There have
been several styles written that use that form (except that most of
those styles put a comma between the name and year); I even wrote one
of them myself (apalike).  Some such styles live in certain
repositories, and some may even come on certain distribution tapes,
but as of the .99 version of BibTeX, only the four styles mentioned
above come standard with BibTeX.  Your real complaint, I think, is
that it's easier to get your hands on a style that uses `(name, year)'
than on one using `(year)' alone.  Which leads to the second
inaccuracy: I think you've put the cart before the horse.  It's not
that more BibTeX users use the `(name, year)' form because that's the
form that's easier to find; rather, the `(name, year)' form is easier
to find because there's been more of a demand for that form, hence
that's the form that style writers have produced.  (At least that
demand explains why apalike---perhaps the most easily accessible such
style---was written with the `(name, year)' form only.)  But there's
nothing inherent in either BibTeX or LaTeX that makes the `(year)'
form harder to write a style for; in fact it's a little easier.

Incidentally, you've implicitly stated that you prefer the `(year)'
form to the `(name, year)' form.  I do too; I think the `(year)' form,
in the text at least, avoids many of the problems of the `(name,
year)' form.  But most of its reference-list logical deficiencies
(mentioned in my previous article) remain.

You brought up another issue: Which styles are standardly distributed
with BibTeX?  Here's a slightly oversimplified view of the TeX/LaTeX/
BibTeX world.  With TeX, Knuth provided a basic product (TeX the
program) along with a useful macro package (plain); with LaTeX,
Lamport provided a higher-level macro package, along with some useful
styles (article, report, book).  Similarly, with BibTeX there's a
basic product (BibTeX the program) and sample formats---one well-
thought-out style (plain) and several useful variations (abbrv, unsrt,
alpha).  The variations were chosen because they used reasonably
common features and hence gave examples of typical functions that one
might program in the bibliography-style (.bst) language.

So for versions .98 and .99, there were just those four standard
styles.  Incidentally, they are called `standard styles' because they
are standardly distributed with BibTeX, not because they are thought
to implement somebody's standards (even though the `plain' style does
come pretty close to the style recommended by van Leunen in her
"Handbook for Scholars").  For BibTeX version 1.00, which will be the
frozen version (in the same way that TeX 3.0 was a frozen version)
there will be a few changes.  In addition to the four standard styles,
there will probably be the four semi-standard styles, acm, apalike,
ieeetr, siam, which are four other styles that I've written or
maintained; they will (probably) be distributed with BibTeX because
they provide more examples (and because it will make my life easier to
have all the styles I maintain distributed together), but not because
there's something special about them.  (By the way, current plans are
to add to apalike the `(year)' form, along with a few other APA
recommendations, and perhaps change the name to simply `apa' if I
decide that what results is close enough to the APA style.)

> I find it much easier to follow the argument in `Knuth has so often
> shown (1983, 1984, 1990)' than in `Knuth has so often shown (13, 67,
> 89)', because the year references provide an intermediate clue about
> whether I want to bother following-up the reference.

I think the year serves two purposes in the style you prefer: Its
primary purpose is to provide a pointer into the reference list (you
might need to distinguish among several works by Knuth, hence you need
a `1983'---you might even need a `1983a' and a `1983b' to distinguish
between two of his works from the same year); but as I argued in my
previous article, it's inferior, as a pointer into the reference list,
to a number in brackets.  As an often useful secondary purpose, it
also provides, as you say, an intermediate clue (additional
information) about the particular work; but why force the writer to
always give the year as an intermediate clue---why not let the writer
choose the information, if any, that's appropriate for the sentence at
hand?  One answer is that many writers are negligent and, by habit,
never provide any additional information; it's better to force them,
through the citation style, to at least provide the year.  That view
has some merit; but I think it would be better if writers would get in
the habit of thinking about the information they are providing the
reader, and if journal editors would encourage such thinking through
their editorial policy.  [Here's one simply stated rule that should, I
think, be included in editorial policies: Always use the number-in-
brackets as a parenthetical remark (`. . . as Knuth [13] shows')
rather than as a part of speech (`. . . as shown in [13]').]  Perhaps
there will be a trend in that editorial direction once people realize
that the main reason that author-date styles have become popular is no
longer valid (as I pointed out in my previous posting, in quoting the
Chicago Manual of Style).

To sum up: I think it's better to decouple the two purposes served by
the `(year)' form, so that each purpose may be served better; the
`plain' style does that.  So while it's true that a year conveys more
information than a number in brackets, it does so by serving two
purposes; the number in brackets, in serving just one (reference-list
pointer) purpose, lets the writer provide whatever information is
necessary to serve the other purpose, in a more flexible way.

> I like very much to have instant feedback on the date of a reference,
> and I also like the fact that, in a field with which I am familiar, I
> recognize `Knuth 1986' as "oh yes, that paper, I've read that",
> without having to bother flick through the bibliography.

I suspect that what you really want is not instant feedback on the
date of reference, but rather instant feedback on which work the
author is referring to; the date of reference happens to be one way to
try to give that feedback.  But I claim that, in a field with which
you are familiar, `Knuth [13] shows that . . .', together with the
information contained in the `. . .', almost surely serves to uniquely
identify the work.

One last point.  I'm not claiming that there's no utility in seeing
the year of publication.  To the contrary, I agree that a name and
year pinpoint the reference pretty quickly for a reader who's familiar
with the field and who gets used to seeing that name-year combination
in paper after paper.  But by the same token, if you rely solely on
that name-year convention, you're putting at an unnecessary
disadvantage the reader who is *unfamiliar* with the field: You're
quite possibly omitting some information that would be as useful to
the reader familiar with the field as the year information is, but
that would be much more useful than the year information is to the
reader who's unfamiliar with the field.  (If you don't rely solely on
the name-year convention---for example if you take a well-written
sentence in the `plain' style and simply substitute a year for a
number in brackets---many of my objections about the text itself
disappear, although, again, the reference-list deficiencies remain.)

	--Oren Patashnik (opbibtex@neon.stanford.edu)