[comp.lang.misc] A very brief history of optimal sorting methods

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/10/90)

This is funny. Computer science really does progress backwards.

In the '60s, people knew that sorting was linear in the number of bytes
being sorted. The post office had already been using MSD radix sort for
a long time, though they probably never bothered to analyze its running
time. Knuth says that LSD became more popular for computers because in
MSD ``the multiplicity of piles becomes confusing''; others pointed
wisely to the initials LSD and smiled.

In 1987, Tarjan et al. appeared on the sorting scene, with their ``new
partitioning algorithms.'' That's right: sorting in linear time. With
what? MSD radix sort. Hey, I'm impressed.

Last year, somebody or other appeared on the sorting scene. Guess what?
An o(n log n) sort. This one wasn't linear, though; it wasn't even
practical. It used all sorts of weird bit manipulations to ``break the
sorting barrier.'' Something like n log n/log log n. Computer scientists
around the world began teaching courses on this huge advance. (No joke.)

Now Jim Giles appears on the sorting scene. Guess what? Radix sorts are
slower than n log n. n log n is a lower bound.

I thank God for not making me a computer scientist.

Luckily, on the other side of the fence, we programmers still know that
sorting is linear. Thanks to Keith Bostic, a nicely cleaned up version
of my linear-time radix sort code will appear in the BSD 4.4 library.

In article <5488@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
  [ there aren't linear-time sorts, everything's O(n log n) ]
> That's why I didn't bother to answer
> your next post - you didn't seem to realize that an Nlog(N) sort is
> _faster_ than a 'linear' radix sort when the keyspace is sparcely
> filled.)

The sort program I plan to contribute to Berkeley runs more than twice
as fast as the AT&T-derived version, which I presume uses a merge sort
or some other n log n method. Believe what you want to, Jim.

---Dan

rh@smds.UUCP (Richard Harter) (11/11/90)

In article <6913:Nov1008:23:5690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> This is funny. Computer science really does progress backwards.

> In the '60s, people knew that sorting was linear in the number of bytes
> being sorted. The post office had already been using MSD radix sort for
> a long time, though they probably never bothered to analyze its running
> time. Knuth says that LSD became more popular for computers because in
> MSD ``the multiplicity of piles becomes confusing''; others pointed
> wisely to the initials LSD and smiled.

	[Much more material in this vein deleted.]

Dan, me boy, you're having a wee bit of trouble telling your burro 
from yer burrow.

If you have N records with distinct keys the effective length of the
keys is log(N).  MS* sorts and binary sorts (qsort, mergesort, whatever)
must do O(N(log(N)) record (or pointer) moves.  In either case the
result is N*log(N) records moved.  In principle an MS* sort need only
examine N*log(N) bits whereas a binary sort must examine N*log(N)**2
bits; in practice this advantage is pretty nominal.  The true advantage
of MS* (but not LS*) sorts is that one can use more than two bins,
changing the base of the log term.  For example, a byte based MSB sort
uses 256 bins which reduces the number of moves by a factor of 8.

THERE ARE NO TRULY LINEAR SORTS.

There are, however, quasi-linear sorts, i.e. sorts which are linear
for a restricted range of N and some restrictions on the data.  These
sorts depend, ultimately, on the fact that arithmetic is (up to the
word size) a parallel operation.  The fastest possible sort requires
an address calculation function which gives, for each key, the location
in the sorted file that the record must be moved to.  Given this one
can sort in linear time -- provided that a record can be moved from
one location to any other in constant time.  The latter requirement cannot
be satisfied for sufficiently large N (i.e. the file is larger than
available memory).
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/11/90)

In article <235@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> In article <6913:Nov1008:23:5690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > This is funny. Computer science really does progress backwards.
> > In the '60s, people knew that sorting was linear in the number of bytes
> > being sorted.
> If you have N records with distinct keys the effective length of the
> keys is log(N).
  [ several sentences of similar handwaving deleted ]
> THERE ARE NO TRULY LINEAR SORTS.

Sorting is linear in the number of bytes being sorted. Feel free to
spout on about implicit log n factors that don't exist; I'll continue to
sort n-byte files in time between bn and cn seconds for nonzero b and c.
Models of computation are cute, but I live in the real world.

---Dan

rh@smds.UUCP (Richard Harter) (11/12/90)

In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

> Sorting is linear in the number of bytes being sorted. Feel free to
> spout on about implicit log n factors that don't exist; I'll continue to
> sort n-byte files in time between bn and cn seconds for nonzero b and c.
> Models of computation are cute, but I live in the real world.

One time only, I'll try to make it very simple.  If you do a radix sort
you make multiple passes over the data.  In each pass you move all of the
records.  The minimum number of passes needed is log(N) where N is the
number of records and the base of the logarithm is the bin size.  N*log(N)
records must be moved.  Is that clear enough?

-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/12/90)

In article <237@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
> In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > Sorting is linear in the number of bytes being sorted.

> One time only, I'll try to make it very simple.

It is worth noting, however, that for a *fixed* range of integers [a,b],
a record is ceil(lg(b-a+1)) bits (call that lg(M)), the entire collection
of records is O(N*lg(M)) bits, and if we sort K bits at a time, we require
ceil(lg(M)/K) passes, each pass of which is linear in the number of bytes
being sorted.  Consider an array of N 32-bit integers, where we take 8-bit
"digits".  (Not wholly unrealistic.)  Then we need a maximum of 4 passes.

So if you want to sort an array of Pascal 'integer's, C 'long's,
Fortran 'INTEGER*4's, and so on, it's going to take you at most 4 passes.
For practical purposes, radix sort of machine integers may be taken as
linear in the number of bytes, because the range of values is fixed.

The studies which have allegedly shown quicksort to be fast have considered
sorting hardware integers.  For that special case, quicksort is O(N**2) and
radix sort is O(N), and a fairly crude radix sort can make quicksort look
very sick indeed.

(A crude radix sort would not do for languages like Lisp, Scheme, Pop, and
some Prologs which don't impose arbitrary and to-the-user unmotivated
limits on the size of integers.)

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

hansm@cs.kun.nl (Hans Mulder) (11/12/90)

In article <237@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> .....  I'll continue to
> sort n-byte files in time between bn and cn seconds for nonzero b and c.

In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) argues that if the file
contains N records, they must be at least log(N) long, in order to be
distinct, and that it takes at least N*log(N) time to sort them.

This is perfectly consistent: if there are N records and each is log(N)
bytes long, then there are N*log(N) bytes of data in the file.

In other words, n == N*log(N).  You agree.

Have a nice day,

Hans Mulder	hansm@cs.kun.nl

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/13/90)

In article <237@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> If you do a radix sort
> you make multiple passes over the data.  In each pass you move all of the
> records.

Wrong.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/13/90)

In article <4248@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> For practical purposes, radix sort of machine integers may be taken as
> linear in the number of bytes, because the range of values is fixed.

You imply that this is not true for theoretical purposes. You are wrong.

Sorting is linear in the number of bytes being sorted. This is true both
theoretically and practically.

> (A crude radix sort would not do for languages like Lisp, Scheme, Pop, and
> some Prologs

A *crude* radix sort would not do for any serious application.

A *good* radix sort is linear in the number of bytes being sorted.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/13/90)

In article <2432@wn1.sci.kun.nl> hansm@cs.kun.nl (Hans Mulder) writes:
  [ Hans attributed to Richard, but I actually said: ]
> > .....  I'll continue to
> > sort n-byte files in time between bn and cn seconds for nonzero b and c.
  [ Hans paraphrased Richard, attributing it to me: ]
> if the file
> contains N records, they must be at least log(N) long, in order to be
> distinct,

There is no reason to assume distinctness. In fact, many real database
applications feature large numbers of identical keys. It would be silly
to use an O(n log n) sort when a constant-time-per-byte sort suffices.

However, you are correct that a linear sort is asymptotically no faster
than an O(n log n)-per-record sort *if* the records are distinct.

End technical content.

This whole discussion has me rather amused. I just received mail from a
know-it-all who, in a remarkable display of alliteration, wrote

  You're winning big bozo points with your blustering bad mathematics

(presumably in reference to this thread). Hey, bozo, I know you're
listening, and I challenge you to identify yourself and to point out any
example of ``blustering bad mathematics'' in any of my articles in this
group. I know you'll be too scared to come out of the closet, because
once you bother to shed your computer science training and actually
think, you'll realize that I'm right. Or will you just go ahead and
post, and give me the fun of flaming you for it? For several weeks now
I've been in the perfect mood to take out my anger on a stupid yuckie
who wouldn't know mathematical truth if it hit him in the face.

---Dan

svissag@hubcap.clemson.edu (Steve L Vissage II) (11/14/90)

From article <24945:Nov1218:54:5590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> A *crude* radix sort would not do for any serious application.
> 
> A *good* radix sort is linear in the number of bytes being sorted.

I've never heard of radix sort.  Could someone please send me some source?
Preferably a *good* radix sort.  :-)

> ---Dan
 
Thanks,
Steve L Vissag II
svissag@hubcap.clemson.edu

jlg@lanl.gov (Jim Giles) (11/14/90)

From article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Sorting is linear in the number of bytes being sorted. Feel free to
> spout on about implicit log n factors that don't exist; I'll continue to
> sort n-byte files in time between bn and cn seconds for nonzero b and c.
> Models of computation are cute, but I live in the real world.

So do I.  In the real world, keys are almost _never_ just one byte
long.  The log n factors implicit in radix sorts _do_ exist for any
keyspaces that are above the 'toy' level.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)

In article <5812@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Models of computation are cute, but I live in the real world.
> So do I.  In the real world, keys are almost _never_ just one byte
> long.

Sorting is linear in the number of bytes being sorted. The length of the
keys is absolutely, entirely irrelevant to this conclusion.

> The log n factors implicit in radix sorts _do_ exist for any
> keyspaces that are above the 'toy' level.

In the very common case that keys are not all distinct, radix sort can
be asymptotically FASTER than n log n, depending on the amount of
repetition. The ``implicit'' factor that all you computer scientists
keep imagining does NOT exist.

---Dan

jlg@lanl.gov (Jim Giles) (11/15/90)

From article <4440:Nov1405:06:2590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Sorting is linear in the number of bytes being sorted. The length of the
> keys is absolutely, entirely irrelevant to this conclusion.

Radix sorting is linear in the number of bytes perhaps.  It is also
linear in the number of keys to be sorted _times_ the length of each
key.  The speed of sorting techniques is _uniformly_ given by _all_
the literature as a function of the number of _keys_ to be sorted,
_not_ as a function of the number of bytes sorted.  When the number
of _bytes_ sorted is greater than N log N (where N is the number of
_keys_), it is often better to use some other sort.

If a whole key (like a 64-bit integer for example) can be compared in
the same time as it takes to compare your little 8-bit chunk of it, and
if the whole key can be loaded, moved, stored in _less_ time than it
takes to _unpack_ just _one_ of your little 8-bit chunks, and if the
keyspace is large (say, 64-bit integers) but sparcely populated (like
you're only sorting 100 million of them), then radix sorting is usually
at a severe deficit compared to other methods (quicksort, for example).
Since all of the conditions above are met by _most_ sorting problems,
radix sorting is at a deficit _most_ of the time.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <5948@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <4440:Nov1405:06:2590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > [...]
> > Sorting is linear in the number of bytes being sorted. The length of the
> > keys is absolutely, entirely irrelevant to this conclusion.
> Radix sorting is linear in the number of bytes perhaps.

No, not ``perhaps.'' Sorting is linear in the number of bytes being
sorted.

> It is also
> linear in the number of keys to be sorted _times_ the length of each
> key.

No, it isn't. That is an upper bound which is rarely true in practice.
Please don't say anything if you don't know what you're talking about.

> The speed of sorting techniques is _uniformly_ given by _all_
> the literature as a function of the number of _keys_ to be sorted,
> _not_ as a function of the number of bytes sorted.

You are wrong. Here's a counterexample: In 1987, in the article by
Tarjan and somebody else ``inventing'' MSD radix sort, the time bounds
were given in terms of the number of bytes. Less precise bounds were
given in terms of the number of keys under various restrictions. The
references in that article contain lots more counterexamples.

Please don't say anything if you don't know what you're talking about.

> When the number
> of _bytes_ sorted is greater than N log N (where N is the number of
> _keys_), it is often better to use some other sort.

Rarely. The number of bytes in the termcap file here is about five times
bigger than n lg n, where n is the number of lines. It is forty times
bigger than n log_256 n. Yet my radix sort takes less time on it than
the supposedly optimized merge sort that appeared in the source groups a
couple of years ago.

Please don't say anything if you don't know what you're talking about.

> If a whole key (like a 64-bit integer for example) can be compared in
> the same time as it takes to compare your little 8-bit chunk of it,

Radix sorts do not compare the bytes they are sorting. Please don't say
anything if you don't know what you're talking about.

---Dan

jlg@lanl.gov (Jim Giles) (11/16/90)

From article <5293:Nov1518:36:0490@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [... ]
> Please don't say anything if you don't know what you're talking about.

A statement that applies to _you_ far more often than to me.

> [...]
>> The speed of sorting techniques is _uniformly_ given by _all_
>> the literature as a function of the number of _keys_ to be sorted,
>> _not_ as a function of the number of bytes sorted.
> 
> You are wrong. Here's a counterexample: In 1987, in the article by
> Tarjan and somebody else ``inventing'' MSD radix sort, [...]

Oh wow!  A _single_ paper (which you can't even give a proper citation
of) which goes against the grain of the _whole_ rest of the industry.
Big deal.

> [...]                                       Less precise bounds were
> given in terms of the number of keys under various restrictions. [...]

So, even in your supposed counter-example paper the authors _do_
give the speed in terms of the number of _keys_.  This, as I pointed
out before, _is_ the standard practice of the industry.  Everyone in
the industry will _assume_ (correctly - for all but Dan) that a quoted
sort speen is in terms of the number of _keys_ - unless _explicitly_
stated otherwise.  I suspect that your cited paper _does_ make such
an explicit caveat.  _YOUR_ original claims _did_not_.

> [...]
> Please don't say anything if you don't know what you're talking about.

Again, heed your own advice.

> [...]
> Rarely. The number of bytes in the termcap file here is about five times
> bigger than n lg n, where n is the number of lines. It is forty times
> bigger than n log_256 n. Yet my radix sort takes less time on it than
> the supposedly optimized merge sort that appeared in the source groups a
> couple of years ago.

Woopee!  It is my experience with _all_ "supposedly optimized" UNIX
utilities that they are incredibly slow.  The fact that you can use
a slow technique and still beat a UNIX utility is hardly a surprise.
Further, sorting a termcap file is hardly a very interesting test:
text sorting of that kind is one of the _few_ places where I would
have agreed with you about the value of radix sorts.  You did not
suggest radix sorts in that context.  Instead, you recommended the
sort for a task which it is _not_ suited to.  You need to re-read
Knuth: he discusses the trade-offs of different sorting techniques.
He does _not_ find (as you seem to have concluded) that _one_
technique is _always_ the best choice.

> [...]
> Please don't say anything if you don't know what you're talking about.

Again, please follow this advice yourself.

> [...]
>> If a whole key (like a 64-bit integer for example) can be compared in
>> the same time as it takes to compare your little 8-bit chunk of it,
> 
> Radix sorts do not compare the bytes they are sorting. [...]

Yes, I figured you'd seize on this as soon as I posted it.  I typed
in a hurry and should have said "index with" rather than "compare."
It is interesting that you should consider this to be an important
point since it doesn't help your case at all.  The fact remains that
comparing multi-byte objects is, on most machines, just as fast
as address calculations using one byte indices.  Indeed, the address
calculation uses the same functional unit as the multi-byte compare
on most machines.  Addresses are multi-byte on most machines.

> [...]                                                  Please don't say
> anything if you don't know what you're talking about.

Again, take your own advice.  Or better yet, I don't care about
whether you follow this advice or not.  Just QUIT making insulting
remarks about others when those _same_ remarks apply more appropriately
to _you_.

J. Giles

ttw@lanl.gov (Tony Warnock) (11/16/90)

Dan Bernstein has posted the following advice:
"Please don't say anything if you don't know what you're talking about."
 
He has incorrectly stated the complexity of sorting.
Perhaps he could follow his own advice:
"Please don't say anything if you don't know what you're talking about."
 
He has stated that "a few bits" are all that are necessary to represent
value that grows combinatorily.
Perhaps he could follow his own advice:
"Please don't say anything if you don't know what you're talking about."
 
He has incorrectly stated that Fortran does not have separate compilation.
Perhaps he could follow his own advice:
"Please don't say anything if you don't know what you're talking about."
 

The road of excess leads to the palace of wisdom.  - William Blake

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6056@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <5293:Nov1518:36:0490@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Please don't say anything if you don't know what you're talking about.
> A statement that applies to _you_ far more often than to me.

That accusation is not borne out by the evidence. See below.

> >> The speed of sorting techniques is _uniformly_ given by _all_
> >> the literature as a function of the number of _keys_ to be sorted,
> >> _not_ as a function of the number of bytes sorted.
> > You are wrong. Here's a counterexample: In 1987, in the article by
> > Tarjan and somebody else ``inventing'' MSD radix sort, [...]
> Oh wow!  A _single_ paper (which you can't even give a proper citation
> of) which goes against the grain of the _whole_ rest of the industry.

You are WRONG! You said ``uniformly.'' You are WRONG! You said ``all.''
You are WRONG! There is no escaping this fact. I rest my case.

In attempting to redeem yourself you imply that it is just a ``single''
paper. But you're wrong there. As I said in the article you're quoting
(but probably failed to read), the references to that paper have many
further counterexamples to your original statement.

Give up while you're only this far behind.

> > [...]                                       Less precise bounds were
> > given in terms of the number of keys under various restrictions. [...]
> So, even in your supposed counter-example paper the authors _do_
> give the speed in terms of the number of _keys_.

No; they only give some imprecise bounds, and only under various
restrictions (the keys being all distinct, the keys having this much to
distinguish between them, etc.). They give the speed in terms of the
number of bytes, as per a convention established in many previous
papers. There is no escaping this fact.

> Everyone in
> the industry will _assume_ (correctly - for all but Dan) that a quoted
> sort speen is in terms of the number of _keys_ - unless _explicitly_
> stated otherwise.  I suspect that your cited paper _does_ make such
> an explicit caveat.  _YOUR_ original claims _did_not_.

Are you implying that I have ever stated ``sorting is linear'' without
saying ``in the number of bytes''? I challenge you to prove your
accusation. If you fail, we will all know you have lied (again).

> Woopee!  It is my experience with _all_ "supposedly optimized" UNIX
> utilities that they are incredibly slow.  The fact that you can use
> a slow technique and still beat a UNIX utility is hardly a surprise.

Jim, I'll bet you can't write a sort program even half as fast as the
standard UNIX version. I challenge you (again) to write a sort program
that will beat mine, under the restriction that you use an O(n log n)
sort---which, of course, you claim is faster than radix sort.

  [ Knuth's chapter on sorting ]
> He does _not_ find (as you seem to have concluded) that _one_
> technique is _always_ the best choice.

He dismissed MSD radix sort as too much housekeeping to be worthwhile.
Furthermore, there are optimizations to MSD radix sort that he did not
consider. I have brought up ``adaptive distribution sort'' in a letter
to him.

As much as I respect Knuth, the man makes occasional misjudgments. His
statements about the asymptotic speed of multiplication, for example,
are incorrect. I hope to convince him to correct his statements about
Nussbaumer's method. Similarly, I hope to convince him that a variant of
radix sort is appropriate for an extremely wide range of applications,
and worth consideration in every application.

> >> If a whole key (like a 64-bit integer for example) can be compared in
> >> the same time as it takes to compare your little 8-bit chunk of it,
> > Radix sorts do not compare the bytes they are sorting. [...]
> Yes, I figured you'd seize on this as soon as I posted it.  I typed
> in a hurry and should have said "index with" rather than "compare."

Fine, so you admit you're wrong.

> > [...]                                                  Please don't say
> > anything if you don't know what you're talking about.
> Again, take your own advice.

I find it sickening how you post a series of incorrect statements, I
refute each of them, and now you imply that I'm the one who's off his
rocker.

> Just QUIT making insulting
> remarks about others when those _same_ remarks apply more appropriately
> to _you_.

Hasn't happened yet in my conversations with you.

---Dan

jlg@lanl.gov (Jim Giles) (11/16/90)

From article <8420:Nov1522:11:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <6056@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
> You are WRONG! You said ``uniformly.'' You are WRONG! You said ``all.''
> You are WRONG! There is no escaping this fact. I rest my case.

You can only rest your case if (as I said previously) the paper
you cite _didn't_ explicitly mention that the time was quoted per
_byte_ of the keys.  If the paper quoted speeds in this way _without_
an explicit caveat, _then_ you have made a valid point.  Otherwise,
the paper has _no_ relevance to your claim that sorting speeds are
given in the same terms that _you_ originally did.

> [...]
> In attempting to redeem yourself you imply that it is just a ``single''
> paper. But you're wrong there. As I said in the article you're quoting
> (but probably failed to read), the references to that paper have many
> further counterexamples to your original statement.

Do they? Or do they also follow the convention that I claim is the
universal: to quote speeds as a function of the number of _keys_ and
(maybe) also in other terms - but _only_ after _explicitly_ stating
what those other terms are?  I suspect the latter.  I've been informed
that the specific paper you mentioned _does_ do this.

> [...]
> Give up while you're only this far behind.

Again, extremely good advice for _you_ to take.

> [...]                            They give the speed in terms of the
> number of bytes, as per a convention established in many previous
> papers. There is no escaping this fact.

Yes, the convention extablished of giving the speed in terms of
bytes AFTER explicitly stating that they were going to do so.

> [...]
> Are you implying that I have ever stated ``sorting is linear'' without
> saying ``in the number of bytes''? I challenge you to prove your
> accusation. If you fail, we will all know you have lied (again).

AGAIN?!?!?  The _last_ time you accused me of lying, I only had to
look back _TWO_DAYS_ to prove that you _had_ made the statement that
I attributed to you.  I have not yet recieved an apology for your
previous unfounded accusations.

In this case, the claim that you made is over a week old and has
been discarded (I only same the parts of your messages that I quote
in my responses).  However, what "we will all know" is what everyone
remembers: you _did_ claim your sort was linear _without_ saying
that you were using the non-standard measure of the number of bytes.
You corrected that to say the number of bytes _after_ several
people (including myself) took issue with your claim.  If anyone
has a longer archive of this newsgroup that I do, they will catch
you with it.

In the meantime, it's not worth pursuing since I don't even get
an apology.  The technical point has been settled anyway: your
claim to be linear used different units than everyone _expects_.
It should be noted that bubble sort is linear with the number of
_compares_ it does.  Changing the unit of measure can make just
about _anything_ linear.

> [...]
> Jim, I'll bet you can't write a sort program even half as fast as the
> standard UNIX version. I challenge you (again) to write a sort program
> that will beat mine, under the restriction that you use an O(n log n)
> sort---which, of course, you claim is faster than radix sort.

Depends on the benchmark.  For the _overwhelming_ majority of sorting
tasks, radix sort is _not_ optimal.  Only in the domain of sorting
text lines and similar applications does radix sort really shine.
Again, read your Knuth (or any other good sorting text).  Radix is
_not_ a universal solution to sorting problems.

>   [ Knuth's chapter on sorting ]
> [...]
> He dismissed MSD radix sort as too much housekeeping to be worthwhile.
> Furthermore, there are optimizations to MSD radix sort that he did not
> consider. I have brought up ``adaptive distribution sort'' in a letter
> to him.

And, since he is somewhat more intelligent than I am, I suspect he
completely ignored you (as I should have done).

>> > [...]                                                  Please don't say
>> > anything if you don't know what you're talking about.
>> Again, take your own advice.
> 
> I find it sickening how you post a series of incorrect statements, I
> refute each of them, and now you imply that I'm the one who's off his
> rocker.

Same here.  But with the additional caveat that _you_ never even
admit to having _said_ the things that you get wrong.  You change
your arguments, the units of measurement, or the context of the
discussion and then claim to have been right all along.  When it
is _proven_ that your accusations of _lying_ are completely
unfounded, you refuse to even admit it (in fact, you don't respond
at all).

> [...]
>> Just QUIT making insulting
>> remarks about others when those _same_ remarks apply more appropriately
>> to _you_.
> 
> Hasn't happened yet in my conversations with you.

1.) UNIX does asynchronous I/O automatically on output.
2.) UNIX has a reconnect capability (to jobs started by another tty session).
...
N-1.) The _compiler_ can detect all interprocedural aliasing.
VERY_LARGE_N.) Fortran doesn't have independent compilation.

These are examples of positions you've held in just the last year.
Practically _every_ discussion I've _had_ with you, you've been wrong.
Further, you've _never_once_ admitted to being wrong.  You've been
_VERY_ insulting as often as the opportunity presented itself.

Finally, the discussion _would_ be much more pleasant (and more
productive) if you'd leave the invective _out_!

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6067@lanl.gov> ttw@lanl.gov (Tony Warnock) writes:
  [ about me ]
> He has incorrectly stated the complexity of sorting.

No, I have not. You are a liar or a fool. I challenge you to show where
I have incorrectly stated the complexity of sorting. If you fail the
challenge you are a liar. If you think you have succeeded you are a
fool, for my statements about sorting have been entirely correct.

> He has stated that "a few bits" are all that are necessary to represent
> value that grows combinatorily.

No, I have not. You are a liar or a fool. I challenge you to show where
I have stated that a few bits suffice to represent an exponential
quantity. If you fail the challenge you are a liar. You will not
succeed, as I have made no such statement.

> He has incorrectly stated that Fortran does not have separate compilation.

No, I have not. You are a liar or a fool. I have said that Fortran does
not have separate compilation *as in C*; compilation follows an entirely
different philosophy in each language. I challenge you to show where I
have stated that Fortran does not have separate compilation. If you fail
the challenge you are a liar. You will not succeed, as I have made no
such statement.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6082@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <8420:Nov1522:11:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > You are WRONG! You said ``uniformly.'' You are WRONG! You said ``all.''
> > You are WRONG! There is no escaping this fact. I rest my case.
> You can only rest your case if (as I said previously) the paper
> you cite _didn't_ explicitly mention that the time was quoted per
> _byte_ of the keys.

This is not a game where you can change the rules. You've been making a
series of incorrect statements, and no amount of squirming and
revisionist history will make your statements correct.

I am perfectly willing to accept your point that people assume that
``sorting is linear'' refers to linearity in the number of keys. So
such a statement would only be true if the key length is fixed, not in
the general case. That's why I make sure to say ``in the number of
bytes.''

  [ whimper about how the Holy Grail has expired ]

Just name the thread and I'll send you all my articles in that thread.

> However, what "we will all know" is what everyone
> remembers: you _did_ claim your sort was linear _without_ saying
> that you were using the non-standard measure of the number of bytes.

No, I did not. If I have ever said so it can only have been in a context
where the keys were fixed-length (sorting machine integers, perhaps).
Maybe you're remembering something like that?

> If anyone
> has a longer archive of this newsgroup that I do, they will catch
> you with it.

Go for it, folks! I'd love to see the results. Come on, everybody who
thinks that I've been making incorrect statements. Lots of sites have
two-week histories.

> > Jim, I'll bet you can't write a sort program even half as fast as the
> > standard UNIX version. I challenge you (again) to write a sort program
> > that will beat mine, under the restriction that you use an O(n log n)
> > sort---which, of course, you claim is faster than radix sort.
> Depends on the benchmark.  For the _overwhelming_ majority of sorting
> tasks, radix sort is _not_ optimal.

See, you keep claiming this. I'll even let you pick a set of three
sorting benchmarks that an impartial judge accepts as realistic. I'm
still waiting for your response to my challenge.

> >   [ Knuth's chapter on sorting ]
> > He dismissed MSD radix sort as too much housekeeping to be worthwhile.
> > Furthermore, there are optimizations to MSD radix sort that he did not
> > consider. I have brought up ``adaptive distribution sort'' in a letter
> > to him.
> And, since he is somewhat more intelligent than I am, I suspect he
> completely ignored you (as I should have done).

He is a busy man, and has many letters to respond to. However, as he
owes me about $30, I expect he will write back soon.

> >> Just QUIT making insulting
> >> remarks about others when those _same_ remarks apply more appropriately
> >> to _you_.
> > Hasn't happened yet in my conversations with you.
> 1.) UNIX does asynchronous I/O automatically on output.

It does. The meaning of asynchronous is ``not synchronous.'' There is
only one synchronous I/O (you tell the disk to run, wait for it to
complete, and then continue); there are lots of degrees of being not
synchronous. UNIX does asynchronous I/O on output.

I agree that buffering does not make I/O as asynchronous as it could be;
I thank you for helping me understand that the extra copy time is a real
burden on Crays, and hence should be eliminated if possible.

> 2.) UNIX has a reconnect capability (to jobs started by another tty session).

It does. See my pty program. UNIX has a reconnect capability.

> N-1.) The _compiler_ can detect all interprocedural aliasing.

I never said any such thing. I said that the compiler can detect a
particular type of interprocedural aliasing, which should suffice for
all the real-world problems you care about.

> VERY_LARGE_N.) Fortran doesn't have independent compilation.

I never said any such thing. I said that Fortran doesn't have
independent compilation *as in C*. I am sick of how you ignore these
qualifiers, from ``packed'' on up.

> Practically _every_ discussion I've _had_ with you, you've been wrong.

Practically every discussion I've had with you, you've been wrong.

> Further, you've _never_once_ admitted to being wrong.

Further, you simply cut off communication because it hurts your pride to
admit your mistakes.

When I make a mistake, I admit it.

---Dan

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/16/90)

In article <6082@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> Do they? Or do they also follow the convention that I claim is the
> universal: to quote speeds as a function of the number of _keys_ and
> (maybe) also in other terms - but _only_ after _explicitly_ stating
> what those other terms are?

I think that it is worth noting that quoting sorting speed as a function
of the number of bytes to be sorted is not an *arbitrary* criterion;
when we talk about an algorithm being O(f(N)) the argument N is usually
understood as measure of the size of the input, such as the number of
bits in a "reasonable" encoding.  [See for example Garey & Johnson.]

So the convention Dan Bernstein uses to describe MSD radix sort
(linear in the size of the input) is the convention which is usual
*except* for sorting.

How come sorting uses a different convention?  For the simple reason
that most discussions of sorting complexity consider keys as single
symbols.  A typical example is Sedgewick's Acta Informatica article
about quicksort, where the bottom line (that quicksort is quick) was
obtained by plugging into his formulas the assumption that the cost
of comparing a whole key is 1/4 of the cost of an integer assignment.

If we don't assume small keys, then since quicksort does O(K log K) key
(or pointer) moves on the average, K being the number of keys, and since
K distinct keys (or pointers) need O(log K) bits each, the average cost
of quicksort is O(K (log K)**2).

What was the Tarjan et al 1987 article again?  I think it's time I read it.

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (11/16/90)

Maybe this should be moved to alt.pissing.contest?

Tony Warnock writes:
#He[Dan] has stated that "a few bits" are all that are necessary to represent
#value that grows combinatorily.

Dan Bernstein writes:
#No, I have not. You are a liar or a fool. I challenge you to show where
#I have stated that a few bits suffice to represent an exponential
#quantity. If you fail the challenge you are a liar. You will not
#succeed, as I have made no such statement.

Perhaps Dan is engaging in one of his retorical tricks by changing the
word combinatorily to exponential, and then calling Tony a liar.

Perhaps Dand did not expect people to nitice.

#No, I have not. You are a liar or a fool. I have said that Fortran does
#not have separate compilation *as in C*; compilation follows an entirely
#different philosophy in each language. 

Then you should have said the Fortran does not have the same way that
C does. Your revisionism does not hold here. By standard use of the
english language (you do speak english, don't you Dan), your statement
was the fortran does not have seperate compilation. If you want to say
that it has it, but it is different in philosophy, then do so.

--
-Greg Hennessy, University of Virginia
 USPS Mail:     Astronomy Department, Charlottesville, VA 22903-2475 USA
 Internet:      gsh7w@virginia.edu  
 UUCP:		...!uunet!virginia!gsh7w

new@ee.udel.edu (Darren New) (11/17/90)

In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Sorting is linear in the number of bytes being sorted. Feel free to
>spout on about implicit log n factors that don't exist; I'll continue to
>sort n-byte files in time between bn and cn seconds for nonzero b and c.
>Models of computation are cute, but I live in the real world.

Actually, for sorting to be linear in the number of bytes sorted, don't
you need some opperation on the keys other than comparison?  For
example, how would you sort (say, in Pascal) an array of floats (reals,
whatever) without using division or multiplication and still have it
take linear time?

How about this:

"I want you to sort this array of encrypted passwords into the same
order as the user numbers they belong to. Since for security reasons I
cannot reveal to your program the user number belonging to each
password, I will provide a function "porder(A,B)" which returns true if
the user number for password A comes before the user number for
password B and false otherwise. Since the user number space is very
sparse, this will not compromise security."

In this case, you will need O(n log n) calls to the porder function,
hence taking O(n log n) time, regardless of how big the keys are. This
is easy to see based on a decision-tree argument. In this case, you
really don't even have the keys (the user number), but only pointers to
keys (the passwords). I would be *very* interested to see some
(pseudo-) code that does such a sort in linear time.
                      -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
              =+=+=+ Let GROPE be an N-tuple where ... +=+=+=

mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (11/17/90)

   In article <24945:Nov1218:54:5590@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
   Sorting is linear in the number of bytes being sorted. This is true both
   theoretically and practically.

While obviously true in some situations (but I can do O(1) sorting in
"some" situations), this goes against the grain for the general case.

Rather than arguing theory, I'll just ask for proof. How about making
available sources to either 1) a plug-compatible replacement for
qsort(3) or 2) a replacement for sort(1), so we can try some test
cases and see how they work, and how well they work.

	<mike
--

ttw@lanl.gov (Tony Warnock) (11/17/90)

    Dan has called me a liar and a fool, he may be right about the
    latter as I am taking time to reply to him. With respect to the
    former, I shall only quote Dan's own words. If he says that his
    points are being misunderstood, he should clarify them rather than
    engaging in name calling. Here are a few of the exact quotes made
    with respect to sorting, number of ways of aliasing arguments and
    Fortran compilation.
 
"The number of the partition chosen takes at most a few bits to encode."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
 
"Luckily, on the other side of the fence, we programmers still know that
sorting is linear."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
 
"Fortran does not have separate compilation like C."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
"I thank God for not making me a computer scientist."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein

     If the fool would persist in his folly he would become wise.
                                                     - William Blake

dbc@bushido.uucp (Dave Caswell) (11/18/90)

.
.In the very common case that keys are not all distinct, radix sort can
.be asymptotically FASTER than n log n, depending on the amount of
.repetition. 

Shouldn't this be "keys are not all the same"?


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

dbc@bushido.uucp (Dave Caswell) (11/18/90)

In article <6067@lanl.gov> ttw@lanl.gov (Tony Warnock) writes:
.>
.>
.>Dan Bernstein has posted the following advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>He has incorrectly stated the complexity of sorting.
.>Perhaps he could follow his own advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>He has stated that "a few bits" are all that are necessary to represent
.>value that grows combinatorily.
.>Perhaps he could follow his own advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>He has incorrectly stated that Fortran does not have separate compilation.
.>Perhaps he could follow his own advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>
.
I read this group religiously and haven't seen Dan make statements that
would backup the above claims.  I think it's more likely you don't know
the answers.  I often ask people about sorting at interviews, anyone
who just regurgitates a quote like n log n loses.  I don't have a book
that discusses sorting (Knuth etc.) that says the "answer" is n log n.


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

dbc@bushido.uucp (Dave Caswell) (11/18/90)

.Actually, for sorting to be linear in the number of bytes sorted, don't
.you need some opperation on the keys other than comparison?  For

Yes.  For many (most, all) problems that isn't hard.  Dan has consistently
repeatedly, over and over again stated that that wasn't the model he was
using; he just doesn't compare keys.


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

sommar@enea.se (Erland Sommarskog) (11/18/90)

Dave Caswell (dbc@bushido.uucp) writes:
>I read this group religiously and haven't seen Dan make statements that
>would backup the above claims.

The problem seems to be that he hasn't bothered to give any
technical arguments, but just tried to refute people with
statements like "Wrong." He may be right, but is he is certainly
not convincing. Disclaimer: due to mild interest and news
problems, I haven't read all articles on this thread.

>I often ask people about sorting at interviews, anyone
>who just regurgitates a quote like n log n loses.  I don't have a book
>that discusses sorting (Knuth etc.) that says the "answer" is n log n.

I guess you would never employ me. If we forget the recent
discussion I wouldn't have even the faintest idea. I recall
that I was exposed to exchange, bubble and quicksort in
school but whether they are linear, quadtratic or cubic
I don't bother about. (Don't we have rec.games.trivia for
that? :-) Seriously, the days are through when the most
essential in computer science and software engineering was
sorting algorithms. The day I need one, I will consult the
literature. Or simply think of the best way of calling
some built-in sort facility.

Oh, actually I wrote a generic Ada package for the topological
sorting algorithm presented in Knuth's book some years ago.
I seem to recall that this algorithm is linear in difference
to tsort(1) which is quadratic. (Topologcial sorting is a
special-purpose sorting and is not compatible normal sorting.)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
"Handlar Hanssons halta h|na har haft hosta hela halva h|sten"

rang@cs.wisc.edu (Anton Rang) (11/19/90)

Arrggh.  I know I'm going to regret getting into this thread.

In article <237@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>One time only, I'll try to make it very simple.  If you do a radix sort
>you make multiple passes over the data.  In each pass you move all of the
>records.  The minimum number of passes needed is log(N) where N is the
>number of records and the base of the logarithm is the bin size.  N*log(N)
>records must be moved.  Is that clear enough?

  Are we assuming that the radix sort uses a fixed amount of auxiliary
storage?  If you sort using pointers, the time would be made up of:

  * The number of passes, which is at most the number of bytes/key;
      (replace 'byte' by a storage unit for your favorite base)
    Times the number of keys, since you're sorting all the records;
    Times the size of a pointer, since you're moving the pointers.

      This is O(bytes/key) * O(number of keys) * size of pointer.

      (If pointers are not fixed-length, this isn't linear; if they
      are, it's linear in the number of bytes.)

  * The time to rearrange records, which is obviously linear in the
    number of bytes.

Am I missing something here?  The original messages aren't on my news
system any more.  Is there an explanation of the algorithm being
discussed published somewhere?

	Anton
   
+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+