[comp.lang.misc] A very brief history of optimal sorting methods

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/10/90)

This is funny. Computer science really does progress backwards.

In the '60s, people knew that sorting was linear in the number of bytes
being sorted. The post office had already been using MSD radix sort for
a long time, though they probably never bothered to analyze its running
time. Knuth says that LSD became more popular for computers because in
MSD ``the multiplicity of piles becomes confusing''; others pointed
wisely to the initials LSD and smiled.

In 1987, Tarjan et al. appeared on the sorting scene, with their ``new
partitioning algorithms.'' That's right: sorting in linear time. With
what? MSD radix sort. Hey, I'm impressed.

Last year, somebody or other appeared on the sorting scene. Guess what?
An o(n log n) sort. This one wasn't linear, though; it wasn't even
practical. It used all sorts of weird bit manipulations to ``break the
sorting barrier.'' Something like n log n/log log n. Computer scientists
around the world began teaching courses on this huge advance. (No joke.)

Now Jim Giles appears on the sorting scene. Guess what? Radix sorts are
slower than n log n. n log n is a lower bound.

I thank God for not making me a computer scientist.

Luckily, on the other side of the fence, we programmers still know that
sorting is linear. Thanks to Keith Bostic, a nicely cleaned up version
of my linear-time radix sort code will appear in the BSD 4.4 library.

In article <5488@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
  [ there aren't linear-time sorts, everything's O(n log n) ]
> That's why I didn't bother to answer
> your next post - you didn't seem to realize that an Nlog(N) sort is
> _faster_ than a 'linear' radix sort when the keyspace is sparsely
> filled.)

The sort program I plan to contribute to Berkeley runs more than twice
as fast as the AT&T-derived version, which I presume uses a merge sort
or some other n log n method. Believe what you want to, Jim.

---Dan

rh@smds.UUCP (Richard Harter) (11/11/90)

In article <6913:Nov1008:23:5690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> This is funny. Computer science really does progress backwards.

> In the '60s, people knew that sorting was linear in the number of bytes
> being sorted. The post office had already been using MSD radix sort for
> a long time, though they probably never bothered to analyze its running
> time. Knuth says that LSD became more popular for computers because in
> MSD ``the multiplicity of piles becomes confusing''; others pointed
> wisely to the initials LSD and smiled.

	[Much more material in this vein deleted.]

Dan, me boy, you're having a wee bit of trouble telling your burro 
from yer burrow.

If you have N records with distinct keys, the effective length of the
keys is log(N).  MS* sorts and binary sorts (qsort, mergesort, whatever)
must do O(N log N) record (or pointer) moves.  In either case the
result is N*log(N) records moved.  In principle an MS* sort need only
examine N*log(N) bits whereas a binary sort must examine N*log(N)**2
bits; in practice this advantage is pretty nominal.  The true advantage
of MS* (but not LS*) sorts is that one can use more than two bins,
changing the base of the log term.  For example, a byte-based MSD sort
uses 256 bins, which reduces the number of moves by a factor of 8.
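The 256-bin scheme Harter describes can be sketched in C. This is a hypothetical illustration for 32-bit keys (the function names `msd_rec` and `msd_radix_sort` are invented for the sketch), not Bernstein's BSD code:

```c
#include <string.h>

/* MSD radix sort of 32-bit keys: distribute into 256 bins on the
   most significant byte, then recurse within each bin on the next
   byte.  tmp must have room for at least n elements. */
static void msd_rec(unsigned *a, unsigned *tmp, int n, int shift)
{
    if (n < 2)
        return;

    int count[256] = {0};
    for (int i = 0; i < n; i++)                 /* histogram this byte */
        count[(a[i] >> shift) & 0xffu]++;

    int start[257];                             /* bin boundaries */
    start[0] = 0;
    for (int b = 0; b < 256; b++)
        start[b + 1] = start[b] + count[b];

    int pos[256];
    memcpy(pos, start, sizeof pos);
    for (int i = 0; i < n; i++)                 /* stable distribution */
        tmp[pos[(a[i] >> shift) & 0xffu]++] = a[i];
    memcpy(a, tmp, n * sizeof *a);

    if (shift == 0)                             /* last byte: bins done */
        return;
    for (int b = 0; b < 256; b++)               /* recurse per bin */
        msd_rec(a + start[b], tmp, count[b], shift - 8);
}

void msd_radix_sort(unsigned *a, unsigned *tmp, int n)
{
    msd_rec(a, tmp, n, 24);
}
```

Production versions typically fall back to a simple insertion sort for small bins; this sketch omits such tuning.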

THERE ARE NO TRULY LINEAR SORTS.

There are, however, quasi-linear sorts, i.e. sorts which are linear
for a restricted range of N and some restrictions on the data.  These
sorts depend, ultimately, on the fact that arithmetic is (up to the
word size) a parallel operation.  The fastest possible sort requires
an address calculation function which gives, for each key, the location
in the sorted file that the record must be moved to.  Given this one
can sort in linear time -- provided that a record can be moved from
one location to any other in constant time.  The latter requirement cannot
be satisfied for sufficiently large N (i.e. the file is larger than
available memory).
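Harter's address-calculation function is, when the key range is bounded, a counting sort: a histogram plus prefix sums yields each record's final address, and every record moves exactly once. A minimal sketch under that assumption, using single-byte keys (the function name `counting_sort` is invented for the sketch):

```c
#include <string.h>

/* Counting sort: the count/prefix-sum table plays the role of the
   address-calculation function.  Linear in n because the key range
   is fixed in advance. */
void counting_sort(const unsigned char *in, unsigned char *out, int n)
{
    int count[257] = {0};

    for (int i = 0; i < n; i++)          /* histogram */
        count[in[i] + 1]++;
    for (int r = 0; r < 256; r++)        /* prefix sums: start addresses */
        count[r + 1] += count[r];
    for (int i = 0; i < n; i++)          /* one move per record */
        out[count[in[i]]++] = in[i];
}
```

The final loop embodies the constant-time-move requirement: it assumes out[] is random-access memory, which is exactly the condition Harter says fails for very large files.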
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/11/90)

In article <235@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> In article <6913:Nov1008:23:5690@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > This is funny. Computer science really does progress backwards.
> > In the '60s, people knew that sorting was linear in the number of bytes
> > being sorted.
> If you have N records with distinct keys the effective length of the
> keys is log(N).
  [ several sentences of similar handwaving deleted ]
> THERE ARE NO TRULY LINEAR SORTS.

Sorting is linear in the number of bytes being sorted. Feel free to
spout on about implicit log n factors that don't exist; I'll continue to
sort n-byte files in time between bn and cn seconds for nonzero b and c.
Models of computation are cute, but I live in the real world.

---Dan

rh@smds.UUCP (Richard Harter) (11/12/90)

In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

> Sorting is linear in the number of bytes being sorted. Feel free to
> spout on about implicit log n factors that don't exist; I'll continue to
> sort n-byte files in time between bn and cn seconds for nonzero b and c.
> Models of computation are cute, but I live in the real world.

One time only, I'll try to make it very simple.  If you do a radix sort
you make multiple passes over the data.  In each pass you move all of the
records.  The minimum number of passes needed is log(N), where N is the
number of records and the base of the logarithm is the number of bins.
N*log(N) records must be moved.  Is that clear enough?


-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/12/90)

In article <237@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
> In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > Sorting is linear in the number of bytes being sorted.

> One time only, I'll try to make it very simple.

It is worth noting, however, that for a *fixed* range of integers [a,b],
a record is ceil(lg(b-a+1)) bits (call that lg(M)), the entire collection
of records is O(N*lg(M)) bits, and if we sort K bits at a time, we require
ceil(lg(M)/K) passes, each pass of which is linear in the number of bytes
being sorted.  Consider an array of N 32-bit integers, where we take 8-bit
"digits".  (Not wholly unrealistic.)  Then we need a maximum of 4 passes.

So if you want to sort an array of Pascal 'integer's, C 'long's,
Fortran 'INTEGER*4's, and so on, it's going to take you at most 4 passes.
For practical purposes, radix sort of machine integers may be taken as
linear in the number of bytes, because the range of values is fixed.
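The four-pass case above can be sketched directly. This is a hypothetical LSD implementation for unsigned 32-bit values with 8-bit digits (the function name `lsd_radix_sort` is invented for the sketch), not code from any of the posters:

```c
#include <stdlib.h>
#include <string.h>

/* LSD radix sort of 32-bit unsigned integers: four stable counting
   passes, one per 8-bit digit, least significant digit first.  Each
   pass moves every element once, so the work is 4n moves however
   large n grows. */
void lsd_radix_sort(unsigned *a, int n)
{
    unsigned *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return;

    for (int shift = 0; shift < 32; shift += 8) {
        int count[257] = {0};
        for (int i = 0; i < n; i++)             /* histogram this digit */
            count[((a[i] >> shift) & 0xffu) + 1]++;
        for (int b = 0; b < 256; b++)           /* prefix sums */
            count[b + 1] += count[b];
        for (int i = 0; i < n; i++)             /* stable distribution */
            tmp[count[(a[i] >> shift) & 0xffu]++] = a[i];
        memcpy(a, tmp, (size_t)n * sizeof *a);
    }
    free(tmp);
}
```

Signed integers or floating-point keys would need an order-preserving key transform first; the fixed range of values is what keeps the pass count at four.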

The studies which have allegedly shown quicksort to be fast have considered
sorting hardware integers.  Even for that special case, quicksort is
O(N*lg(N)) expected (O(N**2) worst case) while radix sort is O(N), and a
fairly crude radix sort can make quicksort look very sick indeed.

(A crude radix sort would not do for languages like Lisp, Scheme, Pop, and
some Prologs which don't impose arbitrary and to-the-user unmotivated
limits on the size of integers.)

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

hansm@cs.kun.nl (Hans Mulder) (11/12/90)

In article <237@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> .....  I'll continue to
> sort n-byte files in time between bn and cn seconds for nonzero b and c.

In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) argues that if the file
contains N records, they must be at least log(N) long, in order to be
distinct, and that it takes at least N*log(N) time to sort them.

This is perfectly consistent: if there are N records and each is log(N)
bytes long, then there are N*log(N) bytes of data in the file.

In other words, n == N*log(N).  You agree.

Have a nice day,

Hans Mulder	hansm@cs.kun.nl

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/13/90)

In article <237@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> If you do a radix sort
> you make multiple passes over the data.  In each pass you move all of the
> records.

Wrong.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/13/90)

In article <4248@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> For practical purposes, radix sort of machine integers may be taken as
> linear in the number of bytes, because the range of values is fixed.

You imply that this is not true for theoretical purposes. You are wrong.

Sorting is linear in the number of bytes being sorted. This is true both
theoretically and practically.

> (A crude radix sort would not do for languages like Lisp, Scheme, Pop, and
> some Prologs

A *crude* radix sort would not do for any serious application.

A *good* radix sort is linear in the number of bytes being sorted.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/13/90)

In article <2432@wn1.sci.kun.nl> hansm@cs.kun.nl (Hans Mulder) writes:
  [ Hans attributed to Richard, but I actually said: ]
> > .....  I'll continue to
> > sort n-byte files in time between bn and cn seconds for nonzero b and c.
  [ Hans paraphrased Richard, attributing it to me: ]
> if the file
> contains N records, they must be at least log(N) long, in order to be
> distinct,

There is no reason to assume distinctness. In fact, many real database
applications feature large numbers of identical keys. It would be silly
to use an O(n log n) sort when a constant-time-per-byte sort suffices.

However, you are correct that a linear sort is asymptotically no faster
than an O(n log n)-per-record sort *if* the records are distinct.

End technical content.

This whole discussion has me rather amused. I just received mail from a
know-it-all who, in a remarkable display of alliteration, wrote

  You're winning big bozo points with your blustering bad mathematics

(presumably in reference to this thread). Hey, bozo, I know you're
listening, and I challenge you to identify yourself and to point out any
example of ``blustering bad mathematics'' in any of my articles in this
group. I know you'll be too scared to come out of the closet, because
once you bother to shed your computer science training and actually
think, you'll realize that I'm right. Or will you just go ahead and
post, and give me the fun of flaming you for it? For several weeks now
I've been in the perfect mood to take out my anger on a stupid yuckie
who wouldn't know mathematical truth if it hit him in the face.

---Dan

svissag@hubcap.clemson.edu (Steve L Vissage II) (11/14/90)

From article <24945:Nov1218:54:5590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> A *crude* radix sort would not do for any serious application.
> 
> A *good* radix sort is linear in the number of bytes being sorted.

I've never heard of radix sort.  Could someone please send me some source?
Preferably a *good* radix sort.  :-)

> ---Dan
 
Thanks,
Steve L Vissag II
svissag@hubcap.clemson.edu

jlg@lanl.gov (Jim Giles) (11/14/90)

From article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Sorting is linear in the number of bytes being sorted. Feel free to
> spout on about implicit log n factors that don't exist; I'll continue to
> sort n-byte files in time between bn and cn seconds for nonzero b and c.
> Models of computation are cute, but I live in the real world.

So do I.  In the real world, keys are almost _never_ just one byte
long.  The log n factors implicit in radix sorts _do_ exist for any
keyspaces that are above the 'toy' level.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/14/90)

In article <5812@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <16709:Nov1113:56:2390@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Models of computation are cute, but I live in the real world.
> So do I.  In the real world, keys are almost _never_ just one byte
> long.

Sorting is linear in the number of bytes being sorted. The length of the
keys is absolutely, entirely irrelevant to this conclusion.

> The log n factors implicit in radix sorts _do_ exist for any
> keyspaces that are above the 'toy' level.

In the very common case that keys are not all distinct, radix sort can
be asymptotically FASTER than n log n, depending on the amount of
repetition. The ``implicit'' factor that all you computer scientists
keep imagining does NOT exist.

---Dan

jlg@lanl.gov (Jim Giles) (11/15/90)

From article <4440:Nov1405:06:2590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [...]
> Sorting is linear in the number of bytes being sorted. The length of the
> keys is absolutely, entirely irrelevant to this conclusion.

Radix sorting is linear in the number of bytes perhaps.  It is also
linear in the number of keys to be sorted _times_ the length of each
key.  The speed of sorting techniques is _uniformly_ given by _all_
the literature as a function of the number of _keys_ to be sorted,
_not_ as a function of the number of bytes sorted.  When the number
of _bytes_ sorted is greater than N log N (where N is the number of
_keys_), it is often better to use some other sort.

If a whole key (like a 64-bit integer for example) can be compared in
the same time as it takes to compare your little 8-bit chunk of it, and
if the whole key can be loaded, moved, stored in _less_ time than it
takes to _unpack_ just _one_ of your little 8-bit chunks, and if the
keyspace is large (say, 64-bit integers) but sparsely populated (like
you're only sorting 100 million of them), then radix sorting is usually
at a severe deficit compared to other methods (quicksort, for example).
Since all of the conditions above are met by _most_ sorting problems,
radix sorting is at a deficit _most_ of the time.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <5948@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <4440:Nov1405:06:2590@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > [...]
> > Sorting is linear in the number of bytes being sorted. The length of the
> > keys is absolutely, entirely irrelevant to this conclusion.
> Radix sorting is linear in the number of bytes perhaps.

No, not ``perhaps.'' Sorting is linear in the number of bytes being
sorted.

> It is also
> linear in the number of keys to be sorted _times_ the length of each
> key.

No, it isn't. That is an upper bound which is rarely true in practice.
Please don't say anything if you don't know what you're talking about.

> The speed of sorting techniques is _uniformly_ given by _all_
> the literature as a function of the number of _keys_ to be sorted,
> _not_ as a function of the number of bytes sorted.

You are wrong. Here's a counterexample: In 1987, in the article by
Tarjan and somebody else ``inventing'' MSD radix sort, the time bounds
were given in terms of the number of bytes. Less precise bounds were
given in terms of the number of keys under various restrictions. The
references in that article contain lots more counterexamples.

Please don't say anything if you don't know what you're talking about.

> When the number
> of _bytes_ sorted is greater than N log N (where N is the number of
> _keys_), it is often better to use some other sort.

Rarely. The number of bytes in the termcap file here is about five times
bigger than n lg n, where n is the number of lines. It is forty times
bigger than n log_256 n. Yet my radix sort takes less time on it than
the supposedly optimized merge sort that appeared in the source groups a
couple of years ago.

Please don't say anything if you don't know what you're talking about.

> If a whole key (like a 64-bit integer for example) can be compared in
> the same time as it takes to compare your little 8-bit chunk of it,

Radix sorts do not compare the bytes they are sorting. Please don't say
anything if you don't know what you're talking about.

---Dan

jlg@lanl.gov (Jim Giles) (11/16/90)

From article <5293:Nov1518:36:0490@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> [... ]
> Please don't say anything if you don't know what you're talking about.

A statement that applies to _you_ far more often than to me.

> [...]
>> The speed of sorting techniques is _uniformly_ given by _all_
>> the literature as a function of the number of _keys_ to be sorted,
>> _not_ as a function of the number of bytes sorted.
> 
> You are wrong. Here's a counterexample: In 1987, in the article by
> Tarjan and somebody else ``inventing'' MSD radix sort, [...]

Oh wow!  A _single_ paper (which you can't even give a proper citation
of) which goes against the grain of the _whole_ rest of the industry.
Big deal.

> [...]                                       Less precise bounds were
> given in terms of the number of keys under various restrictions. [...]

So, even in your supposed counter-example paper the authors _do_
give the speed in terms of the number of _keys_.  This, as I pointed
out before, _is_ the standard practice of the industry.  Everyone in
the industry will _assume_ (correctly - for all but Dan) that a quoted
sort speed is in terms of the number of _keys_ - unless _explicitly_
stated otherwise.  I suspect that your cited paper _does_ make such
an explicit caveat.  _YOUR_ original claims _did_not_.

> [...]
> Please don't say anything if you don't know what you're talking about.

Again, heed your own advice.

> [...]
> Rarely. The number of bytes in the termcap file here is about five times
> bigger than n lg n, where n is the number of lines. It is forty times
> bigger than n log_256 n. Yet my radix sort takes less time on it than
> the supposedly optimized merge sort that appeared in the source groups a
> couple of years ago.

Woopee!  It is my experience with _all_ "supposedly optimized" UNIX
utilities that they are incredibly slow.  The fact that you can use
a slow technique and still beat a UNIX utility is hardly a surprise.
Further, sorting a termcap file is hardly a very interesting test:
text sorting of that kind is one of the _few_ places where I would
have agreed with you about the value of radix sorts.  You did not
suggest radix sorts in that context.  Instead, you recommended the
sort for a task which it is _not_ suited to.  You need to re-read
Knuth: he discusses the trade-offs of different sorting techniques.
He does _not_ find (as you seem to have concluded) that _one_
technique is _always_ the best choice.

> [...]
> Please don't say anything if you don't know what you're talking about.

Again, please follow this advice yourself.

> [...]
>> If a whole key (like a 64-bit integer for example) can be compared in
>> the same time as it takes to compare your little 8-bit chunk of it,
> 
> Radix sorts do not compare the bytes they are sorting. [...]

Yes, I figured you'd seize on this as soon as I posted it.  I typed
in a hurry and should have said "index with" rather than "compare."
It is interesting that you should consider this to be an important
point since it doesn't help your case at all.  The fact remains that
comparing multi-byte objects is, on most machines, just as fast
as address calculations using one byte indices.  Indeed, the address
calculation uses the same functional unit as the multi-byte compare
on most machines.  Addresses are multi-byte on most machines.

> [...]                                                  Please don't say
> anything if you don't know what you're talking about.

Again, take your own advice.  Or better yet, I don't care about
whether you follow this advice or not.  Just QUIT making insulting
remarks about others when those _same_ remarks apply more appropriately
to _you_.

J. Giles

ttw@lanl.gov (Tony Warnock) (11/16/90)

Dan Bernstein has posted the following advice:
"Please don't say anything if you don't know what you're talking about."
 
He has incorrectly stated the complexity of sorting.
Perhaps he could follow his own advice:
"Please don't say anything if you don't know what you're talking about."
 
He has stated that "a few bits" are all that are necessary to represent
a value that grows combinatorially.
Perhaps he could follow his own advice:
"Please don't say anything if you don't know what you're talking about."
 
He has incorrectly stated that Fortran does not have separate compilation.
Perhaps he could follow his own advice:
"Please don't say anything if you don't know what you're talking about."
 

The road of excess leads to the palace of wisdom.  - William Blake

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6056@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <5293:Nov1518:36:0490@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > Please don't say anything if you don't know what you're talking about.
> A statement that applies to _you_ far more often than to me.

That accusation is not borne out by the evidence. See below.

> >> The speed of sorting techniques is _uniformly_ given by _all_
> >> the literature as a function of the number of _keys_ to be sorted,
> >> _not_ as a function of the number of bytes sorted.
> > You are wrong. Here's a counterexample: In 1987, in the article by
> > Tarjan and somebody else ``inventing'' MSD radix sort, [...]
> Oh wow!  A _single_ paper (which you can't even give a proper citation
> of) which goes against the grain of the _whole_ rest of the industry.

You are WRONG! You said ``uniformly.'' You are WRONG! You said ``all.''
You are WRONG! There is no escaping this fact. I rest my case.

In attempting to redeem yourself you imply that it is just a ``single''
paper. But you're wrong there. As I said in the article you're quoting
(but probably failed to read), the references to that paper have many
further counterexamples to your original statement.

Give up while you're only this far behind.

> > [...]                                       Less precise bounds were
> > given in terms of the number of keys under various restrictions. [...]
> So, even in your supposed counter-example paper the authors _do_
> give the speed in terms of the number of _keys_.

No; they only give some imprecise bounds, and only under various
restrictions (the keys being all distinct, the keys having this much to
distinguish between them, etc.). They give the speed in terms of the
number of bytes, as per a convention established in many previous
papers. There is no escaping this fact.

> Everyone in
> the industry will _assume_ (correctly - for all but Dan) that a quoted
> sort speed is in terms of the number of _keys_ - unless _explicitly_
> stated otherwise.  I suspect that your cited paper _does_ make such
> an explicit caveat.  _YOUR_ original claims _did_not_.

Are you implying that I have ever stated ``sorting is linear'' without
saying ``in the number of bytes''? I challenge you to prove your
accusation. If you fail, we will all know you have lied (again).

> Woopee!  It is my experience with _all_ "supposedly optimized" UNIX
> utilities that they are incredibly slow.  The fact that you can use
> a slow technique and still beat a UNIX utility is hardly a surprise.

Jim, I'll bet you can't write a sort program even half as fast as the
standard UNIX version. I challenge you (again) to write a sort program
that will beat mine, under the restriction that you use an O(n log n)
sort---which, of course, you claim is faster than radix sort.

  [ Knuth's chapter on sorting ]
> He does _not_ find (as you seem to have concluded) that _one_
> technique is _always_ the best choice.

He dismissed MSD radix sort as too much housekeeping to be worthwhile.
Furthermore, there are optimizations to MSD radix sort that he did not
consider. I have brought up ``adaptive distribution sort'' in a letter
to him.

As much as I respect Knuth, the man makes occasional misjudgments. His
statements about the asymptotic speed of multiplication, for example,
are incorrect. I hope to convince him to correct his statements about
Nussbaumer's method. Similarly, I hope to convince him that a variant of
radix sort is appropriate for an extremely wide range of applications,
and worth consideration in every application.

> >> If a whole key (like a 64-bit integer for example) can be compared in
> >> the same time as it takes to compare your little 8-bit chunk of it,
> > Radix sorts do not compare the bytes they are sorting. [...]
> Yes, I figured you'd seize on this as soon as I posted it.  I typed
> in a hurry and should have said "index with" rather than "compare."

Fine, so you admit you're wrong.

> > [...]                                                  Please don't say
> > anything if you don't know what you're talking about.
> Again, take your own advice.

I find it sickening how you post a series of incorrect statements, I
refute each of them, and now you imply that I'm the one who's off his
rocker.

> Just QUIT making insulting
> remarks about others when those _same_ remarks apply more appropriately
> to _you_.

Hasn't happened yet in my conversations with you.

---Dan

jlg@lanl.gov (Jim Giles) (11/16/90)

From article <8420:Nov1522:11:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> In article <6056@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
> You are WRONG! You said ``uniformly.'' You are WRONG! You said ``all.''
> You are WRONG! There is no escaping this fact. I rest my case.

You can only rest your case if (as I said previously) the paper
you cite _didn't_ explicitly mention that the time was quoted per
_byte_ of the keys.  If the paper quoted speeds in this way _without_
an explicit caveat, _then_ you have made a valid point.  Otherwise,
the paper has _no_ relevance to your claim that sorting speeds are
given in the same terms that _you_ originally did.

> [...]
> In attempting to redeem yourself you imply that it is just a ``single''
> paper. But you're wrong there. As I said in the article you're quoting
> (but probably failed to read), the references to that paper have many
> further counterexamples to your original statement.

Do they? Or do they also follow the convention that I claim is
universal: to quote speeds as a function of the number of _keys_ and
(maybe) also in other terms - but _only_ after _explicitly_ stating
what those other terms are?  I suspect the latter.  I've been informed
that the specific paper you mentioned _does_ do this.

> [...]
> Give up while you're only this far behind.

Again, extremely good advice for _you_ to take.

> [...]                            They give the speed in terms of the
> number of bytes, as per a convention established in many previous
> papers. There is no escaping this fact.

Yes, the convention established of giving the speed in terms of
bytes AFTER explicitly stating that they were going to do so.

> [...]
> Are you implying that I have ever stated ``sorting is linear'' without
> saying ``in the number of bytes''? I challenge you to prove your
> accusation. If you fail, we will all know you have lied (again).

AGAIN?!?!?  The _last_ time you accused me of lying, I only had to
look back _TWO_DAYS_ to prove that you _had_ made the statement that
I attributed to you.  I have not yet received an apology for your
previous unfounded accusations.

In this case, the claim that you made is over a week old and has
been discarded (I only save the parts of your messages that I quote
in my responses).  However, what "we will all know" is what everyone
remembers: you _did_ claim your sort was linear _without_ saying
that you were using the non-standard measure of the number of bytes.
You corrected that to say the number of bytes _after_ several
people (including myself) took issue with your claim.  If anyone
has a longer archive of this newsgroup than I do, they will catch
you with it.

In the meantime, it's not worth pursuing since I don't even get
an apology.  The technical point has been settled anyway: your
claim to be linear used different units than everyone _expects_.
It should be noted that bubble sort is linear with the number of
_compares_ it does.  Changing the unit of measure can make just
about _anything_ linear.

> [...]
> Jim, I'll bet you can't write a sort program even half as fast as the
> standard UNIX version. I challenge you (again) to write a sort program
> that will beat mine, under the restriction that you use an O(n log n)
> sort---which, of course, you claim is faster than radix sort.

Depends on the benchmark.  For the _overwhelming_ majority of sorting
tasks, radix sort is _not_ optimal.  Only in the domain of sorting
text lines and similar applications does radix sort really shine.
Again, read your Knuth (or any other good sorting text).  Radix is
_not_ a universal solution to sorting problems.

>   [ Knuth's chapter on sorting ]
> [...]
> He dismissed MSD radix sort as too much housekeeping to be worthwhile.
> Furthermore, there are optimizations to MSD radix sort that he did not
> consider. I have brought up ``adaptive distribution sort'' in a letter
> to him.

And, since he is somewhat more intelligent than I am, I suspect he
completely ignored you (as I should have done).

>> > [...]                                                  Please don't say
>> > anything if you don't know what you're talking about.
>> Again, take your own advice.
> 
> I find it sickening how you post a series of incorrect statements, I
> refute each of them, and now you imply that I'm the one who's off his
> rocker.

Same here.  But with the additional caveat that _you_ never even
admit to having _said_ the things that you get wrong.  You change
your arguments, the units of measurement, or the context of the
discussion and then claim to have been right all along.  When it
is _proven_ that your accusations of _lying_ are completely
unfounded, you refuse to even admit it (in fact, you don't respond
at all).

> [...]
>> Just QUIT making insulting
>> remarks about others when those _same_ remarks apply more appropriately
>> to _you_.
> 
> Hasn't happened yet in my conversations with you.

1.) UNIX does asynchronous I/O automatically on output.
2.) UNIX has a reconnect capability (to jobs started by another tty session).
...
N-1.) The _compiler_ can detect all interprocedural aliasing.
VERY_LARGE_N.) Fortran doesn't have independent compilation.

These are examples of positions you've held in just the last year.
Practically _every_ discussion I've _had_ with you, you've been wrong.
Further, you've _never_once_ admitted to being wrong.  You've been
_VERY_ insulting as often as the opportunity presented itself.

Finally, the discussion _would_ be much more pleasant (and more
productive) if you'd leave the invective _out_!

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6067@lanl.gov> ttw@lanl.gov (Tony Warnock) writes:
  [ about me ]
> He has incorrectly stated the complexity of sorting.

No, I have not. You are a liar or a fool. I challenge you to show where
I have incorrectly stated the complexity of sorting. If you fail the
challenge you are a liar. If you think you have succeeded you are a
fool, for my statements about sorting have been entirely correct.

> He has stated that "a few bits" are all that are necessary to represent
> a value that grows combinatorially.

No, I have not. You are a liar or a fool. I challenge you to show where
I have stated that a few bits suffice to represent an exponential
quantity. If you fail the challenge you are a liar. You will not
succeed, as I have made no such statement.

> He has incorrectly stated that Fortran does not have separate compilation.

No, I have not. You are a liar or a fool. I have said that Fortran does
not have separate compilation *as in C*; compilation follows an entirely
different philosophy in each language. I challenge you to show where I
have stated that Fortran does not have separate compilation. If you fail
the challenge you are a liar. You will not succeed, as I have made no
such statement.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

In article <6082@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <8420:Nov1522:11:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > You are WRONG! You said ``uniformly.'' You are WRONG! You said ``all.''
> > You are WRONG! There is no escaping this fact. I rest my case.
> You can only rest your case if (as I said previously) the paper
> you cite _didn't_ explicitly mention that the time was quoted per
> _byte_ of the keys.

This is not a game where you can change the rules. You've been making a
series of incorrect statements, and no amount of squirming and
revisionist history will make your statements correct.

I am perfectly willing to accept your point that people assume that
``sorting is linear'' refers to linearity in the number of keys. So
such a statement would only be true if the key length is fixed, not in
the general case. That's why I make sure to say ``in the number of
bytes.''

  [ whimper about how the Holy Grail has expired ]

Just name the thread and I'll send you all my articles in that thread.

> However, what "we will all know" is what everyone
> remembers: you _did_ claim your sort was linear _without_ saying
> that you were using the non-standard measure of the number of bytes.

No, I did not. If I have ever said so it can only have been in a context
where the keys were fixed-length (sorting machine integers, perhaps).
Maybe you're remembering something like that?

> If anyone
> has a longer archive of this newsgroup that I do, they will catch
> you with it.

Go for it, folks! I'd love to see the results. Come on, everybody who
thinks that I've been making incorrect statements. Lots of sites have
two-week histories.

> > Jim, I'll bet you can't write a sort program even half as fast as the
> > standard UNIX version. I challenge you (again) to write a sort program
> > that will beat mine, under the restriction that you use an O(n log n)
> > sort---which, of course, you claim is faster than radix sort.
> Depends on the benchmark.  For the _overwhelming_ majority of sorting
> tasks, radix sort is _not_ optimal.

See, you keep claiming this. I'll even let you pick a set of three
sorting benchmarks that an impartial judge accepts as realistic. I'm
still waiting for your response to my challenge.

> >   [ Knuth's chapter on sorting ]
> > He dismissed MSD radix sort as too much housekeeping to be worthwhile.
> > Furthermore, there are optimizations to MSD radix sort that he did not
> > consider. I have brought up ``adaptive distribution sort'' in a letter
> > to him.
> And, since he is somewhat more intelligent than I am, I suspect he
> completely ignored you (as I should have done).

He is a busy man, and has many letters to respond to. However, as he
owes me about $30, I expect he will write back soon.

> >> Just QUIT making insulting
> >> remarks about others when those _same_ remarks apply more appropriately
> >> to _you_.
> > Hasn't happened yet in my conversations with you.
> 1.) UNIX does asynchronous I/O automatically on output.

It does. The meaning of asynchronous is ``not synchronous.'' There is
only one synchronous I/O (you tell the disk to run, wait for it to
complete, and then continue); there are lots of degrees of being not
synchronous. UNIX does asynchronous I/O on output.

I agree that buffering does not make I/O as asynchronous as it could be;
I thank you for helping me understand that the extra copy time is a real
burden on Crays, and hence should be eliminated if possible.

> 2.) UNIX has a reconnect capability (to jobs started by another tty session).

It does. See my pty program. UNIX has a reconnect capability.

> N-1.) The _compiler_ can detect all interprocedural aliasing.

I never said any such thing. I said that the compiler can detect a
particular type of interprocedural aliasing, which should suffice for
all the real-world problems you care about.

> VERY_LARGE_N.) Fortran doesn't have independent compilation.

I never said any such thing. I said that Fortran doesn't have
independent compilation *as in C*. I am sick of how you ignore these
qualifiers, from ``packed'' on up.

> Practically _every_ discussion I've _had_ with you, you've been wrong.

Practically every discussion I've had with you, you've been wrong.

> Further, you've _never_once_ admitted to being wrong.

Further, you simply cut off communication because it hurts your pride to
admit your mistakes.

When I make a mistake, I admit it.

---Dan

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/16/90)

In article <6082@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> Do they? Or do they also follow the convention that I claim is the
> universal: to quote speeds as a function of the number of _keys_ and
> (maybe) also in other terms - but _only_ after _explicitly_ stating
> what those other terms are?

I think that it is worth noting that quoting sorting speed as a function
of the number of bytes to be sorted is not an *arbitrary* criterion;
when we talk about an algorithm being O(f(N)) the argument N is usually
understood as a measure of the size of the input, such as the number
bits in a "reasonable" encoding.  [See for example Garey & Johnson.]

So the convention Dan Bernstein uses to describe MSD radix sort
(linear in the size of the input) is the convention which is usual
*except* for sorting.

How come sorting uses a different convention?  For the simple reason
that most discussions of sorting complexity consider keys as single
symbols.  A typical example is Sedgewick's Acta Informatica article
about quicksort, where the bottom line (that quicksort is quick) was
obtained by plugging into his formulas the assumption that the cost
of comparing a whole key is 1/4 of the cost of an integer assignment.

If we don't assume small keys, then since quicksort does O(K log K) key
(or pointer) moves on the average, K being the number of keys, and since
K distinct keys (or pointers) need O(log K) bits each, the average cost
of quicksort is O(K (log K)**2).

What was the Tarjan et al 1987 article again?  I think it's time I read it.

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

gsh7w@astsun.astro.Virginia.EDU (Greg Hennessy) (11/16/90)

Maybe this should be moved to alt.pissing.contest?

Tony Warnock writes:
#He[Dan] has stated that "a few bits" are all that are necessary to represent
#value that grows combinatorily.

Dan Bernstein writes:
#No, I have not. You are a liar or a fool. I challenge you to show where
#I have stated that a few bits suffice to represent an exponential
#quantity. If you fail the challenge you are a liar. You will not
#succeed, as I have made no such statement.

Perhaps Dan is engaging in one of his rhetorical tricks by changing the
word combinatorily to exponential, and then calling Tony a liar.

Perhaps Dan did not expect people to notice.

#No, I have not. You are a liar or a fool. I have said that Fortran does
#not have separate compilation *as in C*; compilation follows an entirely
#different philosophy in each language. 

Then you should have said that Fortran does not have separate
compilation the same way that C does. Your revisionism does not hold
here. By standard use of the English language (you do speak English,
don't you Dan), your statement was that Fortran does not have separate
compilation. If you want to say that it has it, but that it is
different in philosophy, then do so.

--
-Greg Hennessy, University of Virginia
 USPS Mail:     Astronomy Department, Charlottesville, VA 22903-2475 USA
 Internet:      gsh7w@virginia.edu  
 UUCP:		...!uunet!virginia!gsh7w

new@ee.udel.edu (Darren New) (11/17/90)

In article <16709:Nov1113:56:2390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Sorting is linear in the number of bytes being sorted. Feel free to
>spout on about implicit log n factors that don't exist; I'll continue to
>sort n-byte files in time between bn and cn seconds for nonzero b and c.
>Models of computation are cute, but I live in the real world.

Actually, for sorting to be linear in the number of bytes sorted, don't
you need some operation on the keys other than comparison?  For
example, how would you sort (say, in Pascal) an array of floats (reals,
whatever) without using division or multiplication and still have it
take linear time?

How about this:

"I want you to sort this array of encrypted passwords into the same
order as the user numbers they belong to. Since for security reasons I
cannot reveal to your program the user number belonging to each
password, I will provide a function "porder(A,B)" which returns true if
the user number for password A comes before the user number for
password B and false otherwise. Since the user number space is very
sparse, this will not compromise security."

In this case, you will need O(n log n) calls to the porder function,
hence taking O(n log n) time, regardless of how big the keys are. This
is easy to see based on a decision-tree argument. In this case, you
really don't even have the keys (the user number), but only pointers to
keys (the passwords). I would be *very* interested to see some
(pseudo-) code that does such a sort in linear time.
                      -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
              =+=+=+ Let GROPE be an N-tuple where ... +=+=+=

mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (11/17/90)

   In article <24945:Nov1218:54:5590@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
   Sorting is linear in the number of bytes being sorted. This is true both
   theoretically and practically.

While obviously true in some situations (but I can do O(1) sorting in
"some" situations), this goes against the grain for the general case.

Rather than arguing theory, I'll just ask for proof. How about making
available sources to either 1) a plug-compatible replacement for
qsort(3) or 2) a replacement for sort(1), so we can try some test
cases and see how they work, and how well they work.

	<mike
--

ttw@lanl.gov (Tony Warnock) (11/17/90)

    Dan has called me a liar and a fool, he may be right about the
    latter as I am taking time to reply to him. With respect to the
    former, I shall only quote Dan's own words. If he says that his
    points are being misunderstood, he should clarify them rather than
    engaging in name calling. Here are a few of the exact quotes made
    with respect to sorting, number of ways of aliasing arguments and
    Fortran compilation.
 
"The number of the partition chosen takes at most a few bits to encode."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
 
"Luckily, on the other side of the fence, we programmers still know that
sorting is linear."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
 
"Fortran does not have separate compilation like C."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein
 
"I thank God for not making me a computer scientist."
                                                        - Dan Bernstein
"Please don't say anything if you don't know what you're talking about."
                                                        - Dan Bernstein

     If the fool would persist in his folly he would become wise.
                                                     - William Blake

dbc@bushido.uucp (Dave Caswell) (11/18/90)

.
.In the very common case that keys are not all distinct, radix sort can
.be asymptotically FASTER than n log n, depending on the amount of
.repetition. 

Shouldn't this be "keys are not all the same"?


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

dbc@bushido.uucp (Dave Caswell) (11/18/90)

In article <6067@lanl.gov> ttw@lanl.gov (Tony Warnock) writes:
.>
.>
.>Dan Bernstein has posted the following advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>He has incorrectly stated the complexity of sorting.
.>Perhaps he could follow his own advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>He has stated that "a few bits" are all that are necessary to represent
.>value that grows combinatorily.
.>Perhaps he could follow his own advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>He has incorrectly stated that Fortran does not have separate compilation.
.>Perhaps he could follow his own advice:
.>"Please don't say anything if you don't know what you're talking about."
.> 
.>
.
I read this group religiously and haven't seen Dan make statements that
would back up the above claims.  I think it's more likely you don't know
the answers.  I often ask people about sorting at interviews, anyone
who just regurgitates a quote like n log n loses.  I don't have a book
that discusses sorting (Knuth etc.) that says the "answer" is n log n.


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

dbc@bushido.uucp (Dave Caswell) (11/18/90)

.Actually, for sorting to be linear in the number of bytes sorted, don't
.you need some opperation on the keys other than comparison?  For

Yes.  For many (most, all) problems that isn't hard.  Dan has consistently,
repeatedly, over and over again stated that that wasn't the model he was
using; he just doesn't compare keys.


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

sommar@enea.se (Erland Sommarskog) (11/18/90)

Dave Caswell (dbc@bushido.uucp) writes:
>I read this group religiously and haven't seen Dan make statements that
>would backup the above claims.

The problem seems to be that he hasn't bothered to give any
technical arguments, but just tried to refute people with
statements like "Wrong." He may be right, but he is certainly
not convincing. Disclaimer: due to mild interest and news
problems, I haven't read all articles on this thread.

>I often ask people about sorting at interviews, anyone
>who just regurgitates a quote like n log n loses.  I don't have a book
>that discusses sorting (Knuth etc.) that says the "answer" is n log n.

I guess you would never employ me. If we forget the recent
discussion I wouldn't have even the faintest idea. I recall
that I was exposed to exchange, bubble and quicksort in
school but whether they are linear, quadratic or cubic
I don't bother about. (Don't we have rec.games.trivia for
that? :-) Seriously, the days are through when the most
essential in computer science and software engineering was
sorting algorithms. The day I need one, I will consult the
literature. Or simply think of the best way of calling
some built-in sort facility.

Oh, actually I wrote a generic Ada package for the topological
sorting algorithm presented in Knuth's book some years ago.
I seem to recall that this algorithm is linear, in contrast to
tsort(1), which is quadratic.  (Topological sorting is a
special-purpose sort and is not comparable to normal sorting.)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
"Handlar Hanssons halta h|na har haft hosta hela halva h|sten"

rang@cs.wisc.edu (Anton Rang) (11/19/90)

Arrggh.  I know I'm going to regret getting into this thread.

In article <237@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>One time only, I'll try to make it very simple.  If you do a radix sort
>you make multiple passes over the data.  In each pass you move all of the
>records.  The minimum number of passes needed is log(N) where N is the
>number of records and the base of the logarithm is the bin size.  N*log(N)
>records must be moved.  Is that clear enough?

  Are we assuming that the radix sort uses a fixed amount of auxiliary
storage?  If you sort using pointers, the time would be made up of:

  * The number of passes, which is at most the number of bytes/key;
      (replace 'byte' by a storage unit for your favorite base)
    Times the number of keys, since you're sorting all the records;
    Times the size of a pointer, since you're moving the pointers.

      This is O(bytes/key) * O(number of keys) * size of pointer.

      (If pointers are not fixed-length, this isn't linear; if they
      are, it's linear in the number of bytes.)

  * The time to rearrange records, which is obviously linear in the
    number of bytes.

Am I missing something here?  The original messages aren't on my news
system any more.  Is there an explanation of the algorithm being
discussed published somewhere?

	Anton
   
+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)

In article <36568@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
> Actually, for sorting to be linear in the number of bytes sorted, don't
> you need some opperation on the keys other than comparison?

Yes, you do. You need some sort of extraction.

> "I want you to sort this array of encrypted passwords into the same
> order as the user numbers they belong to. Since for security reasons I
> cannot reveal to your program the user number belonging to each
> password, I will provide a function "porder(A,B)" which returns true if
> the user number for password A comes before the user number for
> password B and false otherwise. Since the user number space is very
> sparse, this will not compromise security."

Algorithm R solves problem S. Now you say, ``For security reasons,
something that you do in algorithm R is not allowed. Therefore algorithm
R does not solve problem S.'' Yes, security is a problem---so you should
stick R inside the computing base that evaluates porder().

If comparing two keys is significantly less work than transforming them
into a dictionary-ordered form, then heapsort may indeed be faster than
radix sort. But this had better not be true for porder(). Otherwise your
encryption function is insecure.

If it is indeed impossible to transform keys into a dictionary-ordered
form, then radix sort does not apply.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)

Little technical content. (This is the ``liar or fool'' subthread.)

In article <6177@lanl.gov> ttw@lanl.gov (Tony Warnock) writes:
>     Dan has called me a liar and a fool,

No, only one or the other.

> With respect to the
>     former, I shall only quote Dan's own words.

I observe that none of the quotes (except the one you took entirely out
of context---see below) match your original claims. Hence I will settle,
with all due respect, upon calling you a liar. I also observe that the
quotes match what I said in my initial response, and so you have
contributed absolutely nothing to this discussion.

> "The number of the partition chosen takes at most a few bits to encode."

You and Jim Giles seem to believe that I want to compile infinitely many
copies of my code. As I have said repeatedly (including in the original
presentation of the method), the number of partitions should be limited
to some fixed number. If that number is 20, as I once suggested, then
each partition takes at most 5 bits to encode. If it is 2, in the
all-out strategy, then each partition takes 1 bit to encode.

You accused me of saying that an exponential (or whatever the word was)
quantity could be encoded in a few bits; if the above quote is your only
justification, as I conjectured in the article you're responding to,
then you are a fool. I prefer to think that you're just lying and trying
to weasel your way out of the hole you've dug.

> "Luckily, on the other side of the fence, we programmers still know that
> sorting is linear."

You ripped this out of the context, in which I had already made it clear
that I was talking about time per byte.

> "Fortran does not have separate compilation like C."

Yes. Observe the ``like C.'' You accused me of saying that Fortran does
not have separate compilation. You lied.

>  
>                                                         - Dan Bernstein

Was the blank line intentional there?

> "I thank God for not making me a computer scientist."
>                                                         - Dan Bernstein

And I still do.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)

In article <MWM.90Nov16144102@raven.relay.pa.dec.com> mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
> Rather than arguing theory, I'll just ask for proof. How about making
> available sources to either 1) a plug-compatible replacement for
> qsort(3) or 2) a replacement for sort(1), so we can try some test
> cases and see how they work, and how well they work.

A pleasantly productive attitude. I am finishing up a slightly improved
clone of the sort program to contribute to BSD 4.4. Keith Bostic has
extracted a radixsort() library function from the first version and made
some improvements; I'm just settling copyright issues with him, and then
it'll be distributable under the usual Berkeley terms. I will probably
post the (PD) sort before the end of the year. If you're desperate for a
preliminary copy right now, send me a note.

I don't think I'll ever code under the qsort() interface, as passing a
comparison function to a sort is neither adequate for radix sort nor
reasonably efficient for most sorting problems.

---Dan

smryan@garth.UUCP (Steven Ryan) (11/20/90)

>Sorting is linear in the number of bytes being sorted. This is true both
>theoretically and practically.

What is the theoretical and practical relation between the number of bytes
and the number of keys (records)?  If you have b bytes and n keys, which is true:

	b = kn
or
	b = k n log n
or
	b > k n log n ?
-- 
...!uunet!ingr!apd!smryan                                       Steven Ryan
...!{apple|pyramid}!garth!smryan              2400 Geng Road, Palo Alto, CA

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)

In article <1990Nov18.025036.2498@bushido.uucp> dbc@bushido.uucp (Dave Caswell) writes:
> .In the very common case that keys are not all distinct, radix sort can
> .be asymptotically FASTER than n log n, depending on the amount of
> .repetition. 
> Shouldn't this be "keys are not all the same"?

Probably. The ``not''s start to get confusing here.

Better is ``keys include duplicates.''

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)

In article <157@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
> >Sorting is linear in the number of bytes being sorted. This is true both
> >theoretically and practically.
> What is the theoretical and practical relation between the number of bytes
> and the number keys (records). If you have b bytes and n keys, which is true:
> 	b = kn
> 	b = k n log n
> 	b > k n log n ?

None of the above is always true. That's why it's so much more natural
to report sorting times per byte---measures per key are necessarily
imprecise.

If the keys are of fixed length, b is kn. If the keys are all distinct,
b is at least n log_2 n.

---Dan

dc@sci.UUCP (D. C. Sessions) (11/24/90)

In article <RANG.90Nov18150047@nexus.cs.wisc.edu> rang@cs.wisc.edu (Anton Rang) writes:
>Arrggh.  I know I'm going to regret getting into this thread.

  Amen, Brother!  Here's another self-destructive fool.

>Am I missing something here?  The original messages aren't on my news
>system any more.  Is there an explanation of the algorithm being
>discussed published somewhere?
>
>	Anton

  This whole ****ing contest has gone on long enough.  DB apparently
  understands the difference between CS and the brainless fools who often 
  escape from school with CS degrees (or so I gather from correspondence).  
  As it happens, every analysis of algorithms text I have ever seen (says 
  another CS, hopefully not a brainless fool) makes it quite clear what the 
  limits are of the O(n lg n) theorem are.  In particular, it applies only 
  to sorts based on BINARY COMPARISONS.

  The proof is almost trivial.  Since there are N! ways to arrange a list 
  of N distinct keys, any particular (ie, sorted) list must be chosen from 
  N! alternatives; this requires lg(N!) bits of information.  By Stirling's 
  approximation, this is on the order of (n lg n).  A binary comparison 
  sort returns only one bit per comparison, so O(n lg n) comparisons are 
  needed.  QED.
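  For reference, the Stirling step spelled out (a standard estimate, not
  from the thread):

```latex
\lg(N!) \;=\; \sum_{k=1}^{N} \lg k
        \;=\; N \lg N \;-\; N \lg e \;+\; O(\lg N)
        \;=\; \Theta(N \lg N)
```

  so any binary-comparison sort needs Omega(N lg N) comparisons.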

  DB is quite right in that if you have operators which return more than 
  one bit per operation, the process can be expedited by a similar factor.
  If a given operation returns more than lg(n) bits of information, the
  acceleration stops at linear.  Obviously, a radix M index returns lg(M)
  bits.  BTW: note that for some problems binary is the best you can 
  possibly do, since the keys are not necessarily scalar (graph relations 
  come to mind).

  Since this thread has no great relevance to LANGUAGES, I would suggest 
  that everyone who has been clogging the net with it go read a 
  junior-level algorithms text.  After that, if you want to know how good a 
  sort algorithm is, just calculate how much your main loop reduced the 
  entropy of the list in each pass.

-- 
| The above opinions may not be original, but they are mine and mine alone. |
|            "While it may not be for you to complete the task,             |
|                 neither are you free to refrain from it."                 |
+-=-=-    (I wish this _was_ original!)        D. C. Sessions          -=-=-+

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/26/90)

In article <MWM.90Nov16144102@raven.relay.pa.dec.com>,
mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
> In article <24945:Nov1218:54:5590@kramden.acf.nyu.edu>
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>    Sorting is linear in the number of bytes being sorted. This is true both
>    theoretically and practically.

> Rather than arguing theory, I'll just ask for proof. How about making
> available sources to either 1) a plug-compatible replacement for
> qsort(3) or 2) a replacement for sort(1), so we can try some test
> cases and see how they work, and how well they work.

Let me paraphrase this:
    Dan Bernstein:	you can run faster if you don't wear blinkers
    Mike Meyer:		prove it wearing these blinkers

Nobody is denying that the O(N.lg N) bound applies to COMPARISON-BASED
sorting methods.  Bernstein's point is that not all sorting methods are
comparison-based.  The interface of qsort() is a comparison-based
interface:  qsort() is passed a comparison function and is _not_ passed
the kind of information that the linear-expected-time methods I know of
(apparently not quite the same as what Bernstein is talking about) need.

-- 
I am not now and never have been a member of Mensa.		-- Ariadne.

mwm@fenris.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (11/27/90)

In article <4371@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

   Path: bacchus.pa.dec.com!shlump.nac.dec.com!decuk.uvo.dec.com!decuac!haven!udel!wuarchive!zaphod.mps.ohio-state.edu!samsung!munnari.oz.au!goanna!ok
   From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
   Newsgroups: comp.lang.misc
   Date: 26 Nov 90 09:15:01 GMT
   References: <6913:Nov1008:23:5690@kramden.acf.nyu.edu> <237@smds.UUCP> <MWM.90Nov16144102@raven.relay.pa.dec.com>
   Organization: Comp Sci, RMIT, Melbourne, Australia
   Lines: 25

   In article <MWM.90Nov16144102@raven.relay.pa.dec.com>,
   mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
   > In article <24945:Nov1218:54:5590@kramden.acf.nyu.edu>
   > brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
   >    Sorting is linear in the number of bytes being sorted. This is true both
   >    theoretically and practically.

   > Rather than arguing theory, I'll just ask for proof. How about making
   > available sources to either 1) a plug-compatible replacement for
   > qsort(3) or 2) a replacement for sort(1), so we can try some test
   > cases and see how they work, and how well they work.

   Let me paraphrase this:
       Dan Bernstein:	you can run faster if you don't wear blinkers
       Mike Meyer:		prove it wearing these blinkers

Oh, horse pucky. If the sorting method in question can't be turned
into one of the two options I list, then it's basically worthless.
After all, what's left after you exclude sorting fixed records &
arbitrary strings of bytes with termination bytes?

That's why I gave both options - to allow as much flexibility as
possible. Dan quite politely offered still-under-construction source
for a routine that is to radix sorting what qsort is to comparison
sorting. It would be quite easy to turn that routine into a primitive
version of sort(1), but (as expected) impossible to turn it into
anything that looks like qsort(3).

   Nobody is denying that the O(N.lg N) bound applies to COMPARISON-BASED
   sorting methods.  Bernstein's point is that not all sorting methods are
   comparison-based.  The interface of qsort() is a comparison-based
   interface:  qsort() is passed a comparison function and is _not_ passed
   the kind of information that the linear-expected-time methods I know of
   (apparently not quite the same as what Bernstein is talking about) need.

Yup. So tell me what about sort(1) is comparison based? Not having
seen his code, I asked for things I could use. Just because I don't
think he can implement a qsort interface for a radix sort doesn't mean
he can't. That's a testable interface, so give that as an option. It
only hurts if someone doesn't read the entire paragraph.

As for the code Dan let me have, it's a very well-designed (and,
for the most part, written) MSD radix sort. As such, it will indeed
sort arbitrary strings with termination bytes in time linear in the
number of bytes to sort. In its current form, it can't be used to
sort records with fixed-length keys holding arbitrary data. Since the
former are very popular on unix, and the latter are rare, it's a nice
addition to the unix libraries. But it won't replace quicksort.

I do think Dan is wrong for picking on computer scientists. His sort
doesn't do anything they say can't be done; it just uses a known special case
in a way that is very broadly applicable on Unix.

	<mike
--

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/27/90)

In article <MWM.90Nov26194430@fenris.relay.pa.dec.com> mwm@fenris.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
> In article <4371@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>    Let me paraphrase this:
>        Dan Bernstein:	you can run faster if you don't wear blinkers
>        Mike Meyer:		prove it wearing these blinkers
> Oh, horse pucky. If the sorting method in question can't be turned
> into one of the two options I list, then it's basically worthless.

I have to agree with Mike here: if a supposedly general sorting method
can't at least be used to implement sort(1) (efficiently), it's
pointless.

> As for the code Dan let me have, it's a very well-designed (and,
> for the most part, well-written) MSD radix sort. As such, it will indeed
> sort arbitrary strings with termination bytes in time linear in the
> number of bytes to sort. In its current form, it can't be used to
> sort records with fixed-length keys holding arbitrary data.

Well, yeah, I'm still working on it. I'll make the final sort(1) clone
available ASAP.

The right way to apply radix sort (or, in fact, almost any other sort)
to your fixed-length records or other data structures inside a generic
program is as follows:

  1. Extract out of the structures some lexically ordered key. Append to
     each key some index to the original structure, or the entire
     original structure.
  2. Apply radixsort() to the keys.
  3. Sweep through the sorted keys, reordering the structures.

You can use escape characters (255 1 represents 255, 255 2 represents 0)
to handle the limited character set.

---Dan

schow@bcarh185.bnr.ca (Stanley T.H. Chow) (11/30/90)

In article <MWM.90Nov26194430@fenris.relay.pa.dec.com> mwm@fenris.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
>As for the code Dan let me have, it's a very well-designed (and,
>for the most part, well-written) MSD radix sort. As such, it will indeed
>sort arbitrary strings with termination bytes in time linear in the
>number of bytes to sort. In its current form, it can't be used to
>sort records with fixed-length keys holding arbitrary data. Since the
>former are very popular on unix, and the latter are rare, it's a nice
>addition to the unix libraries. But it won't replace quicksort.

I don't understand.

You claim Dan's code _does_ sort in linear time, but only for
"arbitrary strings with termination bytes". I take this to mean a
Unix-like file system where each record is null-terminated.

You then claim that code cannot handle "records with fixed-length
keys holding arbitrary data". It is unclear why "fixed-length" gets
into the act. Surely it is linear time to add a termination byte to
each string! So I assume you mean Dan's code does not work for "keys
holding arbitrary data" - in other words, a compare routine is needed
to order any two keys. This is, of course, a known property that Dan
(and others) specified as a requirement.

>
>I do think Dan is wrong for picking on computer scientists. It doesn't
>do anything they say can't be done; it just uses a known special case
>in a way that is very broadly applicable on Unix.

I agree Dan is probably tweaking many noses (and was not very gentle
about it), but you just certified Dan to be right!

Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.

thomson@hub.toronto.edu (Brian Thomson) (11/30/90)

In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>> Sorry to inject some mathematics into this lovely flamefest, but...
>> The bound on sorting is O(N log N) comparisons because you must be able to
>> generate all of the N! possible permutations of the N inputs.
>
>That bound is only valid if the only permitted operation is comparison.

Since one Turing Award winner has already contributed to this thread,
I will quote another:

   "Note that radix sort [of n numbers], which is sometimes said to be
    linear time, really requires at least n log n steps, if one assumes
    the input numbers have enough digits so that they all can be distinct."

		    S. A. Cook, "An Overview of Computational Complexity"
                        1982 Turing Award Lecture

This is, of course, a restatement of the "each key must be at least log n
digits long" argument propounded by Mark Carroll.  The only part about this
that worries me is that, if it is "fair ball" to consider the length of the
keys for radix sort, then why do we not consider each comparison in the
comparison-based sorts to be log n steps, yielding O(n log^2 n) cost?
Well, the answer is that we do have to, according to another expert:

  "Actually, as N approaches infinity, a better indication of the time
   needed to sort is N (log N)**2, if the keys are distinct, since the
   size of the keys must grow at least as fast as log N; but for practical
   purposes, N never really approaches infinity."

		    D. Knuth, The Art of Computer Programming Vol. 3, p. 5

He makes it clear later that he is talking about comparison-based sorts,
and justifies it by requiring that the sort be based on an abstract
ordering, for which a radix-based method will not work.
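
Putting the two quotes side by side (my arithmetic, under the
assumption that all n keys are distinct and hence at least
b >= log2(n) bits long):

    comparison sort:  O(n log n) comparisons, each on b-bit keys
                      = O(n b log n) = O(n log^2 n)  bit operations
    radix sort:       O(1) work per key byte
                      = O(n b)       = O(n log n)    bit operations

So under a consistent bit-cost model the radix sort keeps its factor
of log n advantage; the apparent contradiction comes from charging
each comparison one unit while charging radix passes per digit.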

It appears clear to me that "linear in the number of bytes" really is the
most accurate characterization of the time complexity of radix sorting,
and that it is not always the same as saying "n log n in the number of keys".
But, if the objective is to compare radix-based and comparison-based
sorting methods, which I believe was what sparked all this in the first
place, we are faced with an apples-and-pomegranates problem.

Complexity measures depend on 1) how you measure the size of the problem,
and 2) how you measure the size of the computation (i.e., which computational
model do you use).   The conclusion I draw from all the smoke and fire
we have seen here recently is that no two netters can agree on what
set of assumptions to make to put comparison and radix sorts on the
same playing field or, to unmix a metaphor, "in the same orchard".
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    utcsri!uthub!thomson, thomson@hub.toronto.edu