[comp.lang.misc] Sorting bounds

jrbd@craycos.com (James Davies) (11/21/90)

Sorry to inject some mathematics into this lovely flamefest, but...

The bound on sorting is O(N log N) comparisons because you must be able to
generate all of the N! possible permutations of the N inputs.  If each
comparison reduces the size of the remaining search space by a factor of
K, you must make log_K(N!) comparisons to choose the right permutation.

N! grows approximately as N**N (Stirling's approximation, I believe it's
called).  Therefore, log N! grows as N log N.  
(log(N**N) = log(N*N*N*..*N) = log(N) + log(N) +...+ log(N) = N * log(N))
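
That growth rate is easy to check numerically. A small sketch (mine, not part of the original posts; Python, using the exact log-factorial from the standard library):

```python
import math

# Ratio of log(N!) to N log N (natural logs).  It climbs toward 1
# as N grows, illustrating that log(N!) grows as N log N.
for n in (10, 100, 1000, 10000):
    ratio = math.lgamma(n + 1) / (n * math.log(n))
    print(n, round(ratio, 3))
```

The ratio stays below 1 for any finite N (the N**N estimate overshoots N!), but the leading term is N log N either way.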

Radix or bucket sorts may appear to be linear, but in fact they are also
N log N in comparisons.   (Picking one of K buckets requires a K-way 
comparison.)  The log term may be hidden in the addressing hardware of 
your memory, and may be hidden further by parallelism, but it's there.  
This is most evident when you try to make your bucket sort scale up to 
arbitrary sizes (like, say, sorting a trainload of tapes...)

						Jim Davies
						jrbd@craycos.com

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/21/90)

Skip if you know why radix sort does not observe the n log n bound.

In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
> Sorry to inject some mathematics into this lovely flamefest, but...
> The bound on sorting is O(N log N) comparisons because you must be able to
> generate all of the N! possible permutations of the N inputs.

That bound is only valid if the only permitted operation is comparison.

> N! grows approximately as N**N (Stirling's approximation, I believe it's
> called).

No, n! does not grow as n^n. Stirling's (first) approximation is that n!
is about n^n/e^n times the square root of 2 pi n.

> Radix or bucket sorts may appear to be linear, but in fact they are also
> N log N in comparisons.   (Picking one of K buckets requires a K-way 
> comparison.)

Picking one of K buckets does not require a K-way comparison. Even if it
did, that would not be relevant. Radix sort is linear in the number of
bytes, which can be o(n log n) if there are many duplicate keys.
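
To make the byte-linearity concrete, here is one possible sketch of an MSD radix sort on byte strings (my illustration, not Dan's code); each recursion level inspects one byte of each key, so the total byte inspections never exceed the total bytes of input:

```python
def msd_radix_sort(keys, depth=0):
    """Sort byte strings, most-significant byte first.

    Each level of recursion looks at one byte of each key, so total
    byte inspections are bounded by the total bytes in the input.
    """
    if len(keys) <= 1:
        return list(keys)
    exhausted = []                       # keys with no byte left at this depth
    buckets = [[] for _ in range(256)]   # picking a bucket is one index op
    for k in keys:
        if len(k) <= depth:
            exhausted.append(k)
        else:
            buckets[k[depth]].append(k)
    out = exhausted                      # shorter keys sort first
    for b in buckets:
        if b:
            out.extend(msd_radix_sort(b, depth + 1))
    return out
```

For example, msd_radix_sort([b"banana", b"apple", b"app"]) returns the keys in lexicographic order.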

You said you would ``inject some mathematics into this lovely
flamefest.'' But mathematics is a tautology. Asserting that a false
statement is ``in fact'' true does not make it mathematics.

  [ on the supposed fact that radix sorting is n log n ]
> This is most evident when you try to make your bucket sort scale up to 
> arbitrary sizes (like, say, sorting a trainload of tapes...)

The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
them is almost exactly 100---internal sorting by far dominates any
merging. So for practical purposes (even in such a ridiculously large
situation) sorting is linear in the number of bytes. And for theoretical
purposes, since any immediately accessible byte of memory is accessible
in constant time under any realistic computational model, sorting is
also linear in the number of bytes.

---Dan

jrbd@craycos.com (James Davies) (11/22/90)

In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Skip if you know why radix sort does not observe the n log n bound.

(I'm assuming Dan is the only one who skipped this...)

>In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>> Sorry to inject some mathematics into this lovely flamefest, but...
>> The bound on sorting is O(N log N) comparisons because you must be able to
>> generate all of the N! possible permutations of the N inputs.
>
>That bound is only valid if the only permitted operation is comparison.

If you define "pick one of N buckets" as one operation, sorting is indeed
linear.  I'll maintain that your operation takes O(log N) time to perform.

>
>> N! grows approximately as N**N (Stirling's approximation, I believe it's
>> called).
>
>No, n! does not grow as n^n. Stirling's (first) approximation is that n!
>is about n^n/e^n times the square root of 2 pi n.

Irrelevant, invalid, etc. (substitute your favorite dismissal word here.)
log(N!) still grows as N log N.

>
>> Radix or bucket sorts may appear to be linear, but in fact they are also
>> N log N in comparisons.   (Picking one of K buckets requires a K-way 
>> comparison.)
>
>Picking one of K buckets does not require a K-way comparison. Even if it
>did, that would not be relevant. Radix sort is linear in the number of
>bytes, which can be o(n log n) if there are many duplicate keys.

I was talking about sorting N items, not N bytes.

Picking one of K buckets is indeed equivalent to a K-way comparison.
That it appears to be "one" operation is merely an illusion, created
mainly by fixed address lengths of real machines and parallel addressing
hardware.  This is probably your big mental block, Dan - you don't realize
what is really going on here.

It would be equally valid to assert that sorting takes one "sort" unit,
so that sorting can be done in constant time.  Memory access times don't
remain constant for an arbitrary number of items (any more than sorting
times do).  And we're talking real asymptotic here, not "real world" 
asymptotic or whatever your concept is.

BTW, what do duplicate keys have to do with it?  If all the keys are
duplicates, then sorting is truly constant time.  This is too unrealistic a
restriction when talking about a theoretical problem.

>
>The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
>them is almost exactly 100---internal sorting by far dominates any
>merging. 

Just as an exercise, why don't you try calculating the merge time?
Assume constant access time to the data on the tapes (i.e. no delays
for operator intervention), and also assume that your internal memory 
won't hold more than a few tapes at a time.  You might be surprised 
(but then again, I wouldn't expect you to do something so rigorous - 
it's easier to wave your hands and say "merging is linear!").

abrodnik@watdragon.waterloo.edu (Andrej Brodnik (Andy)) (11/22/90)

In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>The bound on sorting is O(N log N) comparisons because you must be able to
>generate all of the N! possible permutations of the N inputs. ... 
>
>... The log term may be hidden in the addressing hardware of 
>your memory, and may be hidden further by parallelism, but it's there.  

Well, in a parallel version of sorting described by AKS in the early '80s they
took away the linear part and got an O(log N) parallel-time algorithm. But the
problem in this construction is an enormous constant hidden in the big O. It
originates from the construction of expander graphs. But I believe the
algorithm was already described in this discussion.

Regards

Andrej 

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/25/90)

Rehash of previous exposition in more detail. Again, skip if you know
why radix sort does not obey the n log n bound.

In article <1990Nov21.211000.11359@craycos.com> jrbd@craycos.com (James Davies) writes:
> In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
> >> Sorry to inject some mathematics into this lovely flamefest, but...
> >> The bound on sorting is O(N log N) comparisons because you must be able to
> >> generate all of the N! possible permutations of the N inputs.

Under the banner of ``mathematics'' that I wish you would stop flying,
you claim here that all (general) sorting methods take time at least cN
log N to sort N numbers, where c is some positive number depending on
the method. If this is an inaccurate translation of your quoted
statement, please let me know.

That claim is entirely false. You have not justified it. It is
contradicted by all variants of MSD radix sort, which are linear in the
number of bits in the inputs; the number of bits can be asymptotically
less than cN log N for all c. Please stop insulting mathematics.

> >That bound is only valid if the only permitted operation is comparison.
> If you define "pick one of N buckets" as one operation, sorting is indeed
> linear.  I'll maintain that your operation takes O(log N) time to perform.

I'm sorry I didn't dismiss this strongly enough in my first response. It
is absolutely irrelevant how long it takes to pick one of N buckets. All
radix sort uses is picking one of a constant number (you said K at
first) of buckets. For definiteness, let's settle upon K as being 256
for all time. It takes constant time, in theory and in practice, to
select one of 256 buckets.
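
As an illustration of that constant-time selection (my sketch, with K fixed at 256 as above), an LSD radix sort over fixed-width byte keys; each pass picks a bucket with a single array index:

```python
def distribute(records, i):
    """One stable pass on byte i: selecting a bucket is a single
    array index, i.e. constant time per record."""
    buckets = [[] for _ in range(256)]
    for r in records:
        buckets[r[i]].append(r)
    return [r for b in buckets for r in b]

def lsd_radix_sort(records, width):
    """Sort fixed-width byte strings with one stable pass per byte,
    least-significant byte first."""
    for i in reversed(range(width)):
        records = distribute(records, i)
    return records
```

The number of passes depends on the key width, not on the number of records.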

> >> N! grows approximately as N**N (Stirling's approximation, I believe it's
> >> called).
> >No, n! does not grow as n^n. Stirling's (first) approximation is that n!
> >is about n^n/e^n times the square root of 2 pi n.
> Irrelevant, invalid, etc. (substitute your favorite dismissal word here.)
> log(N!) still grows as N log N.

You said you were introducing some mathematics into this discussion. I
was correcting your errors, one by one. Here you began a supposed proof
that log(n!) grows as n log n; but your first statement, namely that n!
grows as n^n, was false. The conclusion is true but you still haven't
proven it. (Given that n! is larger than n^n/e^n times the square root
of 2 pi n [see Abramowitz & Stegun], log n! is larger than n times the
log of n/e. When n is greater than e^2 (that is, n at least 8), n/e is
greater than the square root of n, so log n/e is greater than half log n.
Hence log n! is at
least half n log n for large n.)
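
The chain of inequalities is easy to machine-check (a sketch of mine; note that the n/e > sqrt(n) step requires n > e^2, roughly 7.39, so the check starts at n = 8):

```python
import math

# Check both steps of the bound log(n!) >= (1/2) n log n, natural logs:
# n/e exceeds sqrt(n) once n > e^2, and lgamma gives the exact log(n!).
for n in range(8, 5000):
    assert n / math.e > math.sqrt(n)
    assert math.lgamma(n + 1) >= 0.5 * n * math.log(n)
print("bound holds for 8 <= n < 5000")
```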

> >> Radix or bucket sorts may appear to be linear, but in fact they are also
> >> N log N in comparisons.   (Picking one of K buckets requires a K-way 
> >> comparison.)
> >Picking one of K buckets does not require a K-way comparison. Even if it
> >did, that would not be relevant. Radix sort is linear in the number of
> >bytes, which can be o(n log n) if there are many duplicate keys.
> I was talking about sorting N items, not N bytes.

Yes, and I was following your terminology. As I said before, the number
of bytes can be o(n log n). If you attempt to prove your statement that
radix sort uses at least cn log n comparisons, you will find an
uncrossable gap at this point.

> Picking one of K buckets is indeed equivalent to a K-way comparison.
> That it appears to be "one" operation is merely an illusion, created
> mainly by fixed address lengths of real machines and parallel addressing
> hardware.  This is probably your big mental block, Dan - you don't realize
> what is really going on here.

Fine, believe what you want. Even if it took K operations to pick one of
K buckets, that would be absolutely irrelevant: K is a *constant*. Make
it 2 if you want.

> It would be equally valid to assert that sorting takes one "sort" unit,
> so that sorting can be done in constant time.

I was sticking to your terminology. You claim that radix sort is
theta(n log n) where n is the number of keys. (Here theta means bounded
below as well as above, not just big O.) That claim is simply false.

> And we're talking real asymptotic here, not "real world" 
> asymptotic or whatever your concept is.

I am talking about real asymptotics on a large class of (realistic)
computational models. Once you see Knuth make a simply false statement
about ``real-world'' asymptotics, you learn to pay attention to those
log factors. In this problem there's no log factor.

> BTW, what does duplicate keys have to do with it?  If all the keys are
> duplicate, then sorting is truly constant time.

What are you talking about? By definition, a general sorting method
cannot know such information about the keys beforehand. Furthermore,
nobody is talking about all keys being the same.

> >The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
> >them is almost exactly 100---internal sorting by far dominates any
> >merging. 
> Just as an exercise, why don't you try calculating the merge time?

As a matter of fact, I did calculate it, and I presented the results in
my followup to his article. Just as an exercise, why don't you go back
and read my followup before you assign any further exercises?

> (but then again, I wouldn't expect you to do something so rigorous - 
> it's easier to wave your hands and say "merging is linear!").

Why the ad hominem attacks? I haven't seen you show any respect for
mathematical rigor. Merging is not linear, but it is, even for such
ridiculously large problems as Ritchie's trainload of tapes, dominated
by sorting. (David Chase has pointed out in e-mail that a distributed
solution might tip the scale the other way, but I don't believe any
distributed solution will be cost-effective.)

---Dan

peter@ficc.ferranti.com (Peter da Silva) (11/26/90)

In article <29686:Nov2505:33:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> I'm sorry I didn't dismiss this strongly enough in my first response. It
> is absolutely irrelevant how long it takes to pick one of N buckets. All
> radix sort uses is picking one of a constant number (you said K at
> first) of buckets. For definiteness, let's settle upon K as being 256
> for all time. It takes constant time, in theory and in practice, to
> select one of 256 buckets.

How about picking 2 buckets? Does it still work? In constant time?

Would it not be safe to assume that it takes O(log(key length)) time to do
something with the bucket once you've found it, assuming that the key length
is greater than log2(buckets)?
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

carroll@cis.udel.edu (Mark Carroll) (11/27/90)

In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>> Sorry to inject some mathematics into this lovely flamefest, but...
>> The bound on sorting is O(N log N) comparisons because you must be able to
>> generate all of the N! possible permutations of the N inputs.
>
>That bound is only valid if the only permitted operation is comparison.
>
>  [ on the supposed fact that radix sorting is n log n ]
>> This is most evident when you try to make your bucket sort scale up to 
>> arbitrary sizes (like, say, sorting a trainload of tapes...)
>
>The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
>them is almost exactly 100---internal sorting by far dominates any
>merging. So for practical purposes (even in such a ridiculously large
>situation) sorting is linear in the number of bytes. And for theoretical
>purposes, since any immediately accessible byte of memory is accessible
>in constant time under any realistic computational model, sorting is
>also linear in the number of bytes.
>

Saying that sorting is "linear in the number of bytes" is truly foolish.
It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
in the number of keys".

Proof:
  Suppose I have a list of N keys that need to be sorted.
  Given some way of converting these keys to numbers in constant
  time, I've got a list of numbers that can be sorted linearly in
  the number of bytes required to store that list of numbers. 

  How many bytes is that, in terms of the number of keys? Well, the
  number of bits required to uniquely store a value between 1 and N 
  is O(log n), so the number of bytes required to store
  those N keys is O(n log n). The time to sort is linear in this, 
  so sorting time is O(n log n).

In other words, this entire argument has had zero content. Yes, n log n
IS the realistic lower bound on sorting N keys. Yes, N keys can be sorted
in time linear in the number of bytes. Both right, both mean EXACTLY the
same thing.

Incidentally, the best lower bound known, taking advantage of some of the
special features of real machines to create a computation model that
includes more than just strict comparisons, is Omega(n log n / log log n).

>---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/27/90)

Skip unless you have a dogmatic belief in n log n.

In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
> Saying that sorting is "linear in the number of bytes" is truly foolish.
> It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
> in the number of keys".

Sorting *is* linear in the number of bytes. Sorting is *not* n log n in
the number of keys. The latter bound is only true under various
restrictions. It's not foolish to say an always true statement rather
than a sometimes false statement.

> Proof:
>   Suppose I have a list of N keys that need to be sorted.
  [ ... ]
> so the number of bytes required to store
>   those N keys is O(n log n).

That statement is false. That ``proof'' is not a proof.

Consider, for instance, N strings of total length L. I can sort them in
time O(L). L can be smaller than cN log N---the average length of the
strings might be constant, or asymptotically log log N, or whatever.
You claim that this is impossible. You're wrong.

> In other words, this entire argument has been zero content. Yes, n log n
> IS the realistic lower bound on sorting N keys.

No, it is not.

> Yes, N keys can be sorted
> in time linear in the number of bytes. Both right, both mean EXACTLY the
> same thing.

No, they do not. The linear bound is precise. The n log n bound is only
sometimes correct, and is much less precise.

> Incidentally, the best lower bound known, taking advantage of some of the
> special features of real machines to create a computation model that
> includes more than just strict comparisons, is Omega(n log n / log log n).

I was quite amused at this ``new'' result from a couple of years ago
(which, btw, I mentioned at the beginning of the sorting thread). If I
remember right, the method uses bit manipulations on integers, has a
huge constant of proportionality, and would only be used for real
problems by the criminally insane. Rather funny how computer science
advances backwards.

(Oh, and can you explain why your ``proof'' doesn't apply to this
method?)

---Dan

peter@ficc.ferranti.com (Peter da Silva) (11/28/90)

In article <20414:Nov2623:37:2390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> Consider, for instance, N strings of total length L. I can sort them in
> time O(L). L can be smaller than cN log N---the average length of the
> strings might be constant, or asymptotically log log N, or whatever.
                   ^^^^^^^^

In that case you have an upper bound on the number of unique strings
you can handle. If you have N distinct values, the shortest string that
can uniquely identify them is log2(N) bits.

All you're doing is deciding to use a constant string length and accepting
either rather small limits on your data set size or a rather high constant.

How did this thread get started anyway?
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/28/90)

In article <1B779_7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <29686:Nov2505:33:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > I'm sorry I didn't dismiss this strongly enough in my first response. It
> > is absolutely irrelevant how long it takes to pick one of N buckets. All
> > radix sort uses is picking one of a constant number (you said K at
> > first) of buckets. For definiteness, let's settle upon K as being 256
> > for all time. It takes constant time, in theory and in practice, to
> > select one of 256 buckets.
> How about picking 2 buckets? Does it still work? In constant time?

Yes. In constant time. Why anyone would want to use eight passes with
lots of bit manipulation where one much simpler pass would suffice, I
don't know.

> Would it not be safe to assume that it takes O(log(key length)) time to do
> something with the bucket once you've found it, assuming that the key length
> is greater than log2(buckets)?

Why would you assume something like that?

---Dan

rang@cs.wisc.edu (Anton Rang) (11/28/90)

In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
>Saying that sorting is "linear in the number of bytes" is truly foolish.
>It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
>in the number of keys".
>
>  [ justification removed ]

  This assumes that all keys are distinct.

	Anton
   
+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+

peter@ficc.ferranti.com (Peter da Silva) (11/28/90)

In article <4343:Nov2720:36:5790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> In article <1B779_7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> > In article <29686:Nov2505:33:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > > I'm sorry I didn't dismiss this strongly enough in my first response. It
> > > is absolutely irrelevant how long it takes to pick one of N buckets. All
> > > radix sort uses is picking one of a constant number (you said K at
> > > first) of buckets. For definiteness, let's settle upon K as being 256
> > > for all time. It takes constant time, in theory and in practice, to
> > > select one of 256 buckets.
> > How about picking 2 buckets? Does it still work? In constant time?

> Yes. In constant time. Why anyone would want to use eight passes with
> lots of bit manipulation where one much simpler pass would suffice, I
> don't know.

OK, you then need to use 8 passes. With a larger sample size you need
to use more passes. The number of passes increases as the log of the
sample size.

So, you need log(n) passes, and each pass takes O(n) time. QED.

All you're doing is changing the constant in O(n log n) (or the base of
the logarithm, which comes to the same thing).
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

peter@ficc.ferranti.com (Peter da Silva) (11/28/90)

In article <RANG.90Nov27145400@nexus.cs.wisc.edu> rang@cs.wisc.edu (Anton Rang) writes:
> In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
> >Saying that sorting is "linear in the number of bytes" is truly foolish.
> >It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
> >in the number of keys".

>   This assumes that all keys are distinct.

If they're not you can take an initial O(n) pass sticking all the equal
keys together. Now your problem becomes sorting (n') new keys, which is
again O(n' log n').
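
One way to read that initial pass (an assumption on my part; the post does not say how the grouping is done) is a hash-table grouping, expected linear in the number of keys if hashing a key takes constant time:

```python
def group_then_sort(keys):
    """Group equal keys with a hash table, sort only the n' distinct
    values, then expand the groups back out in order."""
    counts = {}
    for k in keys:                   # one pass, expected O(1) per key
        counts[k] = counts.get(k, 0) + 1
    out = []
    for k in sorted(counts):         # comparison sort on n' distinct keys
        out.extend([k] * counts[k])
    return out
```

Whether hashing a key really costs constant time is, of course, the same question being argued about bucket selection.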
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/29/90)

In article <OOV7NC9@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> > > How about picking 2 buckets? Does it still work? In constant time?
> > Yes. In constant time. Why anyone would want to use eight passes with
> > lots of bit manipulation where one much simpler pass would suffice, I
> > don't know.
> OK, you then need to use 8 passes. With a larger sample size you need
> to use more passes.

Wrong. Who said anything about sample size?

> The number of passes increases as the log of the
> sample size.

No, the number of passes does not increase as the log of the sample
size. Do you understand how an MSD radix sort works?

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/29/90)

In article <SOV7LG9@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <RANG.90Nov27145400@nexus.cs.wisc.edu> rang@cs.wisc.edu (Anton Rang) writes:
> > In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
> > >Saying that sorting is "linear in the number of bytes" is truly foolish.
> > >It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
> > >in the number of keys".
> >   This assumes that all keys are distinct.
> If they're not you can take an initial O(n) pass sticking all the equal
> keys together.

How, pray tell, do you plan to do that? The only way I can think of is
to sort the keys (which takes time O(b), b the number of bytes).

> Now your problem becomes sorting (n') new keys, which is
> again O(n' log n').

Ah. I understand. It's heretical to take bytes as the unit of measure,
because the bound O(b) on sorting time is too precise, and computer
scientists hate precision. Instead, it's better to write sorting in
terms of a different quantity---as Jim says, n, the number of keys. Of
course, sorting may be faster than n log n, or slower; and n has no
simple relation to b in general. But that's still not imprecise enough
for you. You want to take n', the number of distinct keys. Now you have
an n' log n' bound which is correct even less often than n log n, and
besides that n' is a pain to compute.

This seems to be typical of progress in computer ``science.''

---Dan