jrbd@craycos.com (James Davies) (11/21/90)
Sorry to inject some mathematics into this lovely flamefest, but... The bound on sorting is O(N log N) comparisons because you must be able to generate all of the N! possible permutations of the N inputs. If each comparison reduces the size of the remaining search space by a factor of K, you must make log (N!) comparisons to choose the right permutation. K N! grows approximately as N**N (Stirling's approximation, I believe it's called). Therefore, log N! grows as N log N. (log(N**N) = log(N*N*N*..*N) = log(N) + log(N) +...+ log(N) = N * log(N)) Radix or bucket sorts may appear to be linear, but in fact they are also N log N in comparisons. (Picking one of K buckets requires a K-way comparison.) The log term may be hidden in the addressing hardware of your memory, and may be hidden further by parallelism, but it's there. This is most evident when you try to make your bucket sort scale up to arbitrary sizes (like, say, sorting a trainload of tapes...) Jim Davies jrbd@craycos.com
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/21/90)
Skip if you know why radix sort does not observe the n log n bound.

In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
> Sorry to inject some mathematics into this lovely flamefest, but...
> The bound on sorting is O(N log N) comparisons because you must be able to
> generate all of the N! possible permutations of the N inputs.

That bound is only valid if the only permitted operation is comparison.

> N! grows approximately as N**N (Stirling's approximation, I believe it's
> called).

No, n! does not grow as n^n. Stirling's (first) approximation is that n!
is about n^n/e^n times the square root of 2 pi n.

> Radix or bucket sorts may appear to be linear, but in fact they are also
> N log N in comparisons. (Picking one of K buckets requires a K-way
> comparison.)

Picking one of K buckets does not require a K-way comparison. Even if it
did, that would not be relevant. Radix sort is linear in the number of
bytes, which can be o(n log n) if there are many duplicate keys.

You said you would ``inject some mathematics into this lovely
flamefest.'' But mathematics is a tautology. Asserting that a false
statement is ``in fact'' true does not make it mathematics.

[ on the supposed fact that radix sorting is n log n ]
> This is most evident when you try to make your bucket sort scale up to
> arbitrary sizes (like, say, sorting a trainload of tapes...)

The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
them is almost exactly 100---internal sorting by far dominates any
merging. So for practical purposes (even in such a ridiculously large
situation) sorting is linear in the number of bytes. And for theoretical
purposes, since any immediately accessible byte of memory is accessible
in constant time under any realistic computational model, sorting is
also linear in the number of bytes.

---Dan
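The "linear in the number of bytes" claim can be made concrete with an
LSD radix sort over K = 256 buckets (an illustrative sketch, not code
from the thread): for w-byte keys it makes w passes, each O(n), for
O(w*n) = O(total key bytes) work in total, with no key-to-key
comparisons at all.

```python
def lsd_radix_sort(keys, width=4):
    """Sort unsigned integers that fit in `width` bytes.

    Makes `width` passes; each pass distributes the n keys into 256
    buckets by one byte and is a stable O(n) step.  Total work is
    O(width * n), i.e. linear in the number of key bytes -- no
    key-to-key comparisons are ever made.
    """
    for shift in range(0, 8 * width, 8):  # least significant byte first
        buckets = [[] for _ in range(256)]
        for k in keys:
            buckets[(k >> shift) & 0xFF].append(k)  # constant-time pick
        keys = [k for b in buckets for k in b]      # stable concatenation
    return keys

data = [3_000_000, 42, 7, 3_000_000, 65536, 255]
assert lsd_radix_sort(data) == sorted(data)
```

The bucket pick is an array index into a fixed table of 256 lists,
which is the constant-time operation the thread argues about.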
jrbd@craycos.com (James Davies) (11/22/90)
In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Skip if you know why radix sort does not observe the n log n bound.

(I'm assuming Dan is the only one who skipped this...)

>In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>> Sorry to inject some mathematics into this lovely flamefest, but...
>> The bound on sorting is O(N log N) comparisons because you must be able to
>> generate all of the N! possible permutations of the N inputs.
>
>That bound is only valid if the only permitted operation is comparison.

If you define "pick one of N buckets" as one operation, sorting is indeed
linear. I'll maintain that your operation takes O(log N) time to perform.

>> N! grows approximately as N**N (Stirling's approximation, I believe it's
>> called).
>
>No, n! does not grow as n^n. Stirling's (first) approximation is that n!
>is about n^n/e^n times the square root of 2 pi n.

Irrelevant, invalid, etc. (substitute your favorite dismissal word here.)
log(N!) still grows as N log N.

>> Radix or bucket sorts may appear to be linear, but in fact they are also
>> N log N in comparisons. (Picking one of K buckets requires a K-way
>> comparison.)
>
>Picking one of K buckets does not require a K-way comparison. Even if it
>did, that would not be relevant. Radix sort is linear in the number of
>bytes, which can be o(n log n) if there are many duplicate keys.

I was talking about sorting N items, not N bytes. Picking one of K
buckets is indeed equivalent to a K-way comparison. That it appears to
be "one" operation is merely an illusion, created mainly by fixed
address lengths of real machines and parallel addressing hardware. This
is probably your big mental block, Dan - you don't realize what is
really going on here. It would be equally valid to assert that sorting
takes one "sort" unit, so that sorting can be done in constant time.
Memory access times don't remain constant for an arbitrary number of
items (any more than sorting times do). And we're talking real
asymptotics here, not "real world" asymptotics or whatever your concept
is.

BTW, what do duplicate keys have to do with it? If all the keys are
duplicates, then sorting is truly constant time. This is too realistic a
restriction when talking about a theoretical problem.

>The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
>them is almost exactly 100---internal sorting by far dominates any
>merging.

Just as an exercise, why don't you try calculating the merge time?
Assume constant access time to the data on the tapes (i.e. no delays for
operator intervention), and also assume that your internal memory won't
hold more than a few tapes at a time. You might be surprised (but then
again, I wouldn't expect you to do something so rigorous - it's easier
to wave your hands and say "merging is linear!").
abrodnik@watdragon.waterloo.edu (Andrej Brodnik (Andy)) (11/22/90)
In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>The bound on sorting is O(N log N) comparisons because you must be able to
>generate all of the N! possible permutations of the N inputs.
...
>
>... The log term may be hidden in the addressing hardware of
>your memory, and may be hidden further by parallelism, but it's there.

Well, in a parallel version of sorting described by AKS in the early
'80s, they took away the linear part and got an O(log N) parallel-time
algorithm. But the problem in this construction is an enormous constant
hidden in the big O. It originates from the construction of expander
graphs. But I believe the algorithm was already described in this
discussion.

Regards
Andrej
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/25/90)
Rehash of previous exposition in more detail. Again, skip if you know
why radix sort does not obey the n log n bound.

In article <1990Nov21.211000.11359@craycos.com> jrbd@craycos.com (James Davies) writes:
> In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
> >> Sorry to inject some mathematics into this lovely flamefest, but...
> >> The bound on sorting is O(N log N) comparisons because you must be able to
> >> generate all of the N! possible permutations of the N inputs.

Under the banner of ``mathematics'' that I wish you would stop flying,
you claim here that all (general) sorting methods take time at least
cN log N to sort N numbers, where c is some positive number depending on
the method. If this is an inaccurate translation of your quoted
statement, please let me know.

That claim is entirely false. You have not justified it. It is
contradicted by all variants of MSD radix sort, which are linear in the
number of bits in the inputs; the number of bits can be asymptotically
less than cN log N for all c. Please stop insulting mathematics.

> >That bound is only valid if the only permitted operation is comparison.
> If you define "pick one of N buckets" as one operation, sorting is indeed
> linear. I'll maintain that your operation takes O(log N) time to perform.

I'm sorry I didn't dismiss this strongly enough in my first response. It
is absolutely irrelevant how long it takes to pick one of N buckets. All
radix sort uses is picking one of a constant number (you said K at
first) of buckets. For definiteness, let's settle upon K as being 256
for all time. It takes constant time, in theory and in practice, to
select one of 256 buckets.

> >> N! grows approximately as N**N (Stirling's approximation, I believe it's
> >> called).
> >No, n! does not grow as n^n. Stirling's (first) approximation is that n!
> >is about n^n/e^n times the square root of 2 pi n.
> Irrelevant, invalid, etc. (substitute your favorite dismissal word here.)
> log(N!) still grows as N log N.

You said you were introducing some mathematics into this discussion. I
was correcting your errors, one by one. Here you began a supposed proof
that log(n!) grows as n log n; but your first statement, namely that n!
grows as n^n, was false. The conclusion is true, but you still haven't
proven it.

(Given that n! is larger than n^n/e^n times the square root of 2 pi n
[see Abramowitz & Stegun], log n! is larger than n times the log of n/e.
When n is greater than e^2 (about 7.4), n/e is greater than the square
root of n, so log n/e is greater than half log n. Hence log n! is at
least half n log n for large n.)

> >> Radix or bucket sorts may appear to be linear, but in fact they are also
> >> N log N in comparisons. (Picking one of K buckets requires a K-way
> >> comparison.)
> >Picking one of K buckets does not require a K-way comparison. Even if it
> >did, that would not be relevant. Radix sort is linear in the number of
> >bytes, which can be o(n log n) if there are many duplicate keys.
> I was talking about sorting N items, not N bytes.

Yes, and I was following your terminology. As I said before, the number
of bytes can be o(n log n). If you attempt to prove your statement that
radix sort uses at least cn log n comparisons, you will find an
uncrossable gap at this point.

> Picking one of K buckets is indeed equivalent to a K-way comparison.
> That it appears to be "one" operation is merely an illusion, created
> mainly by fixed address lengths of real machines and parallel addressing
> hardware. This is probably your big mental block, Dan - you don't realize
> what is really going on here.

Fine, believe what you want. Even if it took K operations to pick one of
K buckets, that would be absolutely irrelevant: K is a *constant*. Make
it 2 if you want.
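The parenthetical derivation above can be written as a single chain of
inequalities, starting from the Stirling lower bound n! >= sqrt(2 pi n) (n/e)^n:

```latex
\log n! \;\ge\; \log\!\Bigl(\sqrt{2\pi n}\,\bigl(\tfrac{n}{e}\bigr)^{n}\Bigr)
        \;\ge\; n \log\frac{n}{e}
        \;\ge\; n \log\sqrt{n}
        \;=\; \tfrac{1}{2}\,n\log n
        \qquad\text{for } n \ge e^{2}\approx 7.39.
```

The second inequality drops the positive term (1/2) log(2 pi n); the
third uses n/e >= sqrt(n), which holds exactly when n >= e^2.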
> It would be equally valid to assert that sorting takes one "sort" unit,
> so that sorting can be done in constant time.

I was sticking to your terminology. You claim that radix sort is
theta(n log n) where n is the number of keys. (Here theta denotes a
bound from both above and below, not merely the upper bound of big O.)
That claim is simply false.

> And we're talking real asymptotic here, not "real world"
> asymptotic or whatever your concept is.

I am talking about real asymptotics on a large class of (realistic)
computational models. Once you see Knuth make a simply false statement
about ``real-world'' asymptotics, you learn to pay attention to those
log factors. In this problem there's no log factor.

> BTW, what do duplicate keys have to do with it? If all the keys are
> duplicates, then sorting is truly constant time.

What are you talking about? By definition, a general sorting method
cannot know such information about the keys beforehand. Furthermore,
nobody is talking about all keys being the same.

> >The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
> >them is almost exactly 100---internal sorting by far dominates any
> >merging.
> Just as an exercise, why don't you try calculating the merge time?

As a matter of fact, I did calculate it, and I presented the results in
my followup to his article. Just as an exercise, why don't you go back
and read my followup before you assign any further exercises?

> (but then again, I wouldn't expect you to do something so rigorous -
> it's easier to wave your hands and say "merging is linear!").

Why the ad hominem attacks? I haven't seen you show any respect for
mathematical rigor. Merging is not linear, but it is, even for such
ridiculously large problems as Ritchie's trainload of tapes, dominated
by sorting. (David Chase has pointed out in e-mail that a distributed
solution might tip the scale the other way, but I don't believe any
distributed solution will be cost-effective.)

---Dan
peter@ficc.ferranti.com (Peter da Silva) (11/26/90)
In article <29686:Nov2505:33:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> I'm sorry I didn't dismiss this strongly enough in my first response. It
> is absolutely irrelevant how long it takes to pick one of N buckets. All
> radix sort uses is picking one of a constant number (you said K at
> first) of buckets. For definiteness, let's settle upon K as being 256
> for all time. It takes constant time, in theory and in practice, to
> select one of 256 buckets.

How about picking 2 buckets? Does it still work? In constant time?

Would it not be safe to assume that it takes O(log(key length)) time to
do something with the bucket once you've found it, assuming that the key
length is greater than log2(buckets)?
-- 
Peter da Silva.  `-_-'
+1 713 274 5180.  'U`
peter@ferranti.com
carroll@cis.udel.edu (Mark Carroll) (11/27/90)
In article <6327:Nov2022:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Nov20.202143.10746@craycos.com> jrbd@craycos.com (James Davies) writes:
>> Sorry to inject some mathematics into this lovely flamefest, but...
>> The bound on sorting is O(N log N) comparisons because you must be able to
>> generate all of the N! possible permutations of the N inputs.
>
>That bound is only valid if the only permitted operation is comparison.
>
> [ on the supposed fact that radix sorting is n log n ]
>> This is most evident when you try to make your bucket sort scale up to
>> arbitrary sizes (like, say, sorting a trainload of tapes...)
>
>The scale factor between sorting 100 of Ritchie's boxcars and just 1 of
>them is almost exactly 100---internal sorting by far dominates any
>merging. So for practical purposes (even in such a ridiculously large
>situation) sorting is linear in the number of bytes. And for theoretical
>purposes, since any immediately accessible byte of memory is accessible
>in constant time under any realistic computational model, sorting is
>also linear in the number of bytes.

Saying that sorting is "linear in the number of bytes" is truly foolish.
It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
in the number of keys".

Proof: Suppose I have a list of N keys that need to be sorted. Given
some way of converting these keys to numbers in constant time, I've got
a list of numbers that can be sorted linearly in the number of bytes
required to store that list of numbers. How many bytes is that, in terms
of the number of keys? Well, the number of bits required to uniquely
store a value between 1 and N is O(log n), so the number of bytes
required to store those N keys is O(n log n). The time to sort is linear
in this, so sorting time is O(n log n).

In other words, this entire argument has been zero content. Yes, n log n
IS the realistic lower bound on sorting N keys.
Yes, N keys can be sorted in time linear to the number of bytes. Both
right, both mean EXACTLY the same thing.

Incidentally, the best lower bound known, taking advantage of some of
the special features of real machines to create a computation model that
includes more than just strict comparisons, is O(n log n / log log n).

>---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/27/90)
Skip unless you have a dogmatic belief in n log n.

In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
> Saying that sorting is "linear in the number of bytes" is truly foolish.
> It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
> in the number of keys".

Sorting *is* linear in the number of bytes. Sorting is *not* n log n in
the number of keys. The latter bound is only true under various
restrictions. It's not foolish to state an always-true bound in
preference to a sometimes-false one.

> Proof:
> Suppose I have a list of N keys that need to be sorted.
  [ ... ]
> so the number of bytes required to store
> those N keys is O(n log n).

That statement is false. That ``proof'' is not a proof. Consider, for
instance, N strings of total length L. I can sort them in time O(L). L
can be smaller than cN log N---the average length of the strings might
be constant, or asymptotically log log N, or whatever. You claim that
this is impossible. You're wrong.

> In other words, this entire argument has been zero content. Yes, n log n
> IS the realistic lower bound on sorting N keys.

No, it is not.

> Yes, N keys can be sorted
> in time linear to the number of bytes. Both right, both mean EXACTLY the
> same thing.

No, they do not. The linear bound is precise. The n log n bound is only
sometimes correct, and is much less precise.

> Incidentally, the best lower bound known, taking advantage of some of the
> special features of real machines to create a computation model that
> includes more than just strict comparisons is O(n log n / log log n).

I was quite amused at this ``new'' result from a couple of years ago
(which, btw, I mentioned at the beginning of the sorting thread). If I
remember right, the method uses bit manipulations on integers, has a
huge constant of proportionality, and would only be used for real
problems by the criminally insane. Rather funny how computer science
advances backwards.
(Oh, and can you explain why your ``proof'' doesn't apply to this
method?)

---Dan
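The O(L) claim for N strings of total length L can be illustrated with
an MSD radix sort sketch (hypothetical code, not from the thread): each
byte of each string is bucketed at most once, so the work is
proportional to the total number of bytes, not to N log N.

```python
def msd_radix_sort(strings, pos=0):
    """Sort byte strings by most-significant-byte-first bucketing.

    Each recursion level inspects one byte position of its group, so
    every byte of every string is bucketed at most once: total work is
    proportional to the total number of bytes L (plus per-group
    overhead), with no string-to-string comparisons.
    """
    if len(strings) <= 1:
        return list(strings)
    done = []     # strings exhausted at this position sort first
    buckets = {}  # byte value -> strings sharing that byte at pos
    for s in strings:
        if pos >= len(s):
            done.append(s)
        else:
            buckets.setdefault(s[pos], []).append(s)
    out = done
    for byte in sorted(buckets):  # at most 256 byte values: a constant step
        out += msd_radix_sort(buckets[byte], pos + 1)
    return out

words = [b"banana", b"band", b"bandana", b"apple", b"ape", b""]
assert msd_radix_sort(words) == sorted(words)
```

The dict-plus-`sorted` bucket scan is a shortcut over a fixed 256-entry
table; either way the per-byte step is bounded by a constant.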
peter@ficc.ferranti.com (Peter da Silva) (11/28/90)
In article <20414:Nov2623:37:2390@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> Consider, for instance, N strings of total length L. I can sort them in
> time O(L). L can be smaller than cN log N---the average length of the
> strings might be constant, or asymptotically log log N, or whatever.
                   ^^^^^^^^

In that case you have an upper bound on the number of unique strings you
can handle. If you have N distinct values, the shortest string that can
be used to uniquely identify them is log2(N) bits. All you're doing is
deciding to use a constant string length and accepting either rather
small limits on your data set size or a rather high constant.

How did this thread get started anyway?
-- 
Peter da Silva.  `-_-'
+1 713 274 5180.  'U`
peter@ferranti.com
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/28/90)
In article <1B779_7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <29686:Nov2505:33:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > I'm sorry I didn't dismiss this strongly enough in my first response. It
> > is absolutely irrelevant how long it takes to pick one of N buckets. All
> > radix sort uses is picking one of a constant number (you said K at
> > first) of buckets. For definiteness, let's settle upon K as being 256
> > for all time. It takes constant time, in theory and in practice, to
> > select one of 256 buckets.
> How about picking 2 buckets? Does it still work? In constant time?

Yes. In constant time. Why anyone would want to use eight passes with
lots of bit manipulation where one much simpler pass would suffice, I
don't know.

> Would it not be safe to assume that it takes O(log(key length)) time to do
> something with the bucket once you've found it, assuming that the key length
> is greater than log2(buckets)?

Why would you assume something like that?

---Dan
rang@cs.wisc.edu (Anton Rang) (11/28/90)
In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
>Saying that sorting is "linear in the number of bytes" is truly foolish.
>It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
>in the number of keys".
>
> [ justification removed ]

This assumes that all keys are distinct.

Anton

+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+
peter@ficc.ferranti.com (Peter da Silva) (11/28/90)
In article <4343:Nov2720:36:5790@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> In article <1B779_7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> > In article <29686:Nov2505:33:4290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> > > I'm sorry I didn't dismiss this strongly enough in my first response. It
> > > is absolutely irrelevant how long it takes to pick one of N buckets. All
> > > radix sort uses is picking one of a constant number (you said K at
> > > first) of buckets. For definiteness, let's settle upon K as being 256
> > > for all time. It takes constant time, in theory and in practice, to
> > > select one of 256 buckets.
> > How about picking 2 buckets? Does it still work? In constant time?
> Yes. In constant time. Why anyone would want to use eight passes with
> lots of bit manipulation where one much simpler pass would suffice, I
> don't know.

OK, you then need to use 8 passes. With a larger sample size you need to
use more passes. The number of passes increases as the log of the sample
size. So, you need log(n) passes, and each pass takes O(n) time. QED.

All you're doing is changing the constant in O(n log n) (or the base of
the logarithm, which comes to the same thing).
-- 
Peter da Silva.  `-_-'
+1 713 274 5180.  'U`
peter@ferranti.com
peter@ficc.ferranti.com (Peter da Silva) (11/28/90)
In article <RANG.90Nov27145400@nexus.cs.wisc.edu> rang@cs.wisc.edu (Anton Rang) writes:
> In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
> >Saying that sorting is "linear in the number of bytes" is truly foolish.
> >It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
> >in the number of keys".
> This assumes that all keys are distinct.

If they're not, you can take an initial O(n) pass sticking all the equal
keys together. Now your problem becomes sorting n' new keys, which is
again O(n' log n').
-- 
Peter da Silva.  `-_-'
+1 713 274 5180.  'U`
peter@ferranti.com
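Peter's "initial O(n) pass" can be sketched under one loud assumption:
that hashing a key and probing a hash table take expected constant time.
That model assumption is precisely what Dan challenges in his reply, so
this is an illustration of the claim, not a settlement of the argument
(the function name is my own):

```python
def group_duplicates(keys):
    """Collapse equal keys in one pass, assuming O(1) expected-time
    hashing.  (A model assumption, not a free lunch: hashing a key
    still reads all of its bytes.)  Returns (distinct_keys, counts)."""
    counts = {}
    for k in keys:  # one pass over the n keys
        counts[k] = counts.get(k, 0) + 1
    return list(counts), counts

distinct, counts = group_duplicates(["b", "a", "b", "b"])
assert sorted(distinct) == ["a", "b"]
assert counts["b"] == 3
```

After this pass, only the n' distinct keys (with their counts) remain
to be sorted.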
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/29/90)
In article <OOV7NC9@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> > > How about picking 2 buckets? Does it still work? In constant time?
> > Yes. In constant time. Why anyone would want to use eight passes with
> > lots of bit manipulation where one much simpler pass would suffice, I
> > don't know.
> OK, you then need to use 8 passes. With a larger sample size you need
> to use more passes.

Wrong. Who said anything about sample size?

> The number of passes increase as the log of the
> sample size.

No, the number of passes does not increase as the log of the sample
size. Do you understand how an MSD radix sort works?

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/29/90)
In article <SOV7LG9@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <RANG.90Nov27145400@nexus.cs.wisc.edu> rang@cs.wisc.edu (Anton Rang) writes:
> > In article <37256@nigel.ee.udel.edu> carroll@cis.udel.edu (Mark Carroll) writes:
> > >Saying that sorting is "linear in the number of bytes" is truly foolish.
> > >It's nothing but a deceptive way of saying "Sorting takes time O(n lg n)
> > >in the number of keys".
> > This assumes that all keys are distinct.
> If they're not you can take an initial O(n) pass sticking all the equal
> keys together.

How, pray tell, do you plan to do that? The only way I can think of is
to sort the keys (which takes time O(b), b the number of bytes).

> Now your problem becomes sorting (n') new keys, which is
> again O(n' log n').

Ah. I understand. It's heretical to take bytes as the unit of measure,
because the bound O(b) on sorting time is too precise, and computer
scientists hate precision. Instead, it's better to write sorting in
terms of a different quantity---as Jim says, n, the number of keys. Of
course, sorting may be faster than n log n, or slower; and n has no
simple relation to b in general.

But that's still not imprecise enough for you. You want to take n', the
number of distinct keys. Now you have an n' log n' bound which is
correct even less often than n log n, and besides that n' is a pain to
compute. This seems to be typical of progress in computer ``science.''

---Dan