jlg@lanl.gov (Jim Giles) (11/16/90)
A certain other contributor to this newsgroup claims to have _never_ asserted that sorting could be done in linear time _without_ mentioning it was linear in the number of bytes and not the number of keys. This was the _original_ remark that I referred to when I mentioned the issue to begin with: > (Actually, you might speed the _test_ up by sorting the addresses of all > potentially aliased objects and then moving through the sorted list and > comparing adjacent items. This is still _at_least_ N*log(N). Great, now you don't even know how to sort a collection of numbers in linear time. This was a direct quote from an email message dated 23 Jul, 1990. The person who sent the message challenged me to find any quote that he made of the above variety. I took that as implicit permission to quote private mail communications over the net. As usual, I don't expect _any_ apology (or indeed, any response at all) from the other party to this travesty. _I_ apologise to all the other readers of the net for ever having responded to the person in question at all. J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)
The saga continues: Jim finally finds a reference for Dan's supposed statement that sorting is linear time (though it was in mail, not news). But he fails to notice the word ``numbers''! Fans, keep the word ``numbers'' in your head. Remember ``numbers.'' Remember that machine addresses are fixed length (typically length 4). Remember that sorting numbers is not the same as sorting in general. Dan points out Jim's mistake in excruciating detail and flames Jim for it. Life goes on. In article <6097@lanl.gov> jlg@lanl.gov (Jim Giles) writes: > A certain other contributor to this newsgroup claims to have _never_ > asserted that sorting could be done in linear time _without_ mentioning > it was linear in the number of bytes and not the number of keys. Not in news, anyway. But I'll accept your supposed example, then flame you to death for your mistake. > > (Actually, you might speed the _test_ up by sorting the addresses of all > > potentially aliased objects and then moving through the sorted list and > > comparing adjacent items. This is still _at_least_ N*log(N). > Great, now you don't even know how to sort a collection of numbers in > linear time. Notice, everybody, the word ``numbers'' there. ``Sort a collection of numbers.'' Notice that both Jim and I are talking about machine addresses, which are numbers typically of length 2 or 4 bytes. > This was a direct quote from an email message dated 23 Jul, 1990. Correct. > I took that as implicit permission to > quote private mail communications over the net. No problem. Fair use is rarely a problem in quoting short sections of e-mail. Libel is. > As usual, I don't expect _any_ apology (or indeed, any response at all) > from the other party to this travesty. I will not apologize, because you have not demonstrated me wrong. Please, somebody with some intelligence, confirm that the word ``numbers'' does indeed appear in the above description? That I was talking about sorting numbers, not about sorting in general? That my statement is correct, and that sorting (fixed-length, as in machine) numbers is, indeed, a linear problem? Somebody who's been paying attention to this discussion, please confirm that Jim has accused me of saying that sorting is linear time? Please express an opinion on whether the above quote---which has to do with sorting a collection of numbers, specifically machine addresses---is or is not applicable to Jim's accusation that I have said sorting is linear time. Jim, I expect an apology from you. What you have just done in this discussion is an example of what I mean by perversion. You take an issue (whether my statements about sorting have or have not been correct). You then change the issue into something else. By failing to quote enough of previous articles, you introduce enough discontinuity that some readers don't notice your perversion. And you continue to pervert what was said, in order to cover your wounded ass. I have not said that sorting is linear without qualification. I believe that in news I have not said that sorting is linear without adding the specific phrase ``in the number of bytes,'' though I am not so sure about this. I am sure that my perceptions of radix sorting are correct. I see no explanation for Jim's article but intentional perversion. ---Dan
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/16/90)
In article <6097@lanl.gov>, jlg@lanl.gov (Jim Giles) writes: >G> (Actually, you might speed the _test_ up by sorting the addresses of all >G> potentially aliased objects and then moving through the sorted list and >G> comparing adjacent items. This is still _at_least_ N*log(N). X> Great, now you don't even know how to sort a collection of numbers in X> linear time. In all fairness, the quote from X here explicitly says "numbers", G having said "addresses", and we may reasonably take it that this means hardware addresses and hardware numbers. (I understood X's claim not to have omitted the qualification "in the number of bytes" to refer to generic sorts that didn't know they had numbers to work with. Evidently I misunderstood.) There is a new book Introduction to Algorithms Thomas H. Cormen, Charles E. Leiserson, & Ronald L. Rivest MIT Press, 1990 ISBN 0-262-03141-8 (MIT Press, world-wide) ISBN 0-07-013143-0 (McGraw-Hill, USA) I like it a lot. Section 9.4 says on p180 Bucket sort runs in linear time on the average. ... bucket sort assumes that that the input is generated by a random process that distributes elements uniformly over the interval [0,1). They explicitly use the formula O(n), where n is the number of keys. Clearly addresses aren't random floating-point numbers, but (ptr - ptrmin)/scale where double scale = ptrmax - ptrmin; _is_ in the interval [0,1) and might be close enough to random. This is not a defence of X. I'm just pointing out to anyone who might have missed it that there _is_ a method described in the literature as sorting numbers in time linear in the number of keys. -- The problem about real life is that moving one's knight to QB3 may always be replied to with a lob across the net. --Alasdair Macintyre.
oz@yunexus.yorku.ca (Ozan Yigit) (11/17/90)
In article <4298@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes: >There is a new book > Introduction to Algorithms > Thomas H. Cormen, Charles E. Leiserson, & Ronald L. Rivest > MIT Press, 1990 > ISBN 0-262-03141-8 (MIT Press, world-wide) > ISBN 0-07-013143-0 (McGraw-Hill, USA) >I like it a lot. ... Yes, I think it is a very nice book also. Kinda big, heavy and expensive but well worth it. The title of Chapter 9 is "Sorting in Linear Time". Last paragraph of the intro reads: Sections 9.2, 9.3 and 9.4 examine three sorting algorithms -- counting sort, radix sort, and bucket sort -- that run in linear time. Needless to say, these algorithms use operations other than comparisons to determine the sorted order. Consequently, the O(n log n) lower bound does not apply to them. I think Dan does seem to know what he is talking about, regardless of his overall level of (ahem) obnoxiousness. :-) oz --- The king: If there's no meaning internet: oz@nexus.yorku.ca in it, that saves a world of trouble usenet: ....!utai!yunexus!oz you know, as we needn't try to find any. bitnet: oz@[yulibra|yuyetti] Lewis Carroll (Alice in Wonderland) Phonet: +1 416 7365257x33976
rh@smds.UUCP (Richard Harter) (11/17/90)
In article <4298@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes: > I like it a lot. Section 9.4 says on p180 > Bucket sort runs in linear time on the average. ... > bucket sort assumes that that the input is generated by > a random process that distributes elements uniformly > over the interval [0,1). > They explicitly use the formula O(n), where n is the number of keys. The good news is that you don't need uniformity - almost any relatively smooth distribution will do. The bad news is that many real life sorting problems are distinctly not smooth. It should also be noted that bucket sorts, distribution sorts, math sorts, et. al. are linear only within a range. [If the file size is large enough the implicit assumptions about short data access times fail.] -- Richard Harter, Software Maintenance and Development Systems, Inc. Net address: jjmhome!smds!rh Phone: 508-369-7398 US Mail: SMDS Inc., PO Box 555, Concord MA 01742 This sentence no verb. This sentence short. This signature done.
dbc@bushido.uucp (Dave Caswell) (11/18/90)
. .This is not a defence of X. I'm just pointing out to anyone who .might have missed it that there _is_ a method described in the .literature as sorting numbers in time linear in the number of keys. There isn't anything special about that book; many books have sections about linear time sorts. What's scary is that some people don't seem to know it. I feel some justification for my practice of discussing sorting complexity during job interviews. -- David Caswell dbc%bushido.uucp@umich.edu
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)
In article <246@smds.UUCP> rh@smds.UUCP (Richard Harter) writes: > The good news is that you don't need uniformity - almost any relatively > smooth distribution will do. The bad news is that many real life > sorting problems are distinctly not smooth. The point of what I call ``adaptive distribution sort'' is that MSD radix sorting can take advantage of all the same tricks as adaptive quadrature. But any MSD radix sort is linear in the number of bytes. > It should also be noted that bucket sorts, distribution sorts, math sorts, > et. al. are linear only within a range. [If the file size is large enough > the implicit assumptions about short data access times fail.] All immediately accessible storage, whether internal or external, is accessible within a fixed time. Any computational model that accurately reflects this fact will admit sorting linear in the number of bytes. In practice the large constant factor makes record movement desirable, so a properly designed merge-radix sort is one of the best choices for external data sets. ---Dan
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/26/90)
In article <246@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes: > It should also be noted that bucket sorts, distribution sorts, math sorts, > et. al. are linear only within a range. [If the file size is large enough > the implicit assumptions about short data access times fail.] This applies to *all* "in-core" sorting methods. It's not that the sorting method breaks down, but that the "array access is constant time" assumption breaks down. That also applies to things like matrix multiplication! One can use samples (as one does in linear-time selection algorithms) to partition the collection into manageable pieces. -- I am not now and never have been a member of Mensa. -- Ariadne.