ken@rochester.UUCP (02/24/87)
Wow, you've just rediscovered radix sort.  It's sorting with *only* binary
comparisons that is O(n log n).

	Ken
firth@sei.cmu.edu.UUCP (02/24/87)
** From: Mail Delivery Subsystem <MAILER-DAEMON@sei.cmu.edu>
** Subject: Returned mail: User unknown

The above is my excuse for posting here a long reply that should have gone
to the author.  Please accept my apologies.

In article <814@fmsrl7.UUCP> you write:
>	I need information on how to share a discovery with others in
>the computer science profession.  I have an algorithm which implements
>an order(N) sort (ie the search time is linear with respect to the
>number of items with a constant overhead).  I seem to remember that
>someone "proved" that this could not be done and I do not have any
>information handy on this.  (Back in my college days, my professor
>and I did not see eye-to-eye and he threw me out of my data structures
>class.)  I really need a couple of things:
>
>	1) Could someone point me to this "proof" so that I can
>	   refute it.  I think I know which part would be invalid but I
>	   would like to make sure.  Mail containing the proof would be
>	   great.

Well, the proof as I learned it is quite simple, and goes thus:

You are given an unordered set S of items X, of cardinality N, and are
required to order it.  You have an ordering function "<" [X,X=>Boolean]
which is guaranteed total, transitive and antisymmetric.  That is, you may
assume "x<y" computable for all x,y in S; you may assume x<y xor y<x; and
you may assume x<y and y<z implies x<z.

(A) The set S has N! possible arrangements, of which exactly one is
    correctly sorted.

(B) Any sorting engine applied to the set therefore requires at least
    log(N!) bits of information to do the sort.

(C) Each comparison gives exactly one bit of information, the Boolean
    result.  Therefore, you need to make O(log(N!)) comparisons, which is
    of course O(N log N) by Stirling's approximation.

(D) Assuming each comparison takes a fixed time, there is therefore a time
    component of order O(N log N), and no sort algorithm can do better.
(Multiprocessing is cheating; in any case, the product processors*runtime
is what counts as the cost.)

Can we defeat this proof?  Let's look at the assumptions.

(A) Actually, we need a stronger assumption here - that all arrangements
    are equally likely.  Otherwise, we certainly CAN take advantage of
    partial ordering to speed up the sort.  That's one reason why a linear
    insertion sort is a good tool: even though it is O(N^2) in principle,
    it takes advantage of any ordered subsets.

(B) This one is the information-theoretic equivalent of the second law of
    thermodynamics, and you'd better believe it!  Note that we make no
    assumptions about the speed of the engine; the cost of the comparisons
    is what the proof addresses.

(C) True by hypothesis.  However, there are far better comparison
    functions than this one.  For example, simple subtraction of the
    arguments gives you more information - it tells you HOW MUCH bigger
    one is than the other.  This information can be used, along with
    knowledge of the regularity with which values are distributed, to
    improve the sort.  In the limit of a perfectly smooth distribution, a
    radix distribution sort is O(N).

(D) Dubious, to say the least.  For example, a hardware "find first bit"
    instruction performs the equivalent of many comparisons in a fixed
    time.  This could be used as the basis of a sort by taking a block of
    key values, transposing them bitwise, and using a chain of "find
    first bit" instructions to find the first key greater than the pivot.
    I'm not saying it would be fast on current machines, but it would be
    better than O(N log N).

Hope this helps with your first question.

Robert Firth

Postscript: sloppy presentations of the theorem tend to confuse people on
point (C) of the above four - the comparison has to yield a pure Boolean
for the proof to work.
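[Point (C) above - that knowing the key distribution lets you beat the
comparison bound - can be sketched as a bucket (distribution) sort.  This
is an illustrative sketch assuming keys roughly uniform in [0, 1), not
code from any poster:]

```python
def bucket_sort(keys):
    """Distribution sort for keys assumed roughly uniform in [0, 1).

    With a smooth distribution, each of the n buckets holds O(1) keys
    on average, so the total expected time is O(n)."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[int(k * n)].append(k)   # position estimated from key value
    out = []
    for b in buckets:
        b.sort()                        # tiny buckets: cheap cleanup pass
        out.extend(b)
    return out
```

[The O(N) behaviour depends entirely on the uniformity assumption; a
skewed distribution piles everything into a few buckets and the cleanup
pass dominates.]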
asente@figaro.UUCP (02/24/87)
In article <814@fmsrl7.UUCP> wayne@fmsrl7.UUCP (Michael R. Wayne) writes:
>	I need information on how to share a discovery with others in
>the computer science profession.  I have an algorithm which implements
>an order(N) sort (ie the search time is linear with respect to the
>number of items with a constant overhead).  I seem to remember that
>someone "proved" that this could not be done and I do not have any
>information handy on this.

See Knuth, "The Art of Computer Programming," Volume 3, "Sorting and
Searching," page 183 ff.  To summarize, he gives a proof that in both the
worst case and the average case you can't do better than lg n!, which is
approximately equal to n lg n.  The conditions are that each permutation
of the data is equally likely and that ***the sorting is being done by
comparisons***.  This is very important.  In earlier chapters he gives
examples of sorting methods that don't use comparisons and work more
quickly; unfortunately these only work for specialized cases.

>Additional info:
>	In order to implement the sort, the key must be completely
>	specified.  This means that the sort is data dependent
>	Examples of complete key specification:
>		9 digits (ie SS#)
>		20-80 ASCII characters
>	Does this invalidate the algorithm?

It sounds from this as if you've just rediscovered radix sort.  See
section 5.2.5 of the above book.  If you know what the data you're going
to be sorting is, you can write a radix sort that runs in O(n) time.
Unfortunately you have to rewrite the sort for different data.

	-paul asente
dave@viper.UUCP (02/25/87)
In article <814@fmsrl7.UUCP> wayne@fmsrl7.UUCP (Michael R. Wayne) writes:
>	I need information on how to share a discovery with others in
>the computer science profession.  I have an algorithm which implements
>an order(N) sort (ie the search time is linear with respect to the
>number of items with a constant overhead).  I seem to remember that
>someone "proved" that this could not be done and I do not have any
>information handy on this.

When sorting is done by binary comparisons, it is limited to O(N log N).
I'll bet you are using one of the methods that doesn't use comparisons.

>	1) Could someone point me to this "proof" so that I can
>	   refute it.  I think I know which part would be invalid but I
>	   would like to make sure.  Mail containing the proof would be
>	   great.

If you can refute it, then you really have something.  It is a pretty
simple proof.  Perhaps you should go into the perpetual-motion machine
business.  :-)

>Additional info:
>	In order to implement the sort, the key must be completely
>	specified.  This means that the sort is data dependent
>	Examples of complete key specification:
>		9 digits (ie SS#)
>		20-80 ASCII characters
>	Does this invalidate the algorithm?
>	Of course, the sort time IS dependent on the size of the key
>	(After all, how can a 5 character key and a 5000 character
>	key take the same amount of time to compare?

Aha!  What you've got here is the radix sort, I'll bet.  Does it work by
scanning each key digit in turn (starting at the LSB) so that, after pass
X, the file is sorted on the last X digits?  That algorithm has been known
for a LONG time.  The old-fashioned mechanical card sorters used this
algorithm.
-- 
| David Messer - Lynx Data Systems            If you can't convince
|   amdahl  \                                 them, confuse them.
|   ihnp4  --!-- dayton --!viper!dave             -- Harry S. Truman
|   rutgers /  \ meccts /
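[The pass-by-pass behaviour described above - after pass X the file is
sorted on the last X digits - is least-significant-digit radix sort.  A
minimal sketch, assuming non-negative integer keys with a known maximum
number of decimal digits; illustrative only, not the original poster's
algorithm:]

```python
def radix_sort(keys, digits):
    """LSD radix sort on non-negative integers of at most `digits`
    decimal digits.  Each pass is a stable distribution into ten bins,
    like one column pass on a mechanical card sorter."""
    for d in range(digits):                     # least significant digit first
        bins = [[] for _ in range(10)]
        for k in keys:
            bins[(k // 10**d) % 10].append(k)   # stable: preserves prior passes
        keys = [k for b in bins for k in b]
    return keys
```

[Total work is O(N * digits): linear in N for a fixed key size, which is
exactly the "O(N) sort" that keeps being rediscovered.]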
ron@brl-sem.UUCP (02/26/87)
The O(n log n) lower bound is for sorts done by comparisons.  Radix sorts
can easily be done in lower time.  Think of an old model 82 sorter (for
those of you who have been around long enough).  It has lots of little
bins (memory locations) in which to place the output data.  The runtime
is O(number of cards * number of columns in the data).  Usually the number
of cards >> the sort key size.

-Ron
kort@hounx.UUCP (02/26/87)
If you're short on memory, the bubble sort, slow and humble as it is, will get the job done. It uses lots of pair-wise comparisons. If you have memory to burn, the pigeon-hole (or post-office) sort is fast as lightning. This sort doesn't even use the comparison test. Space and time are often interchangeable. As memory becomes cheaper, space-intensive applications become more attractive. Barry Kort
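[The pigeon-hole sort mentioned above needs no comparisons at all when
the keys are integers in a known range.  A sketch under that assumption
(an illustration, not Barry's code):]

```python
def pigeonhole_sort(keys, lo, hi):
    """Comparison-free sort for integer keys known to lie in [lo, hi].
    Time is O(n + range); memory is one hole per possible key value -
    the "memory to burn" trade-off."""
    holes = [0] * (hi - lo + 1)
    for k in keys:
        holes[k - lo] += 1             # drop each item into its hole
    out = []
    for i, count in enumerate(holes):  # reading the holes back costs O(range)
        out.extend([i + lo] * count)
    return out
```

[Note the second loop: emptying the holes in order costs time proportional
to the key range, not just to the number of items.]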
crowl@rochester.UUCP (02/27/87)
In article <1080@hounx.UUCP> kort@hounx.UUCP (B.KORT) writes:
>If you're short on memory, the bubble sort, slow and humble as it is,
>will get the job done.  It uses lots of pair-wise comparisons.

The quicksort is an in-place O(n log n) sort while the bubble sort is
O(n^2).  The quicksort will provide much better performance for the same
space.  That is, unless you are worried about program space.
-- 
  Lawrence Crowl		716-275-5766	University of Rochester
			crowl@rochester.arpa	Computer Science Department
 ...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York, 14627
bhaskar@cvl.UUCP (02/27/87)
In article <25329@rochester.ARPA>, crowl@rochester.ARPA (Lawrence Crowl) writes:
> The quicksort is an inplace O(n log n) sort while the bubble sort is O(n^2).
> The quicksort will provide much better performance for the same space.  That
> is unless you are worried about program space.

Quicksort requires at least O(log n) space; bubblesort is an in-place
sort.  The extra space needed by quicksort is the space used by the data
structure which maintains the subfiles still to be sorted.

Also, quicksort is O(n log n) only on the average.  The worst case is
still O(n^2).
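[The O(log n) stack figure assumes the implementation always recurses
into the smaller subfile and loops on the larger; a naive version can use
O(n) stack on bad input.  A sketch of that discipline, using a simple
last-element partition for brevity (illustrative, not any poster's code):]

```python
def quicksort(a, lo=0, hi=None):
    """In-array quicksort whose recursion depth is O(log n):
    recurse on the smaller partition, iterate on the larger."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        if p - lo < hi - p:            # left side smaller: recurse on it
            quicksort(a, lo, p - 1)
            lo = p + 1                 # then loop on the right side
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1

def partition(a, lo, hi):
    """Partition a[lo..hi] around the last element; return the pivot's
    final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i
```

[Since each recursive call handles at most half the remaining elements,
the stack of pending subfiles can never exceed about log2(n) entries.]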
bhaskar@cvl.UUCP (02/27/87)
In article <2064@cvl.umd.edu>, bhaskar@cvl.umd.edu (S.K. Bhaskar) writes:
> Quicksort requires atleast O(log n) space ; Bubblesort is an in-place sort.

Of course, the O(log n) is in addition to the O(n) needed for the input!
drw@cullvax.UUCP (02/27/87)
kort@hounx.UUCP (B.KORT) writes:
> If you have memory to burn, the pigeon-hole (or post-office) sort is
> fast as lightning.  This sort doesn't even use the comparison test.

Eh?  OK, you've got zillions of bins, and putting things in them is real
fast.  Now get them back out again, without looking at each of the
zillions of bins.  (Remember, the goal is to produce a sorted list, not
just to have fast access to the items.)  Pigeon-hole is fast in the real
world because "scan for next bin with contents" is done by the visual
hardware of the brain.

Dale
-- 
Dale Worley	Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu
franka@mntgfx.UUCP (02/27/87)
In article <25329@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>In article <1080@hounx.UUCP> kort@hounx.UUCP (B.KORT) writes:
>>If you're short on memory, the bubble sort, slow and humble as it is,
>>will get the job done.  It uses lots of pair-wise comparisons.
>
>The quicksort is an inplace O(n log n) sort while the bubble sort is O(n^2).
>The quicksort will provide much better performance for the same space.  That
>is unless you are worried about program space.

However, quicksort still uses O(log n) space for the stack of unsorted
sub-arrays; bubble sort only uses O(1) (the variable for the swap).  In
addition, quicksort does not outperform bubble sort until about 15
elements are to be sorted, and it takes an even larger array (~500
elements) to outdo heapsort.  Quicksort's performance on almost-sorted
arrays is even worse.  Remember, quicksort is O(n log n) average case;
worst case it degenerates to O(n**2).

The moral of this story is not to make blind statements.  When choosing a
sorting algorithm you need to
	1) take into account the amount of data to be sorted...
	2) take into account the distribution of the data to be sorted...
	3) have very good luck in predicting the two previous conditions.

Frank Adrian
Mentor Graphics, Inc.
ark@alice.UUCP (02/28/87)
In article <1080@hounx.UUCP>, kort@hounx.UUCP writes:
> If you're short on memory, the bubble sort, slow and humble as it is,
> will get the job done.  It uses lots of pair-wise comparisons.

The Shell sort is much faster and uses no more memory.  It is typically
O(n**1.4); the bubble sort is O(n**2).  It isn't much longer to code,
either.
ark@alice.UUCP (02/28/87)
In article <25329@rochester.ARPA>, crowl@rochester.ARPA (Lawrence Crowl) writes:
> The quicksort is an inplace O(n log n) sort while the bubble sort is O(n^2).
> The quicksort will provide much better performance for the same space.  That
> is unless you are worried about program space.

But quicksort uses O(log n) additional memory, which one cannot always
afford.  It is also O(n**2) worst case.
markv@uoregon.UUCP (02/28/87)
In article <2064@cvl.umd.edu> bhaskar@cvl.umd.edu (S.K. Bhaskar) writes:
>In article <25329@rochester.ARPA>, crowl@rochester.ARPA (Lawrence Crowl) writes:
>>
>> The quicksort is an inplace O(n log n) sort while the bubble sort is O(n^2).
>> The quicksort will provide much better performance for the same space.  That
>> is unless you are worried about program space.
>
>Quicksort requires atleast O(log n) space ; Bubblesort is an in-place sort.
>The extra space needed by Quicksort is the space used by the data structure
>which maintains the subfiles still to be sorted.
>
>Also, Quicksort is O(n log n ) only in the average.  The worst case is still
>O(n^2).

First of all, O(log n) space is really not that much space.  According to
my handy dandy calculator, log2(1000000) is only about 20.  Figure roughly
four bytes of storage for each recursive call (lower+upper bound, shorts)
and we have roughly 80 bytes.  Not really too bad...

Also, the worst case performance isn't too bad, particularly if you are
reasonably clever about picking pivots.

I have always been partial to an in-situ heap sort, but only because it
reminds me of myself cleaning my apartment...  :-)

Keep the faith.
-- 
Mark VandeWettering
University of Oregon, Computer and Information Sciences Department
markv@uoregon.edu OR markv@uoregon.uucp
jkg@gitpyr.UUCP (02/28/87)
In article <2064@cvl.umd.edu> bhaskar@cvl.umd.edu (S.K. Bhaskar) writes:
>Quicksort requires atleast O(log n) space ; Bubblesort is an in-place sort.
>Also, Quicksort is O(n log n ) only in the average.  The worst case is still
>O(n^2).

Why not use heapsort?  It is O(n log n) in both the worst and average
cases, and it also sorts in place.

Jim Greenlee
-- 
The Shadow...!{akgua,allegra,amd,hplabs,ihnp4,seismo,ut-ngp}!gatech!gitpyr!jkg
Jryy, abj lbh'ir tbar naq qbar vg! Whfg unq gb xrrc svqqyvat jvgu vg hagvy
lbh oebxr vg, qvqa'g lbh?!
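[For reference, the in-place heapsort recommended above can be sketched
as follows - a binary (d == 2) heap version, illustrative rather than
anyone's posted code:]

```python
def heapsort(a):
    """In-place heapsort: O(n log n) worst case, O(1) extra space."""
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds above `end`.
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1                       # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):          # build the heap bottom-up
        sift_down(i, n)
    for end in range(n - 1, 0, -1):              # move each max to the tail
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```

[The array itself serves as the heap, which is why no extra space beyond
a few scalars is needed.]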
jsgray@watrose.UUCP (02/28/87)
In article <2064@cvl.umd.edu> bhaskar@cvl.umd.edu (S.K. Bhaskar) writes:
>Quicksort requires atleast O(log n) space ; Bubblesort is an in-place sort.
>The extra space needed by Quicksort is the space used by the data structure
>which maintains the subfiles still to be sorted.

There is a new result from Europe which shows it is possible to quicksort
with no extra space!  (And *you* thought you knew all there was to know
about quicksort, didn't you!  (I did.))  The key idea is to mark the
subfiles to be sorted by slightly changing the order of already-sorted
subfiles.  I'll post more detailed information and the reference soon.

Jan Gray    jsgray@watrose    University of Waterloo    (519) 885-1211 x3870
btb@ncoast.UUCP (03/02/87)
In article <25329@rochester.ARPA> crowl@rochester.UUCP (Lawrence Crowl) writes:
>In article <1080@hounx.UUCP> kort@hounx.UUCP (B.KORT) writes:
>>If you're short on memory, the bubble sort, slow and humble as it is,
>>will get the job done.  It uses lots of pair-wise comparisons.
>
>The quicksort is an inplace O(n log n) sort while the bubble sort is O(n^2).
>The quicksort will provide much better performance for the same space.  That
>is unless you are worried about program space.

The one problem with quicksort that has always bothered me is that it is
not stable (right?): it doesn't preserve the existing order of records
with equal keys, so you can't easily use it to sort on multiple keys
sequentially...

I am certainly no expert on sorts, but for the small kinds of sorts that
I have a need for, selection sort is the most practical...  Yes, it is an
O(n^2) sort, but it is much faster than bubble sort because it doesn't do
O(n^2) exchanges, only O(n).  Selection sort is also as easy to remember
as bubble sort.
-- 
Brad Banko	...!decvax!cwruecmp!ncoast!btb	Cleveland, Ohio
"The only thing we have to fear on this planet is man."
	-- Carl Jung, 1875-1961
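[The O(n^2)-comparisons-but-O(n)-exchanges property is visible directly
in a sketch of selection sort:]

```python
def selection_sort(a):
    """O(n^2) comparisons but at most n-1 exchanges: attractive when
    moving a record is expensive relative to comparing keys."""
    n = len(a)
    for i in range(n - 1):
        m = i
        for j in range(i + 1, n):    # the comparisons: about n^2/2 of them
            if a[j] < a[m]:
                m = j
        if m != i:
            a[i], a[m] = a[m], a[i]  # the exchange: at most one per pass
```

[Bubble sort, by contrast, may swap on nearly every comparison, which is
where its extra cost goes.]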
dam@uvacs.UUCP (03/03/87)
In article <2064@cvl.umd.edu> bhaskar@cvl.umd.edu (S.K. Bhaskar) writes:
>Quicksort requires atleast O(log n) space ; Bubblesort is an in-place sort....
>Also, Quicksort is O(n log n ) only in the average.  The worst case is still
>O(n^2).

Quite true.  I suggest using heapsort, which treats the array to be
sorted as a d-heap (d == 2 or 3, usually).  It's an O(n log n) sort in
both average and worst cases, and is as much of an in-place sort as
B-argh-le sort.
-- 
From the University of Virginia at Boar's Head, C.S. Department-in-Exile:
Dave Montuori (Dr. ZRFQ); DAMONT quandum, DAMONTque futurus.
dam@uvacs.cs.virginia.edu or {allegra | cbosgd | ncsu}!uvacs!dam
NSA line eater food: terrorist, CIA, drugs, cipher, secret, fnord, FBI, decode.
atbowler@watmath.UUCP (03/04/87)
In article <2064@cvl.umd.edu> bhaskar@cvl.umd.edu (S.K. Bhaskar) writes:
>Quicksort requires atleast O(log n) space ; Bubblesort is an in-place sort.
>The extra space needed by Quicksort is the space used by the data structure
>which maintains the subfiles still to be sorted.
>
>Also, Quicksort is O(n log n ) only in the average.  The worst case is still
>O(n^2).

While everyone is discussing their favourite sort, I might as well remind
the world about Shellsort (as described by Knuth with a relatively prime
delta sequence, not Shell's original binary one).  This has a number of
advantages as the in-core sort of first choice:

    1) It is an in-place sort requiring only constant work space.
    2) While its worst case is not well understood, it is clearly never
       as bad as O(n^2).
    3) It is simple and easily implemented.  It requires only a few more
       lines than bubble-class sorts.  Quicksort, by comparison, is
       easily miscoded (witness the famous example in Wirth's book).

I have long conjectured that the reason everyone is so fond of Quicksort
is that the analysis of its average case behaviour is such a nice example
of the analysis of algorithms.  It is a nice, elegant piece of
mathematics.  On the other hand, the last I heard, a proper analysis of
Shellsort was still an unsolved problem.  It is clear that its worst case
behaviour is better than O(n^2), but it is equally clear that its best
case is not as good as Quicksort's O(N) and that its average case is not
as good as Quicksort's O(N log N); all that really exists is some
empirical data suggesting things like O(N (log N)^2).

This is not to say that Shellsort is the only choice to make.  Its
simplicity means that it will usually outperform Quicksort on small to
medium problems (the changeover is usually somewhere between 500 and 1000
items), but as the size of the task gets bigger, the fact that it is not
an O(N log N) sort means that other sorts such as Quicksort or Heapsort
will outperform it.  Whether Quicksort will or not depends on the data.
Heapsort, whose behaviour is always O(N log N), is guaranteed to
outperform Shellsort for a sufficiently large problem.  However, for a
large enough problem one probably is not doing an in-memory sort, and
then the classical merge-type sorts tend to be the winners.
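[A sketch of the Shellsort described above, using Knuth's h = 3h+1
increment sequence (which keeps successive increments relatively prime);
illustrative, not the poster's code:]

```python
def shellsort(a):
    """In-place Shellsort with Knuth's 1, 4, 13, 40, ... increments:
    constant extra work space, only a few more lines than bubble sort."""
    n = len(a)
    h = 1
    while h < n // 3:
        h = 3 * h + 1                 # Knuth's increment sequence
    while h >= 1:
        for i in range(h, n):         # h-sort: insertion sort with stride h
            v, j = a[i], i
            while j >= h and a[j - h] > v:
                a[j] = a[j - h]
                j -= h
            a[j] = v
        h //= 3
    return a
```

[Each h-sorting pass leaves the file h-ordered, so later passes with
smaller increments have very little work left to do.]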
jkg@gitpyr.UUCP (03/04/87)
In article <2701@well.UUCP> physh@well.UUCP (Jon Foreman) writes:
>	All this talk about the best sort leads me to a question.
>Has anyone ever written a program which will unsort a dataset
>to the worst case for various types of sorts?  I'm sure this question

This is easy for Quicksort (which seems to be a favorite of netters) -
just sort the dataset!

Jim Greenlee
-- 
The Shadow...!{akgua,allegra,amd,hplabs,ihnp4,seismo,ut-ngp}!gatech!gitpyr!jkg
Jryy, abj lbh'ir tbar naq qbar vg! Whfg unq gb xrrc svqqyvat jvgu vg hagvy
lbh oebxr vg, qvqa'g lbh?!
howard@cpocd2.UUCP (Howard A. Landman) (03/05/87)
In article <814@fmsrl7.UUCP> wayne@fmsrl7.UUCP (Michael R. Wayne) writes:
>	I need information on how to share a discovery with others in
>the computer science profession.  I have an algorithm which implements
>an order(N) sort (ie the search time is linear with respect to the
>number of items with a constant overhead).  I seem to remember that
>someone "proved" that this could not be done and I do not have any
>information handy on this.
>
>	3) Has this problem already been solved?  If so, pointers to
>	   this would also be appreciated (since no one I talked to knew
>	   about the solution).
>
>Additional info:
>	In order to implement the sort, the key must be completely
>	specified.  This means that the sort is data dependent
>	Examples of complete key specification:
>		9 digits (ie SS#)
>		20-80 ASCII characters
>	Does this invalidate the algorithm?
>	Of course, the sort time IS dependent on the size of the key
>	(After all, how can a 5 character key and a 5000 character
>	key take the same amount of time to compare?
>	The algorithm need not be very memory intensive.  I see no reason
>	why thousands of records with 200 character keys could not be done
>	on a Z-80 CP/M machine.

Sounds like you've reinvented distribution sort or partitioning sort.
Both of these classes of algorithms can appear O(N)*O(K), where K is the
length of the key.  But guess what?  If every element can have a unique
key, then the length of the key must grow as O(log N) as N increases.
This results in an asymptotic complexity of O(N log N), just as you would
expect.

This is not to say that these algorithms cannot be useful in practice.
In fact, IBM punched-card sorters can be considered as mechanical
examples of distribution sort.  The real question is: can your algorithm
outperform commercial sorting algorithms on commercial workloads?  If so,
you can make a bundle, because a sizeable fraction of business computing
is just sorting large files (over 50% at many sites).
-- 
Howard A. Landman
...!intelca!mipos3!cpocd2!howard
johnw@astroatc.UUCP (03/12/87)
First: In the discussions of "bubble" vs. other sorts, I assume (perhaps
I'm brain-damaged) that "bubble sort" refers to any O(n^2) sort, not just
the REAL (super-cruddy O(n^2) style) BUBBLE sort.

In article <3178@gitpyr.gatech.EDU> jkg@gitpyr.UUCP (Jim Greenlee) writes:
>In article <2701@well.UUCP> physh@well.UUCP (Jon Foreman) writes:
>>Has anyone ever written a program which will unsort a dataset
>>to the worst case for various types of sorts?
>
>This is easy for Quicksort (which seems to a favorite of netters) -
> just sort the dataset!

Sorry, but the worst case for any decent(*) Quicksort is *NOT* just
reverse order!!

(*) decent here means one that uses the median of three for the pivot!

-----------------

Someone made a comment about Quicksort *NOT* maintaining order.  I wrote
a quicksort package that *DID* do this.  It swapped indexes, not the real
data (unless requested -- this generality was required [Fortran: tables
in parallel arrays...]).  Anyway, all I had to do was change the compare
to check the original INDEX for equal-VALUEd elements!  It worked just
fine!

In retrospect, I probably should have started from scratch and researched
algorithms, but I was 1) young and foolish at the time, and 2) given a
very limited implementation as a model to work from.

John W
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:	astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!
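[The stabilizing trick described above - sort an index array and break
equal-key ties on the original index - can be sketched like this.  An
illustrative reconstruction in Python, not the original Fortran package:]

```python
def stable_sort_indexes(data, key):
    """Return a permutation of indexes that sorts `data` by `key`, made
    stable by breaking equal-key ties on the original index (the swapped
    indexes never move the real data)."""
    idx = list(range(len(data)))
    # Compare (key, original index): equal keys keep their input order.
    idx.sort(key=lambda i: (key(data[i]), i))
    return idx

records = [("smith", 2), ("jones", 1), ("smith", 1)]
order = stable_sort_indexes(records, key=lambda r: r[0])
# The two "smith" records stay in their original relative order.
```

[Any unstable sort becomes stable this way, since no two (key, index)
pairs can ever compare equal.]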