wa263@sdcc12.UUCP (bookmark) (09/23/86)
	I know one really good use for a BIG core memory.  There is a
simple O(n) sorting algorithm... suppose we want to sort n elements
with keys ranging in value from 1 to k (k >= n).  Just get a big array
of k slots, mark them all with some illegal value, then copy each item
to be sorted into the slot indexed by its key (this is called an
"address calculation" sort).  Then scan through the array and copy out
the items in order.  The sort phase is O(n), and the scan phase takes
about constant+(n*otherconst) time, so the whole thing is O(n).

(Obviously, this simple scheme doesn't deal with multiple items having
the same key... but there are a variety of simple fixes for that.  If
an item is big, so you don't want to copy it, you can just copy a
pointer to it, or its index in the source array, or whatever.
Please... don't bother posting (or mailing) anything about similar
quibbles--if you can think of a quibble and its solution in less than
five minutes, so can everyone else, including me.)

Now, this is a great sorting scheme.  The reason we worry all the time
about neat things like Quicksort, or some of the time about horrors
like twelve-tape polyphase merge sorts, is that we often don't have
enough memory to do a sort in the simple way detailed above.  However,
the more core you give me, the more able I'll be to use algorithms like
this sorting scheme that trade size for speed--and the faster my
programs will run.

bookmark@sdcsvax.ARPA		...!sdcsvax!bookmark
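A minimal C sketch of the scheme above, assuming distinct integer keys
in 1..k and using 0 as the "illegal" marker (the function name and
types are illustrative, not from the original article):

    #include <stdlib.h>

    /* Address-calculation sort: each key, assumed distinct and in
       1..k, is copied straight to the slot it indexes; 0 marks an
       empty slot. */
    void address_sort(int *a, size_t n, size_t k)
    {
        int *slot = calloc(k + 1, sizeof *slot);  /* all slots 0 ("illegal") */
        if (slot == NULL)
            return;                               /* out of memory */

        for (size_t i = 0; i < n; i++)            /* "sort" phase: O(n) */
            slot[a[i]] = a[i];

        size_t out = 0;
        for (size_t v = 1; v <= k; v++)           /* scan phase: visits all k slots */
            if (slot[v] != 0)
                a[out++] = slot[v];

        free(slot);
    }

Note that the scan loop visits all k slots, not just the n occupied
ones; that cost is what the follow-up articles turn on.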
desj@brahms.BERKELEY.EDU (David desJardins) (09/25/86)
In article <600@sdcc12.UUCP> wa263@sdcc12.UUCP (bookmark) writes:
>	I know one really good use for a BIG core memory.  There is a
> simple O(n) sorting algorithm... suppose we want to sort n elements
> with keys ranging in value from 1 to k (k >= n).  Just get a big array
> of k slots, mark them all with some illegal value, then copy each item
> to be sorted into the slot indexed by its key (this is called an
> "address calculation" sort).  Then scan through the array and copy out
> the items in order.  The sort phase is O(n), and the scan phase takes
> about constant+(n*otherconst) time, so the whole thing is O(n).

   The 'scan phase' takes ** O(k) ** time, not O(n).  In typical
sorting applications k >>> n.  If k ~ n, then we already have enough
memory for the size-k table (if we have enough memory to quicksort the
items themselves!).

   -- David desJardins
jlg@lanl.ARPA (Jim Giles) (09/25/86)
In article <600@sdcc12.UUCP> wa263@sdcc12.UUCP (bookmark) writes:
>	I know one really good use for a BIG core memory.  There is a
> simple O(n) sorting algorithm... suppose we want to sort n elements
> with keys ranging in value from 1 to k (k >= n).  Just get a big array
> of k slots, mark them all with some illegal value, then copy each item
> to be sorted into the slot indexed by its key (this is called an
> "address calculation" sort).  Then scan through the array and copy out
> the items in order.  The sort phase is O(n), and the scan phase takes
> about constant+(n*otherconst) time, so the whole thing is O(n).

This algorithm is O(k), not O(n).  If you want to sort 10^6 elements
(about 2^20) and each element is a 32-bit integer, the algorithm would
be 2^12 times as slow as an O(n) method.  Of course you could map the
integers into a 20-bit address and then sort the mapped data, but this
is hashing.  Now you have to worry about collisions, efficiency of the
hash function, etc.  In general, quicksort would be faster than the
direct O(k) sort.

J. Giles
Los Alamos
ark@alice.UUCP (Andrew Koenig) (09/26/86)
>	I know one really good use for a BIG core memory.  There is a
> simple O(n) sorting algorithm... suppose we want to sort n elements
> with keys ranging in value from 1 to k (k >= n).  Just get a big array
> of k slots, mark them all with some illegal value, then copy each item
> to be sorted into the slot indexed by its key (this is called an
> "address calculation" sort).  Then scan through the array and copy out
> the items in order.  The sort phase is O(n), and the scan phase takes
> about constant+(n*otherconst) time, so the whole thing is O(n).

Only trouble is that it's O(k), not O(n), because that's how long it
takes to scan all the way through your BIG memory.

When k is much larger than n, you can sort in O(n log(k)/log(R)) time
by doing a radix sort in radix R.
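A sketch of what such a radix sort might look like in C, assuming
32-bit keys and radix R = 256, i.e. four stable counting passes of O(n)
each (the names here are illustrative):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* LSD radix sort of 32-bit keys in radix 256: one stable counting
       pass per byte, least significant byte first. */
    void radix_sort(uint32_t *a, size_t n)
    {
        uint32_t *tmp = malloc(n * sizeof *tmp);
        if (tmp == NULL)
            return;

        for (int shift = 0; shift < 32; shift += 8) {
            size_t count[256] = {0};

            for (size_t i = 0; i < n; i++)        /* histogram this byte */
                count[(a[i] >> shift) & 0xFF]++;

            size_t pos[256], sum = 0;             /* prefix sums: start positions */
            for (int b = 0; b < 256; b++) {
                pos[b] = sum;
                sum += count[b];
            }

            for (size_t i = 0; i < n; i++)        /* stable scatter into tmp */
                tmp[pos[(a[i] >> shift) & 0xFF]++] = a[i];

            memcpy(a, tmp, n * sizeof *a);        /* copy back for next pass */
        }
        free(tmp);
    }

Each pass costs O(n + R), and there are log(k)/log(R) of them, which
gives the bound above.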
alverson@decwrl.DEC.COM (Robert Alverson) (09/26/86)
Clearly, this algorithm is O(k), but... usually, k is a linear function
of n, determined before starting the sort.  So it is O(n) also.

Bob
jqj@gvax.cs.cornell.edu (J Q Johnson) (09/27/86)
wa263@sdcc12.UUCP proposes using large memories to obtain an O(n) sort
by using the obvious open-addressing approach.  Although I think his
general conclusion is valid (that there are many places where
space/time tradeoffs exist in algorithms), the particular case of
open-address sorting is not a very good example.

Note that the O(n) addressing sort is successful only if the key set is
dense.  Consider sorting a list of n personal names and addresses.
Presumably n < 4e10, since we have a bound based on the number of
people in the world, but the data requires, say, 60 chars to represent,
so you need perhaps 300 bits of address, or about 1e90 cells.  Your
scan phase is NOT O(n), but rather O(2^keysize).

Conclusion: rating sorting algorithms on their space and time
requirements as a function only of n (the number of elements to be
sorted) is faulty; algorithms should be compared on their performance
as a function of both n and keysize.

Incidentally, there ARE of course sorting algorithms that do better
than O(n*log n) on time performance and don't fall into the keysize
trap as deeply.  Consider all those radix sorts...
radford@calgary.UUCP (Radford Neal) (09/29/86)
Re: discussion of bucket sorting as a use for huge memories.

Bucket sort is not O(n) in time, nor O(k), but O(n log n).  This is
because the time required to decode the addresses for your huge
memories is logarithmic in the size of the memory.

   Radford Neal
   The University of Calgary
desj@brahms.BERKELEY.EDU (David desJardins) (10/01/86)
In article <403@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
> Bucket sort is not O(n) in time, nor O(k), but O(n log n).  This is
> because the time required to decode the addresses for your huge
> memories is logarithmic in the size of the memory.

   This doesn't really seem like a fair characterization, because the
logarithmic penalty for the actual comparisons/moves/calculations is
built into all sorting algorithms.  It is just as true that it takes
time O(log n) for a quicksort to exchange two entries.  You have to
specify the units to be used, and the usual choice is single machine
operations such as comparison or exchange, which take logarithmic time.

   -- David desJardins
phil@osiris.UUCP (Philip Kos) (10/03/86)
One thing I don't understand about the address sort... it doesn't seem
like it would be sufficiently general to make a useful utility anyway,
because I can't see how it could possibly handle duplicate keys.  Am I
missing something here?

Phil Kos                                  ...!decvax!decuac
The Johns Hopkins Hospital                  > !aplcen!osiris!phil
Baltimore, MD                             ...!allegra!umcp-cs

"In the end there's still that song,
 Comes crying like the wind
 Down every lonely street that's ever been." - Robert Hunter
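One standard fix is to keep a count per slot instead of the item itself
(a counting sort); for full records you would chain pointers per slot
instead.  A minimal C sketch for integer keys in 0..k-1 (the names are
illustrative):

    #include <stdlib.h>

    /* Counting sort: duplicates are handled by tallying how many times
       each key occurs, then emitting each key that many times during
       the scan. */
    void counting_sort(int *a, size_t n, size_t k)
    {
        size_t *count = calloc(k, sizeof *count);
        if (count == NULL)
            return;

        for (size_t i = 0; i < n; i++)            /* tally each key */
            count[a[i]]++;

        size_t out = 0;
        for (size_t key = 0; key < k; key++)      /* emit key count[key] times */
            for (size_t c = count[key]; c > 0; c--)
                a[out++] = (int)key;

        free(count);
    }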
johnl@ima.UUCP (John R. Levine) (10/04/86)
In article <403@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
> Bucket sort is not O(n) in time, nor O(k), but O(n log n).  This is
> because the time required to decode the addresses for your huge
> memories is logarithmic in the size of the memory.

Not really true -- memory addresses are not decoded by a tree, but more
typically by broadcasting the address and having all of the memory
units look for their own addresses in parallel.  More realistically, by
the time you take into account caches, pipelined buses, interleaved
memory, and all the other speed-up tricks commonly done in hardware,
the address decode time is the least of your problems.
-- 
John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU
The opinions expressed herein are solely those of a 12-year-old hacker
who has broken into my account and not those of any person or
organization.