sbw@naucse.UUCP (09/18/87)
Can anyone point me to published/public domain stable sorting algorithms
that are reasonably time and space efficient?

I have an NlogN stable sort that eats memory like cupcakes, and a stable
sort that requires only N+C memory that barely beats a bubble sort.
Every try at something in between has resulted in what I consider
failure - typically exemplified by no real gain in speed over my slow
one.  I know that it can be done in N(logN)^2 time, and that Macro
SPITBOL uses a stable heapsort, but I don't want to spend any more of my
time on it.

C code would be a nice way to describe the algorithm.

Please Email me any suggestions.  Thanks!

	Steve Wampler {...!arizona!naucse!sbw}
g-rh@cca.CCA.COM (Richard Harter) (09/18/87)
In article <460@naucse.UUCP> sbw@naucse.UUCP (Steve Wampler) writes:
>Can anyone point me to published/public domain stable sorting
>algorithms that are reasonably time and space efficient?
>
>I have an NlogN stable sort that eats memory like cupcakes, and
>a stable sort that requires only N+C memory that barely beats
>a bubble sort.  Every try at something in between has resulted
>in what I consider failure - typically exemplified by no real
>gain in speed over my slow one.  I know that it can be done
>in N(logN)^2 time, and that Macro SPITBOL uses a stable heapsort,
>but I don't want to spend any more of my time on it.
>
>C code would be a nice way to describe the algorithm.
>
>Please Email me any suggestions.

Your requirements are vague, and it makes a real difference.  Are you
talking about sorting fixed-length or variable-length records?  Is the
sort to be in place, or is it to produce an output array (file) with the
input untouched?  Should it handle arbitrary lengths?  If an address
calculation sort is admissible, the sort can be done in O(N) time with
O(N) additional memory.  If O(log N) additional memory is acceptable and
an in-place sort of fixed records is desired, it can be done stably in
O(N log N) time.  If record sizes are variable, things are tricky.

The best single reference that I know of is Knuth's volume III on
sorting.  I am not entirely happy with it -- there are important
techniques that are only lightly touched -- but it is comprehensive and
it does give the algorithms.

May I suggest that you repost your request with more detail:

(a) Is the data in memory or in a file?  If the latter, are we talking
    about sorts of arbitrary-length data?  If so, are we talking about
    disk sorts or tape sorts?
(b) Are you using fixed-length records or variable-length records?
(c) Are the memory constraints (1) constant, (2) log N, or (3) N?
(d) Is it a single-key sort or multi-key?
(e) Is address calculation admissible or not?  [In address calculation
    sorts you derive a function on the key that gives the approximate
    address of a record in the sorted file.]
(f) Is it fixed key length or variable key length?
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
wcs@ho95e.ATT.COM (Bill.Stewart) (09/19/87)
In article <460@naucse.UUCP> sbw@naucse.UUCP (Steve Wampler) writes:
:Can anyone point me to published/public domain stable sorting
:algorithms that are reasonably time and space efficient?
:I have [big-memory fast and small-memory slow algorithms]....
If memory serves me correctly, isn't the Shell sort stable?
Check out the volume of Knuth that deals with sorting and
searching. The analysis of run-time is quite interesting,
but basically it's around nlogn or maybe n**1.5, and it's
about as small and simple as bubble-sort.
:C code would be a nice way to describe the algorithm.
Well, you won't find C code in Knuth; there's "English" and
probably MIX (not even ALGOL, which was the politically correct
language to describe algorithms in at the time.) But it's not
that hard to translate into C.
--
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wc(
lmiller@venera.isi.edu (Larry Miller) (09/19/87)
In article <460@naucse.UUCP> sbw@naucse.UUCP (Steve Wampler) writes:
>Can anyone point me to published/public domain stable sorting
>algorithms that are reasonably time and space efficient?

Why is STABLE necessary?  Most any NlogN routine can be written to swap
pointers rather than the data itself, so a stable routine shouldn't be
that important.  Are you looking for the absolutely fastest routine?
But that is very much data dependent.  Quick sort is not stable, but
Shell sort is, and is almost as fast for reasonable data sizes, so you
might want to look at that.

Larry Miller (lmiller@venera.isi.edu -- no uucp)
sbw@naucse.UUCP (Steve Wampler) (09/20/87)
It was requested that I provide more information on my needs, so:
I want a stable, minimum-storage sort that takes exactly the same
arguments as qsort (section 3 of the UNIX man pages).  It should be
reasonably fast (i.e. not N^2).

Now, I have some requests of my own.  I'm sorry if this offends anyone,
but it's rather bothersome to have to wade through the pages of
misinformation I've been getting.  Please don't respond unless you:

	(1) Know what 'stable' means in terms of sorts.
	(2) Know what 'minimum storage' means.
	(3) Know of a fast algorithm that meets those criteria.

None of the following basic sorting techniques meets (1) and/or (2)
above: Shell, Quick, Heap.  Any sort can be made stable; the question is
at what cost.

Please don't quote timings from Knuth for me.  For that matter, please
don't refer me to Knuth.  While it is the best reference around for
sorting techniques, the discussion of stability is minimal, being
confined to exercises throughout the text.  He does give a hint on
writing an NlogNlogN minimal-storage stable sort, but that's all, a
hint.  I am well aware of what Knuth and the majority of data structure
texts have to say about stable sorts.

The only published (and reasonably fast) stable sorting algorithm I know
about runs in N^(3/2) time (excuse me, stable *and* requiring minimal
storage).  I am hoping for something faster.  Remember, in my original
posting, I stated that I have an NlogN stable sort that isn't minimal
storage (it takes N+N storage, as do most of the obvious such beasties).
It's rather trivial to modify a quick sort to do that (it's even more
trivial to modify a merge sort to be stable - but not to make it minimal
in storage).

Please don't offer 'solutions' unless you know them to be just that.
Thanks.
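For reference, a minimal sketch of the kind of N+N-storage, NlogN stable
routine described above: a merge sort behind a qsort(3)-style calling
sequence.  The function names are hypothetical, and this is the
memory-hungry baseline, not the minimal-storage sort being requested.

```c
#include <stdlib.h>
#include <string.h>

/* Stable merge sort with qsort(3)'s calling convention, plus an
 * O(N) scratch buffer (the "N+N storage" the poster mentions).
 * Returns 0 on success, -1 if the scratch buffer can't be had. */
static void merge(char *base, char *tmp, size_t width,
                  size_t lo, size_t mid, size_t hi,
                  int (*cmp)(const void *, const void *))
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        /* take from the right half only when STRICTLY smaller,
         * so equal keys keep their input order: stability */
        if (cmp(base + j * width, base + i * width) < 0)
            memcpy(tmp + k++ * width, base + j++ * width, width);
        else
            memcpy(tmp + k++ * width, base + i++ * width, width);
    }
    while (i < mid) memcpy(tmp + k++ * width, base + i++ * width, width);
    while (j < hi)  memcpy(tmp + k++ * width, base + j++ * width, width);
    memcpy(base + lo * width, tmp + lo * width, (hi - lo) * width);
}

static void msort_rec(char *base, char *tmp, size_t width,
                      size_t lo, size_t hi,
                      int (*cmp)(const void *, const void *))
{
    if (hi - lo < 2) return;
    size_t mid = lo + (hi - lo) / 2;
    msort_rec(base, tmp, width, lo, mid, cmp);
    msort_rec(base, tmp, width, mid, hi, cmp);
    merge(base, tmp, width, lo, mid, hi, cmp);
}

int stable_sort(void *base, size_t nel, size_t width,
                int (*cmp)(const void *, const void *))
{
    char *tmp = malloc(nel * width);
    if (tmp == NULL) return -1;
    msort_rec(base, tmp, width, 0, nel, cmp);
    free(tmp);
    return 0;
}
```

The stability hinges entirely on the strict `< 0` test in merge: ties
are always resolved in favor of the left (earlier) half.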
g-rh@cca.CCA.COM (Richard Harter) (09/20/87)
In article <3603@venera.isi.edu> lmiller@venera.isi.edu.UUCP (Larry Miller) writes:
>Why is STABLE necessary?  Most any NlogN routine can be written to
>swap pointers rather than the data itself, so a stable routine shouldn't
>be that important...

STABLE: A sort is stable if all records with the same key are in the
same order in the output file as they were in the input file.

Stability is essential for sorting on multiple keys.  There are many
other applications where stability is desirable.  Swapping pointers does
not resolve the problem.  [I will leave it as an exercise for the reader
to determine why not.]

Incidentally, quicksort can be converted into a stable sort at a price
in time, without marker bits.  The trick is to make a first pass to
calculate the size of the partitions and load each partition from the
front.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
g-rh@cca.CCA.COM (Richard Harter) (09/22/87)
In article <461@naucse.UUCP> sbw@naucse.UUCP (Steve Wampler) writes:
>It was requested that I provide more information on my needs, so:
>
>I want a stable, minimum-storage sort that takes exactly the
>same arguments as qsort (section 3 of the UNIX man pages).
>It should be reasonably fast (i.e. not N^2).

I wish I could help you more.  I don't have any references at hand that
meet your needs.  I could write such a sort, but I won't do it for free,
which doesn't help you.  The following may be of use.

If I'm not missing anything, these are the requirements that you want met:

(1) Stable
(2) At least O(N log^2(N)) in time, and preferably better
(3) Fixed internal storage usage (O(1) additional storage)
(4) Fixed record size, specified in input
(5) Comparison function passed through calling sequence
(6) Data in memory, referenced by calling sequence
(7) Sort in place

I can think of two ways to meet them, offhand.  One is a modified
quicksort, sketched out below.  The other is a modified mergesort.
There is a trick for doing a mergesort in place; I haven't thought it
through thoroughly, so I will defer that one.

Quicksort can be converted into a stable, O(N log N) time, O(1) space
sort.  The price is added complexity and a larger time factor.  The
essential elements are (a) altering the partition algorithm to be
stable, and (b) forcing the partition element to be the median.

Stabilizing the quicksort partition:

The quicksort partition algorithm is not stable.  To make it stable we
make a first pass over the data and count the number of elements less
than, equal to, and greater than the test value.  We now know the
boundaries of the three segments.  We make a second pass and count the
number in each category within each segment.  This gives us a total of 9
sub-segments.  Each sub-segment is filled from the front, ping-pong
style; i.e., there is a separate index into each sub-segment.  When we
are placing an element we put it into the sub-segment corresponding to
its segment of origin and destination, and pop the element contained
therein.  This is a stable, O(N) partition.  Notes:

1) The fact that it takes three passes rather than one does not alter
   the fact that it is O(N).
2) This is the general partition algorithm for an arbitrary test value.
3) Coding this is not trivial.

Forcing the median as partition element:

Quicksort, in recursive form, is O(1) in space for each invocation;
however the number of invocations is log N, whereby the frame space ends
up being O(log N).  When we convert quicksort to nonrecursive form we
end up with a stack of begin-end indices of length log(N).  We get rid
of this stack by forcing the partition element to be the median of the
data set.  We no longer need the begin-end stack because all begin-end
indices may be calculated explicitly.  [This is not entirely trivial --
there is some nasty fiddling with odds and evens to do, but it can be
done.]  We may use the standard 'find-median' algorithm (with the
stabilized partition algorithm).  This is still O(N) but has a bad worst
case.  We can improve the worst case by a variant of the
median-of-medians trick (which can't be used directly because of the
storage constraint).

Summary: Quicksort can be converted into a stable partition sort which
is time O(N log N) on average and additional space O(1).
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
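A minimal sketch of the counting idea behind that stable partition,
written here with an O(N) scratch buffer for clarity rather than the
in-place ping-pong placement -- so it shows the stability and the
counting pass but does not yet meet the O(1) space constraint.  The
function name and int keys are illustrative assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Stable three-way partition around `pivot`.  Pass 1 counts the
 * less / equal / greater elements to find the segment boundaries;
 * pass 2 drops each element into the next free slot of its segment,
 * so input order is preserved within each segment (stability).
 * The in-place version described above replaces the scratch buffer
 * with ping-pong cycle-chasing placement to reach O(1) space. */
void stable_partition3(int *a, size_t n, int pivot)
{
    size_t nless = 0, neq = 0, i;
    for (i = 0; i < n; i++) {            /* pass 1: count */
        if (a[i] < pivot) nless++;
        else if (a[i] == pivot) neq++;
    }
    int *out = malloc(n * sizeof *a);
    if (out == NULL) return;             /* sketch: no error path */
    size_t iless = 0, ieq = nless, igt = nless + neq;
    for (i = 0; i < n; i++) {            /* pass 2: place in order */
        if (a[i] < pivot)       out[iless++] = a[i];
        else if (a[i] == pivot) out[ieq++]   = a[i];
        else                    out[igt++]   = a[i];
    }
    memcpy(a, out, n * sizeof *a);
    free(out);
}
```

Because each segment is filled strictly front-to-back in input order,
running this recursively on the less and greater segments gives a
stable quicksort; the remaining work in the posting above is doing the
placement without `out` and without the begin-end stack.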
franka@mmintl.UUCP (Frank Adams) (09/22/87)
In article <1719@ho95e.ATT.COM> wcs@ho95e.UUCP (46133-Bill.Stewart,2G218,x0705,) writes:
>If memory serves me correctly, isn't the Shell sort stable?

No, it isn't.  The best O(n log n) stable sort I know of requires O(n)
additional memory.
--
Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108
RMRichardson.PA@Xerox.COM (Rich) (09/22/87)
From: "Bill.Stewart" <wcs@ho95e.ATT.COM>
> If memory serves me correctly, isn't the Shell sort stable?
> Check out the volume of Knuth that deals with sorting and
> searching.  The analysis of run-time is quite interesting,
> but basically it's around nlogn or maybe n**1.5, and it's
> about as small and simple as bubble-sort.

No, the Shell sort is not *necessarily* stable.  The Shell sort is, in
fact, a series of "sort by insertion" passes over the data, each pass by
itself being stable.  However, the different passes interleave the
strings of data such that you cannot guarantee the final result will be
stable.

Note: the last pass of a Shell sort is a full sort by insertion looking
at each item in the list.  It is more efficient than normal because the
previous passes have put the total list in "almost" sorted order.

Rich
wfp@dasys1.UUCP (William Phillips) (09/22/87)
In article <1719@ho95e.ATT.COM>, wcs@ho95e.ATT.COM (Bill.Stewart) writes:
> In article <460@naucse.UUCP> sbw@naucse.UUCP (Steve Wampler) writes:
> :Can anyone point me to published/public domain stable sorting
> :algorithms that are reasonably time and space efficient?
> :I have [big-memory fast and small-memory slow algorithms]....
> If memory serves me correctly, isn't the Shell sort stable?
> ....
>
> :C code would be a nice way to describe the algorithm.
> Well, you won't find C code in Knuth; there's "English" and
> probably MIX (not even ALGOL, which was the politically correct
> language to describe algorithms in at the time.)  But it's not
> that hard to translate into C.

It's even easier to translate Ratfor (or Pascal) into C, and there is a
very workable Shell sort in _Software Tools_ (or _Software Tools in
Pascal_) by Kernighan and Plauger.  I translated it into assembler for a
mini back in 1977, and it was still in use on that system in 1987 (a
very durable sort, I think)!  The Shell sort is easy to understand
(therefore easy to debug), reasonably fast (not the fastest, but
respectable), and about as compact as you could want.
--
William Phillips                 {allegra,philabs,cmcl2}!phri\
Big Electric Cat Public Unix        {bellcore,cmcl2}!cucard!dasys1!wfp
New York, NY, USA                (-: Just say "NO" to OS/2! :-)
rwhite@nusdhub.UUCP (Robert C. White Jr.) (09/23/87)
In article <20029@cca.CCA.COM>, g-rh@cca.CCA.COM (Richard Harter) writes:
> STABLE: A sort is stable if all records with the same key are
> in the same order in the output file as they were in the input
> file.

Use something we called duplicity: make an array of pointers to
structures; each structure contains a pointer to a copy of the key value
and a pointer to the same type of structure [recursive].  Read each
value into a structure, with any duplicates being pushed onto a linked
list built on the last element of the structure [the pointer].
Pointer-sort the array by your favorite method, then write each array
element in order to the output, with any linked-list entries printed in
linked order.

The value must have either references to record numbers [disk-to-disk]
or array indices which represent the full data record.  This
sort-list-header method is fast and easy to implement with any fast and
dirty sort algorithm; the linked lists provide the stability.

Rob ("so who asked you anyway?") white.
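A rough C sketch of that "duplicity" scheme, under some simplifying
assumptions: integer keys standing in for key pointers, a linear scan
for duplicate detection, and hypothetical names throughout.

```c
#include <stdlib.h>

/* One node per record; duplicates chain off the first node seen with
 * that key, in arrival order, so ANY (even unstable) pointer sort of
 * the distinct heads yields stable output. */
struct node {
    int key;            /* stand-in for a pointer to the key value    */
    int recno;          /* record number / array index of the record  */
    struct node *dup;   /* next record with the same key, input order */
    struct node *tail;  /* last node in this chain, for O(1) append   */
};

static int cmp_heads(const void *a, const void *b)
{
    return (*(struct node *const *)a)->key - (*(struct node *const *)b)->key;
}

/* Write record numbers of keys[0..n-1] into order_out in stable
 * sorted order; returns the count written. */
size_t duplicity_sort(const int *keys, size_t n, int *order_out)
{
    struct node *nodes = malloc(n * sizeof *nodes);
    struct node **heads = malloc(n * sizeof *heads);
    size_t nheads = 0, i, k = 0;

    for (i = 0; i < n; i++) {
        struct node *nd = &nodes[i];
        nd->key = keys[i]; nd->recno = (int)i; nd->dup = NULL; nd->tail = nd;
        size_t h;
        for (h = 0; h < nheads; h++)     /* linear lookup: sketch only */
            if (heads[h]->key == keys[i]) break;
        if (h < nheads) {                /* duplicate: append to chain */
            heads[h]->tail->dup = nd;
            heads[h]->tail = nd;
        } else {
            heads[nheads++] = nd;
        }
    }
    qsort(heads, nheads, sizeof *heads, cmp_heads);  /* any sort works */
    for (i = 0; i < nheads; i++)
        for (struct node *p = heads[i]; p != NULL; p = p->dup)
            order_out[k++] = p->recno;
    free(heads); free(nodes);
    return k;
}
```

The cost is the extra node array and head array -- so like the other
pointer tricks in this thread, it buys stability with O(N) space.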
lmiller@venera.isi.edu.UUCP (09/23/87)
In article <20029@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>In article <3603@venera.isi.edu> lmiller@venera.isi.edu.UUCP (Larry Miller) writes:
>>
>>Why is STABLE necessary?  Most any NlogN routine can be written to
>>swap pointers rather than the data itself, so a stable routine shouldn't
>>be that important...
>
>Stability is essential for sorting on multiple keys.
              ^^^^^^^^^
Oh come on.  Any first year data structures/algorithms student could
devise a routine that will sort on multiple keys, and the sort does not
have to be stable.

Larry
sbw@naucse.UUCP (Steve Wampler) (09/25/87)
In article <3636@venera.isi.edu>, lmiller@venera.isi.edu (Larry Miller) writes:
> In article <20029@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
> >
> >Stability is essential for sorting on multiple keys.
>               ^^^^^^^^^
> Oh come on.  Any first year data structures/algorithms student could
> devise a routine that will sort on multiple keys, and the sort does not
> have to be stable.

True.  However, without stability the sort must know it is sorting
multiple fields (that's why sort(1) on Unix has those options).  If it's
to be more general purpose, then so far as I can tell, it must be
stable.  I would love to be shown wrong on this, as I seem to be
incapable of writing a minimal-storage stable sort that is fast on large
data sets.
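For illustration, here is the kind of combined comparator Larry has in
mind (the struct and field names are hypothetical).  It needs no
stability precisely because it is told both fields up front -- which is
the trade-off: a stable sort could instead be run once per key, least
significant key first, with no combined comparator at all.

```c
#include <string.h>

/* A record with two sort keys: department (primary) and badge number
 * (secondary).  Illustrative names, not from the thread. */
struct person { char dept[8]; int badge; };

/* Multi-key comparator for an ordinary (unstable) qsort: compare the
 * primary key, then break ties with the secondary key. */
int by_dept_then_badge(const void *a, const void *b)
{
    const struct person *pa = a, *pb = b;
    int c = strcmp(pa->dept, pb->dept);   /* primary key */
    if (c != 0) return c;
    return pa->badge - pb->badge;          /* secondary key breaks ties */
}
```

The comparator must enumerate every field, so the sort is no longer a
general-purpose black box -- exactly the generality that stability buys.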
btb@ncoast.UUCP (10/03/87)
i happen to think that stability is a very useful property to have (at
least as an option) in a sorting algorithm...  coming from a vms
environment, which wasn't pretty but at least was humongous enough to
provide "LOTS OF COMMAND OPTIONS", particularly with respect to its
directory function and sort function, the SORT utility under MSDOS is a
big letdown...  it may be fast, but it doesn't know about multiple sort
keys and it IS NOT STABLE, which is a pain when you are trying to
maintain directories containing 200+ files...  the speed isn't so
critical (i am using the sort utility to sort directory listings, as in
dir | sort /+n...), but the lack of stability makes it impossible to
chain sorts together to get a stream sorted the way you need it...

you might say that this is a mistake that microsoft made, and i might
agree with you, but since i am primarily a scientific applications
programmer, writing a do-it-all sort is not within the scope of my
priorities or available time (plus, since i must work in fortran, it is
not something i would want to do in that language anyway).

a few months back, i participated in a similar discussion, and one guy
said that he was able to make a small change to his quicksort routine to
allow it to sort with stability...  anyone have any comments?

somebody else commented that shellsort is not necessarily stable, but i
don't understand why not, since essentially if you don't swap on
equality things should stay in order, shouldn't they?
--
Brad Banko
Columbus, Ohio
(formerly ...!decvax!cwruecmp!ncoast!btb)
btb%ncoast@mandrill.cwru.edu

"The only thing we have to fear on this planet is man."
			-- Carl Jung, 1875-1961
franka@mmintl.UUCP (Frank Adams) (10/06/87)
In article <4791@ncoast.UUCP> btb@ncoast.UUCP (Brad Banko) writes:
>somebody else commented that shellsort is not necessarily stable, but i
>don't understand why not, since essentially if you don't swap on equality
>things should stay in order, shouldn't they?

The problem is not with equality of the compared elements, but equality
of one of them with an intermediate element.  Suppose we have a 3
element array with keys, in order, 2, 1, and 1.  We do a Shell sort with
an initial increment of 2.  The 2 is compared with the final 1, found
larger, and swapped.  Now the 1's are out of order.
--
Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108
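That example can be traced in a few lines of C (tags marking input
order are added for illustration; names are hypothetical):

```c
struct el { int key; char tag; };

/* Run the {2,1,1} example: tags 'a','b','c' record input order.
 * Do the gap-2 Shell pass, then the final gap-1 insertion pass, and
 * write the resulting tag order into out[4] as a string. */
void shell_211_demo(char out[4])
{
    struct el a[3] = {{2, 'a'}, {1, 'b'}, {1, 'c'}};

    /* gap-2 pass: compare a[0] with a[2]; 2 > 1, so they swap, and
     * the final 1 jumps OVER the middle (equal-keyed) 1 */
    if (a[0].key > a[2].key) {
        struct el t = a[0]; a[0] = a[2]; a[2] = t;
    }

    /* gap-1 insertion pass: keys are already 1,1,2, nothing moves --
     * so the damage from the long-range swap is never undone */
    for (int i = 1; i < 3; i++)
        for (int j = i; j > 0 && a[j - 1].key > a[j].key; j--) {
            struct el t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
        }

    out[0] = a[0].tag; out[1] = a[1].tag; out[2] = a[2].tag; out[3] = '\0';
}
```

The result is c, b, a: the two equal keys come out c-before-b even
though no two equal elements were ever compared with each other.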
levy@ttrdc.UUCP (Daniel R. Levy) (10/06/87)
In article <463@naucse.UUCP>, sbw@naucse.UUCP (Steve Wampler) writes:
> If it's to be more general purpose, then so far as I can tell, it must
> be stable.  I would love to be shown wrong on this, as I seem to be
> incapable of writing a minimal storage stable sort that is fast on
> large data sets.

I'm a hacker, not a CS guru, so I might be full of hooey on this.  But
how about using the standard UNIX qsort(3) routine with a compare()
function which, when all else is equal, ranks the items according to
their original position in the list?  Something like:

	struct datum {
		TYPE item;
		int position;
	} ITEMS[NITEMS];

	...
	/* fill the array with data, recording each item's position */
	int i;
	for (i = 0; i < NITEMS; i++)
		ITEMS[i].position = i;

	int compare();
	qsort((char *)ITEMS, NITEMS, sizeof(struct datum), compare);
	...

	compare(a, b)
	char *a, *b;
	{
		int i;

		i = DIFFERENCE(((struct datum *)a)->item,
			       ((struct datum *)b)->item);
		if (i)		/* <0 or >0 */
			return i;
		else		/* tie: rank by original position -- stable */
			return ((struct datum *)a)->position -
			       ((struct datum *)b)->position;
	}

The catch, of course, is that you need to store NITEMS int values, as
well as slowing down the swap operations, since the datum size is larger
than if it comprised only the data of interest.  But the order of time
and storage in terms of NITEMS does not change.
--
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |		vax135}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|
sbw@naucse.UUCP (Steve Wampler) (10/07/87)
In article <4791@ncoast.UUCP>, btb@ncoast.UUCP (Brad Banko) writes:
> somebody else commented that shellsort is not necessarily stable, but i
> don't understand why not, since essentially if you don't swap on equality
> things should stay in order, shouldn't they?

The problem arises from the fact that in all but the last pass of the
Shell sort, you are comparing elements that aren't adjacent.  Exchanging
two non-adjacent elements runs the risk that you are changing the
relative position of one of them with respect to an equal-keyed element
in between.  For example, consider the following comparison of e1 and e2:

	3 1 4 1 5 9
	^     ^
	e1    e2

Exchanging them yields:

	1 1 4 3 5 9
	^     ^
	e2    e1

but you've just changed the relative positions of the two elements keyed
with value '1'.