[comp.sources.wanted] Looking for a better sort

staylor@pmdvax.UUCP (Scott G. Taylor) (10/09/89)

I am looking for a better sort(1) which will exploit free memory
to increase speed.  Actually, I'm looking for one that will do
wnything to increase speed.  My application involves sorting
many (>500,000) records.  Has anyone written a sort which hashes 
or otherwise speeds execution?

					-Scott

-- 
Scott G. Taylor                                    Pmd Resources  (818) 991-0068
{wlbr,mahendo}!snidely!staylor                     31230 Cedar Valley Dr.
                                                   Westlake Village, CA  91361
		        "Vienoti Latvijai!"

ok@cs.mu.oz.au (Richard O'Keefe) (10/09/89)

In article <110@pmdvax.UUCP>, staylor@pmdvax.UUCP (Scott G. Taylor) writes:
> I am looking for a better sort(1) which will exploit free memory
> to increase speed.

Which sort(1) are you using?  The System V sort(1) has an option
	-y$MEMORY_TO_USE_IN_KILOBYTES
I recently posted a small fast sort (called nsort) to comp.sources.misc
-- somewhere along the way it lost a "}"; patch coming --
which slurps the entire file into memory, sorts it, and writes it out;
it is up to twice as fast as sort(1) even with the -y option because
(a) it is using strcmp() rather than a generic key comparison routine
(b) it uses a list merge rather than the misleading misnamed "quick"sort.

A device you might use is to break your file into chunks, read them in,
and sort them using a carefully crafted key comparison routine (nsort is
a good place to start), then use sort(1) just to merge the chunks.

davidc@vlsisj.VLSI.COM (David Chapman) (10/11/89)

In article <110@pmdvax.UUCP> staylor@pmdvax.UUCP (Scott G. Taylor) writes:
>I am looking for a better sort(1) which will exploit free memory
>to increase speed.  Actually, I'm looking for one that will do
>wnything to increase speed.  My application involves sorting
>many (>500,000) records.  Has anyone written a sort which hashes 
>or otherwise speeds execution?

Try Knuth's Replacement Selection Sort, Vol. 3 p. 256.  It's the algorithm
we use to sort data (millions of records).  It works particularly well when
the time spent in comparing keys is small relative to I/O time.  When this
happens it sorts in almost linear time.

Warning: it is not an in-place sort.  You need enough disk space to hold
a temporary file the size of the data file, and possibly room for a third 
copy if you don't want to stomp on the original.

Sorry, I can't send you source code but if you have questions on how it
works I'll be glad to answer them.
-- 
		David Chapman

{known world}!decwrl!vlsisj!fndry!davidc
vlsisj!fndry!davidc@decwrl.dec.com