[comp.lang.c] Looking for portable sort algorithm

davek@lakesys.UUCP (Dave Kraft) (09/20/89)

Hi,
I'm looking for a sort algorithm that is portable
between Turbo C 2.0 and Xenix.  Or, if someone
could explain the qsort routine in TC to me I 
would be happy with that too..  (I know, someone's
gonna say RTFM, but, I can't,  because my brother
is 'borrowing' the manual, and is out of town for
a while, so, I don't have it readily available.)
 
Thanks in advance.

Dave

-- 
davek@lakesys.lakesys.com <OR> uunet!marque!lakesys!davek
-----------------------------------------------------------------------------
"You must be starved, old friend.  Come into my apartments and we'll suffer
through a deep breakfast of pure sunlight." -- From "The Mummery by Love Ananda

jnh@ecemwl.ncsu.edu (Joseph N. Hall) (09/20/89)

In article <1102@lakesys.UUCP> davek@lakesys.UUCP (Dave Kraft) writes:
>Hi,
>I'm looking for a sort algorithm that is portable
>between Turbo C 2.0 and Xenix.  Or, if someone ...

Here's one that is portable between many ANSI-fied dialects.  You will have
to un-ANSI-fy the declarations if you want to use it on an older version of
C, and of course <string.h> may be <strings.h>, etc., that type of thing.

It's not a quicksort; it's a Shell sort, which is slower in the long run but
is faster for 1) small amounts of data or 2) well-ordered data.  You probably
won't notice much difference (unless your library's qsort was hand-coded in
assembly language).  People may argue over my choice of h (I use 2^n - 1),
but I won't consider my code defiled regardless of what you do to it.

On my VAXstation it sorts 10,000 integers in 22 seconds.  qsort takes about
6 seconds.  (alas!)  ssort and qsort both take <1 second to sort 1,000
integers.

See Knuth (v.3, p.85) for more details.  The Shell sort is the most
compact of the effective sorting algorithms, and is interesting (I think)
to look at.  Everybody has seen heapsorts and quicksorts, anyway ...
This is a pointerized version that isn't so pretty to the eye, but has
a minimum of multiplications and complex address calculations.

This code is mine, and I (drum roll please) hereby release it into the
public domain.

---cut here---
#include <stdlib.h>
#include <string.h>

typedef int (*compar_t)(const void *, const void *);

void ssort(void *base, size_t nmemb, size_t size, compar_t compar)

{
	char *K, *Ki, *Kj, *last;
	size_t h, hSize;

	K = (char *) malloc(size);
	last = (char *) base + size * nmemb;

	for (h = (nmemb >> 1) - 1; h > 0; h = ((h + 1) >> 1) - 1) {
		hSize = size * h;
		for (Kj = (char *) base + hSize; Kj < last; Kj += size) {
			for (
				memcpy(K, Kj, size), Ki = Kj - hSize;
				((*compar)(K, Ki) < 0) && (Ki >= (char *) base);
				Ki -= hSize
			) {
				memcpy(Ki + hSize, Ki, size);
			}
			memcpy(Ki + hSize, K, size);
		}
	}

	free(K);

}



v   v sssss|| joseph hall                      || 4116 Brewster Drive
 v v s   s || jnh@ecemwl.ncsu.edu (Internet)   || Raleigh, NC  27606
  v   sss  || SP Software/CAD Tool Developer, Mac Hacker and Keyboardist
-----------|| Disclaimer: NCSU may not share my views, but is welcome to.

jeenglis@nunki.usc.edu (Joe English) (09/21/89)

jnh@ecemwl.UUCP (Joseph N. Hall) writes:

>On my VAXstation [the following code] sorts 10,000 integers in 
>22 seconds.  qsort takes about
>6 seconds.  (alas!)  ssort and qsort both take <1 second to sort 1,000
>integers.

[...]
>           for (
>               memcpy(K, Kj, size), Ki = Kj - hSize;
>               ((*compar)(K, Ki) < 0) && (Ki >= (char *) base);
>               Ki -= hSize
>           ) {
>               memcpy(Ki + hSize, Ki, size);
>           }
>           memcpy(Ki + hSize, K, size);
>       }
>   }

I achieved a speedup of > 30x by switching from
memcpy() to pointer swapping in my sort routine.
That's partly because the memcpy() on the system I was
working on was braindead, but I expect that (for large
sorts) the cost of allocating and building an array of
pointers becomes negligible compared to the gain
achieved by not using memcpy() regardless of the
implementation.

As long as we're on the subject of sorting, I was
wondering why the standard sort routine is implemented
as a quick-sort nearly everywhere, given its poor
worst-case behaviour.  Any guesses?

Also, I've found another sorting paradigm that's more
useful than the array sort in several cases:

	start_sort(comparison_function,sizeof_element);

	while ("there are more keys to be sorted")
	    add_key(&key);

	do_sort();

	while (retrieve_key(&key))
	    "do whatever you want with it."

This is especially good for database sorts, when you're reading from
secondary storage and the whole thing won't fit in memory.  The sort
package can retain as many keys as possible in a buffer, then sort and
dump them to a temporary file for later merge-sort if and when the
buffer fills up.


--Joe English

  jeenglis@nunki.usc.edu