[comp.lang.c] Thanks everyone for help ON SORTING!

rcoahk@chudich.co.rmit.oz (Alvaro Hui Kau) (03/28/91)

Hi,
a few days ago, I send in a request asking how to
implement a sort which can handle everything..

I got many response...
They all are very helpful!
Thanks!

Here are all the response I got...
IF I MISS ANYONE, SORRY...

*****************************************************
From: bhoughto@pima.intel.com (Blair P. Houghton)

It can't be done.  You can always define a type of input
for which "sorting" isn't defined.  For example, strings in
unix are usually sorted according to ASCII collating
sequence, so that A-Z appear before a-z; this is because,
represented as integers in the machine, this is the actual
order in which they appear.  The sort(1) program needs a
flag (usually `-f') to tell it when to sort according to
linguistic rules rather than arbitrary computer rules.

What you seem to want, however, is a program to sort any
of the basic types available in C.  This is easy.  Define
a union in which there is a tag for every one of the type
specifiers you might use:

	union c_basic_types {
	    float f;
	    double d;
	    int i;
	    ...
	};

and a set of constants you can use to flag the function:

#define SORT_FLOAT	1
#define SORT_DOUBLE	2
#define SORT_INT	3
...

and implement a switch to use the proper union tag when
the flag is passed

union c_basic_types
max( union c_basic_types x, union c_basic_types y, int sort_type )
{
    switch ( sort_type ) {
	case SORT_FLOAT:
	    return ( x.f > y.f ) ? x : y ;
	    break;
	case SORT_DOUBLE:
	    return ( x.d > y.d ) ? x : y ;
	    break;
	case SORT_INT:
	    return ( x.i > y.i ) ? x : y ;
	    break;
	...
	default:
	    /* some sort of error message for a type that isn't recognized */
	    exit(-1);
	    break;
    }
}

This is just the max() function; with an array or list of
unions you can implement a full sort.  Look into the
library function qsort(3). You can use this max function in
qsort if you modify it to use a global variable to pass the
sort_type and you have it return 1 when the value in x is
larger and 0 when the value in y is larger

main.c:
    extern int max( union c_basic_types x, c_basic_types y ); /* prototype */
    int sort_type;

    main()
    {
	union c_basic_types a[SIZE];
	extern int sort_type;

	.../* code to fill the array */...

	.../* code to set sort_type to some type */...

	qsort( a, SIZE, sizeof(a[0]), max );

	.../* code to use the sorted array */...
    }

max.c:
    int max( union c_basic_types x, c_basic_types y )
    {
	extern int sort_type;

	switch ( sort_type ) {
	    case SORT_FLOAT:
		return ( x.f > y.f );
		break;
	    case SORT_DOUBLE:
		return ( x.d > y.d );
		break;
	    case SORT_INT:
		return ( x.i > y.i );
		break;
	    ...
	    default:
		/* unrecognized type; print error message and die */
		fputs("Mickle foo and yer mama, too!",stderr);
		exit(-1);
		break;
	}
    }

The basic point here is you can't sort a type for which
you haven't defined the ordering method.  This is the
reason qsort(3) requires such a strange setup to do
its job.

The best I can do is answer your question and tell you _how_
to write it.

				--Blair
				  "It's sorting those damnable
				   '...' types that'll get
				   you every time..."

From: neufeld@aurora.physics.utoronto.ca (Christopher Neufeld)
   Huh? I may be relatively new to C but isn't the stdlib function
qsort() an example of a program which does just that? You pass it the
start of the array of elements to be sorted, the number of elements, and
the length of each elements, and a pointer to a user-defined comparator
function which is passed pointers to two elements, and returns an
integer value. This user-written function defines what the caller means
by "greater than/equal/less than", so any time you can conceive of a
reasonable "sort" order, you can write a function which returns '1' if
the contents of the first pointer are greater than those of the second,
'0' if they're equal, and '-1' if the first points to an elements less
than that to which the second points.
   Even if 'qsort()' is not right for your application, you can
implement any other sort program you like, with a function call to
establish the relative order of two elements.
   If there is a type of input for which "sorting" isn't defined, then
by definition you don't want to sort it. If you have your own definition
of a sort on the elements, you can sort it with that definition, even if
nobody else agrees with you on your rule. You needn't even require that
all the elements have the same type, you could sort a mixture of float
array elements, structures of addresses and phone numbers, and pointers
to functions, if you have some way to decide when one of those is
greater than another of them. For this, qsort() wouldn't work because it
accepts elements only of uniform size in contiguous memory. Your own
sort routine could do it, though.

   To answer the original question, I'd pull out any reference book on
sorting, for example a heap sort, and when you get to the line(s)
reading something along the lines of:
    if (*(a+i) < *(a+j)) {   SWAP(i,j)   }
replace that/those line(s) with:
    if (compar(a+i,a+j)) {   SWAP(i,j)   }
where compar() is a function you write to return non-zero if
*(a+i) < *(a+j)

You can modify this as necessary to check for equality and greater-than
conditions.

	qsort(0 only implements the *sort* alogrithim.  It depends upon calls
	to a user suplied routine (strcmp for example) which can compare two
	objects of the type to be sorted and return the results of the comparison
	to it.

	I would referr you to an excelent article on the subjetc in C users 
	magizine about sxi months ago.  If you can't find it mail me & I will
	look up the issue for you.



-- 
Stan Brown	P. c. Design 	404-363-2303	Ataant Ga.
(emory|gatech|uunet) rsiatl!sdba!stan				"vi forever"

   Recently, (friday, that is) I finished my first program that sort using
qsort.  The problem with sorting ANYTHING with qsort is that it wants an
array.  I had to sort records in a file.  Another problem is that qsort
takes only an unsigned int as the number of arguements and the files I had
to sort could easily be multiple megs (a record was only 14 bytes).     Now
all of this was solved easily using multiple qsorts on records read into
a huge array and then merging the qsorts into the final file. But qsort is
not an end all for sorting ANYTHING by any stretch of the imagination. 
qsort is a big help on almost all fixed size elements.
                        Your friendly Windows developer,
                            Jerry.
----
ProLine:  jludwig@pro-grouch
Internet: jludwig@pro-grouch.cts.com
UUCP:     crash!pro-grouch!jludwig
ARPA:     crash!pro-grouch!jludwig@nosc.mil


	Actually, it is a project. What I wanted is to reproduce the
	capability of qsort().

	To be specific, do you know HOW qsort() achieve the ability to
	sort everything?

Don't Kernighan & Ritchie give a sample implementation?
Basically, qsort() can sort any kind of data because qsort()
never knows what it is sorting.

qsort() was originally specified to be an implementation of
"quicker-sort", which is a variant of "quick-sort".  (Both of
them are bad names, as merge-sort is substantially faster than either.)
As you probably know (and if you didn't, you can find out in almost
any book on algorithms, such as Sedgewick's), quick-sort does two kinds
of things with the array:
	- compare a[i] to a[j]
	- interchange a[i] and a[j]
So we could write a procedure

	void quicksort(int L, int U,
		int (*compare)(int i, int j),
		void (*swap)(int i, int j));

The caller would provide suitable procedures:

	#define NELTS 200
	char *myarray[NELTS];

	int mycompare(int i, int j)
	    {
		return strcmp(myarray[i], myarray[j]);
	    }

	void myswap(int i, int j)
	    {
		char *temp = myarra[i];
		myarray[i] = myarray[j];
		myarray[j] = temp;
	    }

	... quicksort(0, NELTS-1, mycompare, myswap);

The point is that quicksort doesn't do *arbitrary* things to the array,
so that as long as *compare and *swap _behave_ as if they were working
on an array, quicksort doesn't even care whether the array exists or not.

qsort() may do other things to the array (such as assignment one element
to another), so the interface is somewhat different.  qsort() needs to
know where the array starts (base) and how big each element is (width).
But comparison can work in much the same way.

	void qsort(void *base, size_t n_of_elts, size_t width,
	    int (*compare)(const void *e1, const void *e2));

When qsort() wants to compare element i of the array with element j,
it can't rely on C to get the pointer arithmetic right, because base
is void*.  It has to do the scaling itself.  That's why we pass in
width.  So
	(char*)base + i*width		is the ADDRESS of element i
	(char*)base + j*width		is the ADDRESS of element j
and in fact it is these addresses that are passed to compare:

	compare((void*)(i*width + (char*)base),
		(void*)(j*width + (char*)base))

is how two array elements are compared.  For example, if you are
sorting an array of strings, you would pass width as sizeof (char*),
and

	int strcompare(void *a, void *b)
	    { return strcmp(*(char **)a, *(char **)b); }

is the function you would pass to do the comparisons.

I hope this helps.

Message-Id: <9103250645.AA16689@leo.bby.oz.au>

This should work:

void qsort (void *base, size_t nelem,
	    size_t size, int (*cmp)(const void *, const void *)) {
     /* Actually slow sort */
     int i, j;
     char *array = base;
     char *tmp = malloc(size);
     
     for (i = (nelem - 1) * size; i > 0; i -= size) {
	  for (j = 0; j < i; j += size) {
	       if (cmp(array + j, array + j + size) == 1) {
		    memcpy(tmp, array + j, size);
		    memcpy(array + j, array + j + size, size);
		    memcpy(array + j + size, tmp, size);
	       }
	  }
     }

     free(tmp);
}

The sorting algorithm, as you can see, is the obvious slow sort that
is the first one anyone thinks of. You can substitute your own
algorithm. This shows you how to handle the data.

Void * is a `blank' pointer. It points to a location in memory, but it
has no type. Therefore you can't do arithmetic on it. That's why the
sort function needs to be told the size of the type the void * really
points to. Once you know that, you can cast the pointer to char *
(since sizeof(char) is by definition always 1, and char * must have
the same representation as void *), and increment it by multiples of
size. Actually this is the main reason for the rule that void * must
be the same as char *. It is in order to enable arithmetic on generic
types where you know the size but not the type.  Of course if typeof()
were a standard operator, this would be much easier, but it's not.

If you have any questions, just ask.

From wuxing%math.mscs.mu.edu@munnari.oz Wed Mar 27 10:10:24 1991

Look at the manpage for qsort(3).  qsort takes, among other things,
a pointer to the array, the size of the elements in the array, and
a pointer to a compare function.  To compare to elements, qsort()
calls the function with pointers to the objects to be compared.


From ken@csis.dit.csiro.au Wed Mar 27 12:50:40 1991

Your problem reduces to one of finding one or more comparison functions
that can compare any data type you are interested in and some code that
will exchange any data type.

From ken@csis.dit.csiro.au Wed Mar 27 12:52:37 1991

>	I am interested in HOW the system routine
>	handle different variable types..

That's why you have to provide a comparision function and tell qsort
the size of the data type. Don't they teach you about the independence
of sorting algorithm and the data type in CS?

>	Furthermore, WHY char casting is used for
>	the base pointer?

To make lint happy.

From: bhoughto@nevin.intel.com (Blair P. Houghton)

>>It can't be done.  You can always define a type of input
>>for which "sorting" isn't defined.
>>
>   Huh? I may be relatively new to C but isn't the stdlib function
>qsort() an example of a program which does just that? You pass it the

Y'know, it would've helped a little if you'd just bothered to read
down to the end of my posting.  It was qsort-a-rama...

It still doesn't solve the problem of "sorting any data
type", since that problem is impossible to solve (I.e., you
have to know in advance the types that may be passed, which
necessarily omits those types you do not know and which
are, by the definition of the word "any", part of the set
"any data type").

				--Blair
				  "Here, have a NULL-prize..."

From: mcdaniel@adi.com (Tim McDaniel)

In article <1991Mar23.164807.7318@helios.physics.utoronto.ca>
neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) writes:
   Huh? I may be relatively new to C but isn't the stdlib function
   qsort() an example of a program which does just that?

qsort() is a function, not a program, under the usual definition of
"program".  I believe that calling a function a program is a
deplorable confusion of terms.

As indicated before, a C *program* can be coded to sort many common
types.  Specifying the uncommon types is the problem.  Where
available, dynamic linking could be used, as could dynamic loading and
calling of code.  Extra command-line arguments to the program could
specify a sort predicate program, which could be called via system()
with hideous overhead on most systems.

This problem might be more suited for awk, perl, icon, or other
interpreted languages.  The user could then enter the comparison code
directly.

--
   "Of course he has a knife.  We all have knives.  It's 1183, and we're
   all barbarians."
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel

From: bhoughto@nevin.intel.com (Blair P. Houghton)
>Hi,
>	Furthermore, WHY char casting is used for
>	the base pointer?

Even in ANSI C, you can expect a pointer to char to be a pointer
to a byte.  This makes it a natural choice for a generic pointer
type.  It means that when the pointer is incremented by 1, the
location it points to will be one byte further in the memory.

Notice that you have to give also the number of bytes that
each object occupies.  qsort() uses that number to increment
the pointer and thus point to the next object, not just
the next byte.

In ANSI C, however, there's a special type, `void *', which
is designed specifically to be a generic pointer type, to
be used in exactly this way.

qsort()'s sorting algorithm often isn't the best [*] (the
algorithm you use should depend on the distribution and
organization of your data, so it's easy to define a class
of sets that make a given sorting algorithm thrash and
slog), but this usage of anonymous data methods is a
classic of modular programming.

				--Blair
				  "Waxing didactic this week..."

[*] the algorithm it uses can be different from library
to library:  anything from bubble-sort to hashing.

From: gwyn@smoke.brl.mil (Doug Gwyn)

I think your question boils down to, "Explain the function of the first
argument to the qsort() function."

qsort()'s first argument is a "generic" pointer, formerly of type char*,
in ANSI C of type void*.  The reason for this is of course that the
implementation of qsort() has no way of knowing what the actual data
type will be at run time.  The third argument to qsort() specifies the
byte-width of a single element in the array of whatever actual type is
being sorted, so that qsort() is able to construct generic pointers to
the proper elements that it passes to the user-supplied comparison
function.  qsort() manages the sorting algorithm, while the user-
supplied comparison function gives meaning to comparison of whatever
data types are actually being sorted.  Note that the comparison function
has generic-pointer types for its arguments, which must be converted to
pointers to the actual data types being compared.

From: gwyn@smoke.brl.mil (Doug Gwyn)

I recommend adapting the sort utility from Kernighan & Plauger's "Software
Tools" for external file sorting.

From: cjc@ulysses.att.com (Chris Calabrese)

Yes, qsort() is a sort algorithm, not a way of life.

If you want to sort large ammounts of data, the method mentioned above
is much better - sort chunks and merge them.

Aside from the obvious fact that qsort() can only handle a relatively
small number of objects to be sorted, you'll get much better
performance out of the merge stratagy when you have more than about
2 mega bytes of data (a number arrived at by closely guarded secret
methods :-) using lots of data generated locally).  This number is
independant of physical memory size (assuming you have sizably more
than 2mb), but dependands on relative disk/memory/cpu speeds.

Using this method, you can sort very large data sets in a predictable
and reasonable time.

Name:			Christopher J. Calabrese
Brain loaned to:	AT&T Bell Laboratories, Murray Hill, NJ
att!ulysses!cjc		cjc@ulysses.att.com
Obligatory Quote:	``pher - gr. vb. to schlep.  phospher - to schlep light.philosopher - to schlep thoughts.''

===============================================================================
	Alvaro Hui		|ACSnet		akkh@mullian.oz
    5th Year B.E.\ B.Sc.	|Internet &	akkh@mullian.ee.mu.OZ.AU
   University of Melbourne	|Arpanet	rcoahk@koel.co.rmit.OZ.AU