[comp.lang.c] How to write a sorting program that will sort everything?

rcoahk@chudich.co.rmit.oz (Alvaro Hui Kau) (03/22/91)

Hi,
	Recently I came into the problem of writing
	a sorting program that needs to handle any
	type of input.

	i.e. the input to the sorting program can be
	 integer, floating point, strings, charcters..etc.

	Could any one help me with that?
	Thanks in advance!

rcoahk@melomys.co.rmit.oz.au


===============================================================================
	Alvaro Hui		|ACSnet		akkh@mullian.oz
    5th Year B.E.\ B.Sc.	|Internet &	akkh@mullian.ee.mu.OZ.AU
   University of Melbourne	|Arpanet	rcoahk@koel.co.rmit.OZ.AU 

bhoughto@pima.intel.com (Blair P. Houghton) (03/23/91)

In article <1991Mar22.005700.17663@minyos.xx.rmit.oz.au> rcoahk@chudich.co.rmit.oz (Alvaro Hui Kau) writes:
>Hi,
>	Recently I came into the problem of writing
>	a sorting program that needs to handle any
>	type of input.

It can't be done.  You can always define a type of input
for which "sorting" isn't defined.  For example, strings in
unix are usually sorted according to ASCII collating
sequence, so that A-Z appear before a-z; this is because,
represented as integers in the machine, this is the actual
order in which they appear.  The sort(1) program needs a
flag (usually `-f') to tell it when to sort according to
linguistic rules rather than arbitrary computer rules.

What you seem to want, however, is a program to sort any
of the basic types available in C.  This is easy.  Define
a union in which there is a tag for every one of the type
specifiers you might use:

	union c_basic_types {
	    float f;
	    double d;
	    int i;
	    ...
	};

and a set of constants you can use to flag the function:

#define SORT_FLOAT	1
#define SORT_DOUBLE	2
#define SORT_INT	3
...

and implement a switch to use the proper union tag when
the flag is passed

union c_basic_types
max( union c_basic_types x, union c_basic_types y, int sort_type )
{
    switch ( sort_type ) {
	case SORT_FLOAT:
	    return ( x.f > y.f ) ? x : y ;
	    break;
	case SORT_DOUBLE:
	    return ( x.d > y.d ) ? x : y ;
	    break;
	case SORT_INT:
	    return ( x.i > y.i ) ? x : y ;
	    break;
	...
	default:
	    /* some sort of error message for a type that isn't recognized */
	    exit(-1);
	    break;
    }
}

This is just the max() function; with an array or list of
unions you can implement a full sort.  Look into the
library function qsort(3). You can use this max function in
qsort if you modify it to use a global variable to pass the
sort_type and you have it return 1 when the value in x is
larger and 0 when the value in y is larger

main.c:
    extern int max( union c_basic_types x, c_basic_types y ); /* prototype */
    int sort_type;

    main()
    {
	union c_basic_types a[SIZE];
	extern int sort_type;

	.../* code to fill the array */...

	.../* code to set sort_type to some type */...

	qsort( a, SIZE, sizeof(a[0]), max );

	.../* code to use the sorted array */...
    }

max.c:
    int max( union c_basic_types x, c_basic_types y )
    {
	extern int sort_type;

	switch ( sort_type ) {
	    case SORT_FLOAT:
		return ( x.f > y.f );
		break;
	    case SORT_DOUBLE:
		return ( x.d > y.d );
		break;
	    case SORT_INT:
		return ( x.i > y.i );
		break;
	    ...
	    default:
		/* unrecognized type; print error message and die */
		fputs("Mickle foo and yer mama, too!",stderr);
		exit(-1);
		break;
	}
    }

The basic point here is you can't sort a type for which
you haven't defined the ordering method.  This is the
reason qsort(3) requires such a strange setup to do
its job.

The best I can do is answer your question and tell you _how_
to write it.

				--Blair
				  "It's sorting those damnable
				   '...' types that'll get
				   you every time..."

neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) (03/24/91)

In article <3403@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>In article <1991Mar22.005700.17663@minyos.xx.rmit.oz.au> rcoahk@chudich.co.rmit.oz (Alvaro Hui Kau) writes:
>>Hi,
>>	Recently I came into the problem of writing
>>	a sorting program that needs to handle any
>>	type of input.
>
>It can't be done.  You can always define a type of input
>for which "sorting" isn't defined.
>
   Huh? I may be relatively new to C but isn't the stdlib function
qsort() an example of a program which does just that? You pass it the
start of the array of elements to be sorted, the number of elements, and
the length of each elements, and a pointer to a user-defined comparator
function which is passed pointers to two elements, and returns an
integer value. This user-written function defines what the caller means
by "greater than/equal/less than", so any time you can conceive of a
reasonable "sort" order, you can write a function which returns '1' if
the contents of the first pointer are greater than those of the second,
'0' if they're equal, and '-1' if the first points to an elements less
than that to which the second points.
   Even if 'qsort()' is not right for your application, you can
implement any other sort program you like, with a function call to
establish the relative order of two elements.
   If there is a type of input for which "sorting" isn't defined, then
by definition you don't want to sort it. If you have your own definition
of a sort on the elements, you can sort it with that definition, even if
nobody else agrees with you on your rule. You needn't even require that
all the elements have the same type, you could sort a mixture of float
array elements, structures of addresses and phone numbers, and pointers
to functions, if you have some way to decide when one of those is
greater than another of them. For this, qsort() wouldn't work because it
accepts elements only of uniform size in contiguous memory. Your own
sort routine could do it, though.

   To answer the original question, I'd pull out any reference book on
sorting, for example a heap sort, and when you get to the line(s)
reading something along the lines of:
    if (*(a+i) < *(a+j)) {   SWAP(i,j)   }
replace that/those line(s) with:
    if (compar(a+i,a+j)) {   SWAP(i,j)   }
where compar() is a function you write to return non-zero if
*(a+i) < *(a+j)

You can modify this as necessary to check for equality and greater-than
conditions.


-- 
 Christopher Neufeld....Just a graduate student  | Too much self-love just
 neufeld@aurora.physics.utoronto.ca    Ad astra! | makes you jealous of
 cneufeld@{pnet91,pro-cco}.cts.com               | the people who envy you.
 "Don't edit reality for the sake of simplicity" |

bhoughto@nevin.intel.com (Blair P. Houghton) (03/24/91)

In article <1991Mar23.164807.7318@helios.physics.utoronto.ca> neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) writes:
>In article <3403@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>>It can't be done.  You can always define a type of input
>>for which "sorting" isn't defined.
>>
>   Huh? I may be relatively new to C but isn't the stdlib function
>qsort() an example of a program which does just that? You pass it the

Y'know, it would've helped a little if you'd just bothered to read
down to the end of my posting.  It was qsort-a-rama...

It still doesn't solve the problem of "sorting any data
type", since that problem is impossible to solve (I.e., you
have to know in advance the types that may be passed, which
necessarily omits those types you do not know and which
are, by the definition of the word "any", part of the set
"any data type").

				--Blair
				  "Here, have a NULL-prize..."

mcdaniel@adi.com (Tim McDaniel) (03/26/91)

In article <1991Mar23.164807.7318@helios.physics.utoronto.ca>
neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) writes:
   Huh? I may be relatively new to C but isn't the stdlib function
   qsort() an example of a program which does just that?

qsort() is a function, not a program, under the usual definition of
"program".  I believe that calling a function a program is a
deplorable confusion of terms.

As indicated before, a C *program* can be coded to sort many common
types.  Specifying the uncommon types is the problem.  Where
available, dynamic linking could be used, as could dynamic loading and
calling of code.  Extra command-line arguments to the program could
specify a sort predicate program, which could be called via system()
with hideous overhead on most systems.

This problem might be more suited for awk, perl, icon, or other
interpreted languages.  The user could then enter the comparison code
directly.

--
   "Of course he has a knife.  We all have knives.  It's 1183, and we're
   all barbarians."
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel

boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (03/27/91)

In article <3418@inews.intel.com> bhoughto@nevin.intel.com (Blair P. Houghton) writes:
>
>It still doesn't solve the problem of "sorting any data
>type", since that problem is impossible to solve (I.e., you
>have to know in advance the types that may be passed, which
>necessarily omits those types you do not know and which
>are, by the definition of the word "any", part of the set
>"any data type").


Damn right.

So how are going to sort strudels?  Define the pastry comparison operator.

Which has precedence?  Sweet or savory?

Is there a native comparison operator that I can use with a #pragma?

Remember:  Strudels are tricky.


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets wierd, the weird turn pro...''