rcoahk@chudich.co.rmit.oz (Alvaro Hui Kau) (03/28/91)
Hi, a few days ago, I send in a request asking how to implement a sort which can handle everything.. I got many response... They all are very helpful! Thanks! Here are all the response I got... IF I MISS ANYONE, SORRY... ***************************************************** From: bhoughto@pima.intel.com (Blair P. Houghton) It can't be done. You can always define a type of input for which "sorting" isn't defined. For example, strings in unix are usually sorted according to ASCII collating sequence, so that A-Z appear before a-z; this is because, represented as integers in the machine, this is the actual order in which they appear. The sort(1) program needs a flag (usually `-f') to tell it when to sort according to linguistic rules rather than arbitrary computer rules. What you seem to want, however, is a program to sort any of the basic types available in C. This is easy. Define a union in which there is a tag for every one of the type specifiers you might use: union c_basic_types { float f; double d; int i; ... }; and a set of constants you can use to flag the function: #define SORT_FLOAT 1 #define SORT_DOUBLE 2 #define SORT_INT 3 ... and implement a switch to use the proper union tag when the flag is passed union c_basic_types max( union c_basic_types x, union c_basic_types y, int sort_type ) { switch ( sort_type ) { case SORT_FLOAT: return ( x.f > y.f ) ? x : y ; break; case SORT_DOUBLE: return ( x.d > y.d ) ? x : y ; break; case SORT_INT: return ( x.i > y.i ) ? x : y ; break; ... default: /* some sort of error message for a type that isn't recognized */ exit(-1); break; } } This is just the max() function; with an array or list of unions you can implement a full sort. Look into the library function qsort(3). You can use this max function in qsort if you modify it to use a global variable to pass the sort_type and you have it return 1 when the value in x is larger and 0 when the value in y is larger main.c: extern int max( union c_basic_types x, c_basic_types y ); /* prototype */ int sort_type; main() { union c_basic_types a[SIZE]; extern int sort_type; .../* code to fill the array */... .../* code to set sort_type to some type */... qsort( a, SIZE, sizeof(a[0]), max ); .../* code to use the sorted array */... } max.c: int max( union c_basic_types x, c_basic_types y ) { extern int sort_type; switch ( sort_type ) { case SORT_FLOAT: return ( x.f > y.f ); break; case SORT_DOUBLE: return ( x.d > y.d ); break; case SORT_INT: return ( x.i > y.i ); break; ... default: /* unrecognized type; print error message and die */ fputs("Mickle foo and yer mama, too!",stderr); exit(-1); break; } } The basic point here is you can't sort a type for which you haven't defined the ordering method. This is the reason qsort(3) requires such a strange setup to do its job. The best I can do is answer your question and tell you _how_ to write it. --Blair "It's sorting those damnable '...' types that'll get you every time..." From: neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) Huh? I may be relatively new to C but isn't the stdlib function qsort() an example of a program which does just that? You pass it the start of the array of elements to be sorted, the number of elements, and the length of each elements, and a pointer to a user-defined comparator function which is passed pointers to two elements, and returns an integer value. This user-written function defines what the caller means by "greater than/equal/less than", so any time you can conceive of a reasonable "sort" order, you can write a function which returns '1' if the contents of the first pointer are greater than those of the second, '0' if they're equal, and '-1' if the first points to an elements less than that to which the second points. Even if 'qsort()' is not right for your application, you can implement any other sort program you like, with a function call to establish the relative order of two elements. If there is a type of input for which "sorting" isn't defined, then by definition you don't want to sort it. If you have your own definition of a sort on the elements, you can sort it with that definition, even if nobody else agrees with you on your rule. You needn't even require that all the elements have the same type, you could sort a mixture of float array elements, structures of addresses and phone numbers, and pointers to functions, if you have some way to decide when one of those is greater than another of them. For this, qsort() wouldn't work because it accepts elements only of uniform size in contiguous memory. Your own sort routine could do it, though. To answer the original question, I'd pull out any reference book on sorting, for example a heap sort, and when you get to the line(s) reading something along the lines of: if (*(a+i) < *(a+j)) { SWAP(i,j) } replace that/those line(s) with: if (compar(a+i,a+j)) { SWAP(i,j) } where compar() is a function you write to return non-zero if *(a+i) < *(a+j) You can modify this as necessary to check for equality and greater-than conditions. qsort(0 only implements the *sort* alogrithim. It depends upon calls to a user suplied routine (strcmp for example) which can compare two objects of the type to be sorted and return the results of the comparison to it. I would referr you to an excelent article on the subjetc in C users magizine about sxi months ago. If you can't find it mail me & I will look up the issue for you. -- Stan Brown P. c. Design 404-363-2303 Ataant Ga. (emory|gatech|uunet) rsiatl!sdba!stan "vi forever" Recently, (friday, that is) I finished my first program that sort using qsort. The problem with sorting ANYTHING with qsort is that it wants an array. I had to sort records in a file. Another problem is that qsort takes only an unsigned int as the number of arguements and the files I had to sort could easily be multiple megs (a record was only 14 bytes). Now all of this was solved easily using multiple qsorts on records read into a huge array and then merging the qsorts into the final file. But qsort is not an end all for sorting ANYTHING by any stretch of the imagination. qsort is a big help on almost all fixed size elements. Your friendly Windows developer, Jerry. ---- ProLine: jludwig@pro-grouch Internet: jludwig@pro-grouch.cts.com UUCP: crash!pro-grouch!jludwig ARPA: crash!pro-grouch!jludwig@nosc.mil Actually, it is a project. What I wanted is to reproduce the capability of qsort(). To be specific, do you know HOW qsort() achieve the ability to sort everything? Don't Kernighan & Ritchie give a sample implementation? Basically, qsort() can sort any kind of data because qsort() never knows what it is sorting. qsort() was originally specified to be an implementation of "quicker-sort", which is a variant of "quick-sort". (Both of them are bad names, as merge-sort is substantially faster than either.) As you probably know (and if you didn't, you can find out in almost any book on algorithms, such as Sedgewick's), quick-sort does two kinds of things with the array: - compare a[i] to a[j] - interchange a[i] and a[j] So we could write a procedure void quicksort(int L, int U, int (*compare)(int i, int j), void (*swap)(int i, int j)); The caller would provide suitable procedures: #define NELTS 200 char *myarray[NELTS]; int mycompare(int i, int j) { return strcmp(myarray[i], myarray[j]); } void myswap(int i, int j) { char *temp = myarra[i]; myarray[i] = myarray[j]; myarray[j] = temp; } ... quicksort(0, NELTS-1, mycompare, myswap); The point is that quicksort doesn't do *arbitrary* things to the array, so that as long as *compare and *swap _behave_ as if they were working on an array, quicksort doesn't even care whether the array exists or not. qsort() may do other things to the array (such as assignment one element to another), so the interface is somewhat different. qsort() needs to know where the array starts (base) and how big each element is (width). But comparison can work in much the same way. void qsort(void *base, size_t n_of_elts, size_t width, int (*compare)(const void *e1, const void *e2)); When qsort() wants to compare element i of the array with element j, it can't rely on C to get the pointer arithmetic right, because base is void*. It has to do the scaling itself. That's why we pass in width. So (char*)base + i*width is the ADDRESS of element i (char*)base + j*width is the ADDRESS of element j and in fact it is these addresses that are passed to compare: compare((void*)(i*width + (char*)base), (void*)(j*width + (char*)base)) is how two array elements are compared. For example, if you are sorting an array of strings, you would pass width as sizeof (char*), and int strcompare(void *a, void *b) { return strcmp(*(char **)a, *(char **)b); } is the function you would pass to do the comparisons. I hope this helps. Message-Id: <9103250645.AA16689@leo.bby.oz.au> This should work: void qsort (void *base, size_t nelem, size_t size, int (*cmp)(const void *, const void *)) { /* Actually slow sort */ int i, j; char *array = base; char *tmp = malloc(size); for (i = (nelem - 1) * size; i > 0; i -= size) { for (j = 0; j < i; j += size) { if (cmp(array + j, array + j + size) == 1) { memcpy(tmp, array + j, size); memcpy(array + j, array + j + size, size); memcpy(array + j + size, tmp, size); } } } free(tmp); } The sorting algorithm, as you can see, is the obvious slow sort that is the first one anyone thinks of. You can substitute your own algorithm. This shows you how to handle the data. Void * is a `blank' pointer. It points to a location in memory, but it has no type. Therefore you can't do arithmetic on it. That's why the sort function needs to be told the size of the type the void * really points to. Once you know that, you can cast the pointer to char * (since sizeof(char) is by definition always 1, and char * must have the same representation as void *), and increment it by multiples of size. Actually this is the main reason for the rule that void * must be the same as char *. It is in order to enable arithmetic on generic types where you know the size but not the type. Of course if typeof() were a standard operator, this would be much easier, but it's not. If you have any questions, just ask. From wuxing%math.mscs.mu.edu@munnari.oz Wed Mar 27 10:10:24 1991 Look at the manpage for qsort(3). qsort takes, among other things, a pointer to the array, the size of the elements in the array, and a pointer to a compare function. To compare to elements, qsort() calls the function with pointers to the objects to be compared. From ken@csis.dit.csiro.au Wed Mar 27 12:50:40 1991 Your problem reduces to one of finding one or more comparison functions that can compare any data type you are interested in and some code that will exchange any data type. From ken@csis.dit.csiro.au Wed Mar 27 12:52:37 1991 > I am interested in HOW the system routine > handle different variable types.. That's why you have to provide a comparision function and tell qsort the size of the data type. Don't they teach you about the independence of sorting algorithm and the data type in CS? > Furthermore, WHY char casting is used for > the base pointer? To make lint happy. From: bhoughto@nevin.intel.com (Blair P. Houghton) >>It can't be done. You can always define a type of input >>for which "sorting" isn't defined. >> > Huh? I may be relatively new to C but isn't the stdlib function >qsort() an example of a program which does just that? You pass it the Y'know, it would've helped a little if you'd just bothered to read down to the end of my posting. It was qsort-a-rama... It still doesn't solve the problem of "sorting any data type", since that problem is impossible to solve (I.e., you have to know in advance the types that may be passed, which necessarily omits those types you do not know and which are, by the definition of the word "any", part of the set "any data type"). --Blair "Here, have a NULL-prize..." From: mcdaniel@adi.com (Tim McDaniel) In article <1991Mar23.164807.7318@helios.physics.utoronto.ca> neufeld@aurora.physics.utoronto.ca (Christopher Neufeld) writes: Huh? I may be relatively new to C but isn't the stdlib function qsort() an example of a program which does just that? qsort() is a function, not a program, under the usual definition of "program". I believe that calling a function a program is a deplorable confusion of terms. As indicated before, a C *program* can be coded to sort many common types. Specifying the uncommon types is the problem. Where available, dynamic linking could be used, as could dynamic loading and calling of code. Extra command-line arguments to the program could specify a sort predicate program, which could be called via system() with hideous overhead on most systems. This problem might be more suited for awk, perl, icon, or other interpreted languages. The user could then enter the comparison code directly. -- "Of course he has a knife. We all have knives. It's 1183, and we're all barbarians." Tim McDaniel Applied Dynamics Int'l.; Ann Arbor, Michigan, USA Internet: mcdaniel@adi.com UUCP: {uunet,sharkey}!amara!mcdaniel From: bhoughto@nevin.intel.com (Blair P. Houghton) >Hi, > Furthermore, WHY char casting is used for > the base pointer? Even in ANSI C, you can expect a pointer to char to be a pointer to a byte. This makes it a natural choice for a generic pointer type. It means that when the pointer is incremented by 1, the location it points to will be one byte further in the memory. Notice that you have to give also the number of bytes that each object occupies. qsort() uses that number to increment the pointer and thus point to the next object, not just the next byte. In ANSI C, however, there's a special type, `void *', which is designed specifically to be a generic pointer type, to be used in exactly this way. qsort()'s sorting algorithm often isn't the best [*] (the algorithm you use should depend on the distribution and organization of your data, so it's easy to define a class of sets that make a given sorting algorithm thrash and slog), but this usage of anonymous data methods is a classic of modular programming. --Blair "Waxing didactic this week..." [*] the algorithm it uses can be different from library to library: anything from bubble-sort to hashing. From: gwyn@smoke.brl.mil (Doug Gwyn) I think your question boils down to, "Explain the function of the first argument to the qsort() function." qsort()'s first argument is a "generic" pointer, formerly of type char*, in ANSI C of type void*. The reason for this is of course that the implementation of qsort() has no way of knowing what the actual data type will be at run time. The third argument to qsort() specifies the byte-width of a single element in the array of whatever actual type is being sorted, so that qsort() is able to construct generic pointers to the proper elements that it passes to the user-supplied comparison function. qsort() manages the sorting algorithm, while the user- supplied comparison function gives meaning to comparison of whatever data types are actually being sorted. Note that the comparison function has generic-pointer types for its arguments, which must be converted to pointers to the actual data types being compared. From: gwyn@smoke.brl.mil (Doug Gwyn) I recommend adapting the sort utility from Kernighan & Plauger's "Software Tools" for external file sorting. From: cjc@ulysses.att.com (Chris Calabrese) Yes, qsort() is a sort algorithm, not a way of life. If you want to sort large ammounts of data, the method mentioned above is much better - sort chunks and merge them. Aside from the obvious fact that qsort() can only handle a relatively small number of objects to be sorted, you'll get much better performance out of the merge stratagy when you have more than about 2 mega bytes of data (a number arrived at by closely guarded secret methods :-) using lots of data generated locally). This number is independant of physical memory size (assuming you have sizably more than 2mb), but dependands on relative disk/memory/cpu speeds. Using this method, you can sort very large data sets in a predictable and reasonable time. Name: Christopher J. Calabrese Brain loaned to: AT&T Bell Laboratories, Murray Hill, NJ att!ulysses!cjc cjc@ulysses.att.com Obligatory Quote: ``pher - gr. vb. to schlep. phospher - to schlep light.philosopher - to schlep thoughts.'' =============================================================================== Alvaro Hui |ACSnet akkh@mullian.oz 5th Year B.E.\ B.Sc. |Internet & akkh@mullian.ee.mu.OZ.AU University of Melbourne |Arpanet rcoahk@koel.co.rmit.OZ.AU