lhf@aries5 (Luiz H de Figueiredo) (11/19/89)
In a recent thread about sorting strings, someone suggested that qsort was not the way to go because quicksort is not very good for nearly ordered files. My question is: Does qsort have to implement quicksort? My point is that the *name* qsort is historical but qsort can actually be very smart, eg. switching to insertion for small segments and/or pre-scan the file (array) to avoid quadratic behaviour for sorted data. In other words, qsort does not even have to implement quicksort at all! What is more useful to assume is that qsort is *very* efficient, so that we don't have to re-invent the wheel every day. What does the standard say about this? Luiz Henrique de Figueiredo internet: lhf@aries5.uwaterloo.ca Computer Systems Group bitnet: lhf@watcsg.bitnet University of Waterloo Waterloo, Ontario, Canada N2L 3G1
henry@utzoo.uucp (Henry Spencer) (11/19/89)
In article <861@maytag.waterloo.edu> lhf@aries5 (Luiz H de Figueiredo) writes: >In a recent thread about sorting strings, someone suggested that qsort was >not the way to go because quicksort is not very good for nearly ordered files. > Does qsort have to implement quicksort? ... >What does the standard say about this? The standard just says that qsort is a sort algorithm. It can be anything from bubblesort to an adaptive partitioning sort and still conform. -- A bit of tolerance is worth a | Henry Spencer at U of Toronto Zoology megabyte of flaming. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
eggert@ata.twinsun.com (Paul Eggert) (10/06/90)
Suppose I pass to qsort() a comparison function that isn't a total order. May qsort() dump core? For example, NN 6.4.10 passes the function order_subj_date() to qsort(). This function is not a total order: there are values A,B,C such that A<B, B<C, and C<A according to this function. (This bug is fixed in NN 6.4.11.) SunOS 4.1's qsort() dumps core sometimes in this situation. It seems to me that a conforming qsort() should be allowed to dump core here, but I don't see where the ANSI C standard permits this. While we're on the subject, what happens if the comparison function has side effects, benign or otherwise? How are the sequence points defined here? For example, suppose I qsort() an N-item array using a comparison function that increments a global counter starting at zero; must the counter be at least N-1 when qsort() returns? Similar questions apply to bsearch().
msb@sq.sq.com (Mark Brader) (10/09/90)
Paul Eggert (eggert@ata.twinsun.com) writes: > Suppose I pass to qsort() a comparison function that isn't a total order. > May qsort() dump core? I think this is a Yes. 4.10.5.2 says ... # ... a comparison function pointed to by compar, which is called with # two arguments that point to the objects being compared. The function # shall return an integer less than, equal to, or greater than zero if # the first argument is considered to be respectively less than, equal # to, or greater than the second. As there is no special language to say otherwise, I think that the terms "less than", "equal to", and "greater than" must be taken to have their ordinary meanings, including the property that "less than" is transitive. The general rule in 4.1.6 then applies: # If an argument to a function has an invalid value ... the behavior # is undefined. A pointer to a function that doesn't have the required property is clearly an invalid value, so the behavior is undefined and it may indeed dump core. > While we're on the subject, what happens if the comparison function has side > effects, benign or otherwise? Nothing prohibits this. > How are the sequence points defined here? For > example, suppose I qsort() an N-item array using a comparison function that > increments a global counter starting at zero; must the counter be at least > N-1 when qsort() returns? The general rules for sequence points apply. From 3.3.2.2: # ... there is a sequence point before the actual call [to a function]. And from 3.6: # A full expression is an expression that is not part of another # expression. ... The end of a full expression is a sequence point. Thus, if the comparison function does "++count;" or "bar = foo * count++;" then there is a sequence point before the call to the comparison function and another at the end of the "++count;" or "bar = foo * count++;". This means that each of the incrementings of "count" is properly separated, so no problem there. On the other hand, there is no guarantee that the library function qsort() actually calls the comparison function. The following, while no doubt silly, is a permissible implementation as far as I can tell. (No, I haven't tested the code, and there may be minor errors. This is just to give an idea of how a sort function might not call its comparison function.) { static int (*prev_compar) (const void *,const void *); static void *prev_result; static size_t prev_nmemb, prev_size; if (prev_result && compar == prev_compar && (compar == strcmp || compar == memcmp) && nmemb == prev_nmemb && size == prev_size && memcmp (base, prev_result, size * nmemb) == 0) return; /* it's already sorted */ prev_compar = compar; prev_nmemb = nmemb; prev_size = size; free (prev_result); /* ANSI C, no NULL check needed */ prev_result = malloc (size * nmemb); ... /* the actual sort goes here */ if (prev_result) memcpy (prev_result, base, size * nmemb); } The reason for the check that compar is either memcmp() or strcmp() is that if it was a user-written function then qsort() couldn't know whether its behavior was conditional on some global variable. But this does seem to mean that the count is not guaranteed to be at least N-1. > Similar questions apply to bsearch(). Similar answers apply, with one important exception. The comparison function is no longer required to induce a full ordering on the data, but only a partial ordering. 4.10.5.1 says: # The comparison function pointed to by compar is called with two # arguments that point to the key object and to an array element, # in that order. The function shall return an integer less than, # equal to, or greater than zero if the key object is considered, # respectively, to be less than, to match, or to be greater than # the array element. The array shall consist of: all the elements # that compare less than, all the elements that compare equal to, # and all the elements that compare greater than the key object, # in that order. Thus if K is the key, and A < K and K < B, and A, K, and B occur in the array in that order, then it is irrelevant if the comparison function would indicate that B < A. The Standard imposes requirements on "(*compar) (K, A)" and "(*compar) (K, B)" but not on "(*compar) (A, B)". Footnote 129 is attached to the above text and says that "In practice, the entire array is sorted according to the comparison function." Footnotes are not, of course, part of the actual standard. -- Mark Brader "Actually, $150, to an educational institution, Toronto turns out to be about the same as a lower amount." utzoo!sq!msb, msb@sq.com -- Mark Horton This article is in the public domain.
flee@guardian.cs.psu.edu (Felix Lee) (10/10/90)
>> While we're on the subject, what happens if the comparison function >> has side effects, benign or otherwise? > Nothing prohibits this. A pathological comparison function might shuffle the array being sorted each time it's called, in which case qsort() may never terminate. Is this prohibited? And I notice that 4.3 BSD's qsort uses static variables. Is the comparison function allowed to call qsort recursively? -- Felix Lee flee@cs.psu.edu
manning@nntp-server.caltech.edu (Evan Marshall Manning) (10/10/90)
flee@guardian.cs.psu.edu (Felix Lee) writes: >A pathological comparison function might shuffle the array being >sorted each time it's called, in which case qsort() may never >terminate. Is this prohibited? I myself have used qsort with a comparison function which used rand() to generate a positive or negative number. I was prepared for it to go on forever, but it terminated with the array fully shuffled. *************************************************************************** Your eyes are weary from staring at the CRT for so | Evan M. Manning long. You feel sleepy. Notice how restful it is | is to watch the cursor blink. Close your eyes. The |manning@gap.cco.caltech.edu opinions stated above are yours. You cannot | manning@mars.jpl.nasa.gov imagine why you ever felt otherwise. | gleeper@tybalt.caltech.edu
flee@guardian.cs.psu.edu (Felix Lee) (10/10/90)
I spoke hastily. Even if comparison is random, any reasonable sorting algorithm "makes progress" and will do no more than the worst-case number of comparisons. qsort() in 4.3 BSD however will have a decent chance of running past the bounds of the array in its final insertion sort pass. Here's a pathological comparator that guarantees this behavior: int cmp(a, b) char * a; char * b; { return b - a; } -- Felix Lee flee@cs.psu.edu