[comp.std.c] qsort

lhf@aries5 (Luiz H de Figueiredo) (11/19/89)

In a recent thread about sorting strings, someone suggested that qsort was
not the way to go because quicksort is not very good for nearly ordered files.

My question is:
	Does qsort have to implement quicksort?

My point is that the *name* qsort is historical but qsort can actually be
very smart, eg. switching to insertion for small segments and/or pre-scan the
file (array) to avoid quadratic behaviour for sorted data.
In other words, qsort does not even have to implement quicksort at all!
What is more useful to assume is that qsort is *very* efficient, so that we
don't have to re-invent the wheel every day.

What does the standard say about this?


Luiz Henrique de Figueiredo		internet: lhf@aries5.uwaterloo.ca
Computer Systems Group			bitnet:   lhf@watcsg.bitnet
University of Waterloo
Waterloo, Ontario, Canada  N2L 3G1

henry@utzoo.uucp (Henry Spencer) (11/19/89)

In article <861@maytag.waterloo.edu> lhf@aries5 (Luiz H de Figueiredo) writes:
>In a recent thread about sorting strings, someone suggested that qsort was
>not the way to go because quicksort is not very good for nearly ordered files.
>	Does qsort have to implement quicksort? ...
>What does the standard say about this?

The standard just says that qsort is a sort algorithm.  It can be anything
from bubblesort to an adaptive partitioning sort and still conform.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

eggert@ata.twinsun.com (Paul Eggert) (10/06/90)

Suppose I pass to qsort() a comparison function that isn't a total order.
May qsort() dump core?

For example, NN 6.4.10 passes the function order_subj_date() to qsort().  This
function is not a total order: there are values A,B,C such that A<B, B<C, and
C<A according to this function.  (This bug is fixed in NN 6.4.11.)  SunOS 4.1's
qsort() dumps core sometimes in this situation.  It seems to me that a
conforming qsort() should be allowed to dump core here, but I don't see where
the ANSI C standard permits this.

While we're on the subject, what happens if the comparison function has side
effects, benign or otherwise?  How are the sequence points defined here?  For
example, suppose I qsort() an N-item array using a comparison function that
increments a global counter starting at zero; must the counter be at least N-1
when qsort() returns?

Similar questions apply to bsearch().

msb@sq.sq.com (Mark Brader) (10/09/90)

Paul Eggert (eggert@ata.twinsun.com) writes:
> Suppose I pass to qsort() a comparison function that isn't a total order.
> May qsort() dump core?

I think this is a Yes.  4.10.5.2 says ...

#  ... a comparison function pointed to by compar, which is called with
#  two arguments that point to the objects being compared.  The function
#  shall return an integer less than, equal to, or greater than zero if
#  the first argument is considered to be respectively less than, equal
#  to, or greater than the second.

As there is no special language to say otherwise, I think that the
terms "less than", "equal to", and "greater than" must be taken
to have their ordinary meanings, including the property that "less
than" is transitive.

The general rule in 4.1.6 then applies:

#  If an argument to a function has an invalid value ... the behavior
#  is undefined.

A pointer to a function that doesn't have the required property is
clearly an invalid value, so the behavior is undefined and it may
indeed dump core.

> While we're on the subject, what happens if the comparison function has side
> effects, benign or otherwise?

Nothing prohibits this.

> How are the sequence points defined here?  For
> example, suppose I qsort() an N-item array using a comparison function that
> increments a global counter starting at zero; must the counter be at least
> N-1 when qsort() returns?

The general rules for sequence points apply.  From 3.3.2.2:

#  ... there is a sequence point before the actual call [to a function].

And from 3.6:

#  A full expression is an expression that is not part of another
#  expression.  ...  The end of a full expression is a sequence point.

Thus, if the comparison function does "++count;" or "bar = foo * count++;"
then there is a sequence point before the call to the comparison function
and another at the end of the "++count;" or "bar = foo * count++;".  This
means that each of the incrementings of "count" is properly separated, so
no problem there.

On the other hand, there is no guarantee that the library function qsort()
actually calls the comparison function.  The following, while no doubt
silly, is a permissible implementation as far as I can tell.  (No, I haven't
tested the code, and there may be minor errors.  This is just to give an
idea of how a sort function might not call its comparison function.)

	{
		static int (*prev_compar) (const void *,const void *);
		static void *prev_result;
		static size_t prev_nmemb, prev_size;

		if (prev_result && compar == prev_compar
			&& (compar == strcmp || compar == memcmp)
			&& nmemb == prev_nmemb && size == prev_size
			&& memcmp (base, prev_result, size * nmemb) == 0)

			return;		/* it's already sorted */
		prev_compar = compar;
		prev_nmemb  = nmemb;
		prev_size   = size;
		free (prev_result);	/* ANSI C, no NULL check needed */
		prev_result = malloc (size * nmemb);
			...	/* the actual sort goes here */

		if (prev_result) memcpy (prev_result, base, size * nmemb);
	}

The reason for the check that compar is either memcmp() or strcmp() is
that if it was a user-written function then qsort() couldn't know whether
its behavior was conditional on some global variable.

But this does seem to mean that the count is not guaranteed to be at
least N-1.

> Similar questions apply to bsearch().

Similar answers apply, with one important exception.  The comparison
function is no longer required to induce a full ordering on the data,
but only a partial ordering.  4.10.5.1 says:

#  The comparison function pointed to by compar is called with two
#  arguments that point to the key object and to an array element,
#  in that order.  The function shall return an integer less than,
#  equal to, or greater than zero if the key object is considered,
#  respectively, to be less than, to match, or to be greater than
#  the array element.  The array shall consist of: all the elements
#  that compare less than, all the elements that compare equal to,
#  and all the elements that compare greater than the key object,
#  in that order.

Thus if K is the key, and A < K and K < B, and A, K, and B occur in
the array in that order, then it is irrelevant if the comparison function
would indicate that B < A.  The Standard imposes requirements on
"(*compar) (K, A)" and "(*compar) (K, B)" but not on "(*compar) (A, B)".

Footnote 129 is attached to the above text and says that "In practice,
the entire array is sorted according to the comparison function."
Footnotes are not, of course, part of the actual standard.

-- 
Mark Brader		"Actually, $150, to an educational institution,
Toronto			 turns out to be about the same as a lower amount."
utzoo!sq!msb, msb@sq.com				-- Mark Horton

This article is in the public domain.

flee@guardian.cs.psu.edu (Felix Lee) (10/10/90)

>> While we're on the subject, what happens if the comparison function
>> has side effects, benign or otherwise?

> Nothing prohibits this.

A pathological comparison function might shuffle the array being
sorted each time it's called, in which case qsort() may never
terminate.  Is this prohibited?

And I notice that 4.3 BSD's qsort uses static variables.  Is the
comparison function allowed to call qsort recursively?
--
Felix Lee	flee@cs.psu.edu

manning@nntp-server.caltech.edu (Evan Marshall Manning) (10/10/90)

flee@guardian.cs.psu.edu (Felix Lee) writes:

>A pathological comparison function might shuffle the array being
>sorted each time it's called, in which case qsort() may never
>terminate.  Is this prohibited?

I myself have used qsort with a comparison function which used
rand() to generate a positive or negative number.  I was prepared
for it to go on forever, but it terminated with the array fully
shuffled.

***************************************************************************
Your eyes are weary from staring at the CRT for so | Evan M. Manning
long.  You feel sleepy.  Notice how restful it is  |      is
to watch the cursor blink.  Close your eyes.  The  |manning@gap.cco.caltech.edu
opinions stated above are yours.  You cannot       | manning@mars.jpl.nasa.gov
imagine why you ever felt otherwise.               | gleeper@tybalt.caltech.edu

flee@guardian.cs.psu.edu (Felix Lee) (10/10/90)

I spoke hastily.  Even if comparison is random, any reasonable sorting
algorithm "makes progress" and will do no more than the worst-case
number of comparisons.

qsort() in 4.3 BSD however will have a decent chance of running past
the bounds of the array in its final insertion sort pass.  Here's a
pathological comparator that guarantees this behavior:
	int cmp(a, b) char * a; char * b;
	{ return b - a; }
--
Felix Lee	flee@cs.psu.edu