[comp.lang.c++] Comparison functions for qsort

chip@tct.uucp (Chip Salzenberg) (12/25/90)

[This discussion is relevant to both the C and C++ languages; hence
 the cross-post.  Please direct followups appropriately.]

According to rfg@NCD.COM (Ron Guilmette):
>Both qsort() and bsearch() have a similar need to be fed a pointer to a
>function which itself takes two parameters of type pointer-to-T, where
>the type T is the same for both of the two parameters (but is never
>actually the type `void').

Bzzt.  Thanks for playing, though.

Both qsort() and bsearch() require a pointer to a function which does
*actually* take two parameters of type pointer-to-void.  The fact that
the function in question will almost immediately cast those pointers
into pointer-to-T is utterly irrelevant.

Remember that qsort() cannot know the real type of the objects it is
sorting.  Remember also (as noted in an unquoted portion of Ron's
article) that on word-addressed machines, pointer-to-T and
pointer-to-void may differ in representation and/or size.  It is
therefore obvious that qsort() is unable to make a correct call to a
function that has pointer parameters of any type other than
pointer-to-void (or, equivalently, pointer-to-char).

If you declare a comparison function that take parameters of type
pointer-to-T, and pass the address of that function to qsort(), and
qsort() still works, you're quite fortunate (or unfortunate, depending
on how you look at it).  But that doesn't change the fact that such a
practice is not, nor can it ever be, portable.

>But what if that compare routine is stashed away in a precompiled library?

Then the person who wrote it doesn't know what he's doing, and you
should stop using his code immediately.

>In short, IT IS IMPOSSIBLE TO WRITE QSORT() ENTIRELY IN ANSI C OR IN C++
>IN SUCH A WAY THAT IT IS COMPLETELY PORTABLE.

This statement is false.  It is entirely possible to do so, thanks to
ANSI's guarantee that pointer-to-char and pointer-to-void will always
have identical representations.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
"Please don't send me any more of yer scandalous email, Mr. Salzenberg..."
		-- Bruce Becker

barmar@think.com (Barry Margolin) (12/26/90)

In article <277640DD.4A70@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>>In short, IT IS IMPOSSIBLE TO WRITE QSORT() ENTIRELY IN ANSI C OR IN C++
>>IN SUCH A WAY THAT IT IS COMPLETELY PORTABLE.
>This statement is false.  It is entirely possible to do so, thanks to
>ANSI's guarantee that pointer-to-char and pointer-to-void will always
>have identical representations.

I think Chip is correct about C, but not necessarily about C++.  Actually,
qsort() itself can be written portably in C++, but it's hard to write most
comparison functions portably.  The problem is that C++ has restrictions on
casting from void* into <type>*.  So, you can pass the void* parameters to
the comparison function, but it may not be able to do anything with them.

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

chip@tct.uucp (Chip Salzenberg) (12/26/90)

According to barmar@think.com (Barry Margolin):
>The problem is that C++ has restrictions on casting from void* into <type>*.

I can find no such restriction in E&S.  In section 4.6 (pages 35-38) I
find this relevant passage:

"[Commentary] ANSI C allows an IMPLICIT conversion from a pointer to
|void| to a pointer to another object type (but not to a pointer to
function type); in C++ a |void*| cannot be assigned to an object of
any type other than |void*| WITHOUT AN EXPLICIT CAST." [Emphasis mine.]

So a comparison function may legally and safely cast a |void*| back to
its original type, though it will take an explicit cast to do so.

Casting from |A*| to |void*| to |B*|, of course, is Right Out in both
C and C++.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
"Please don't send me any more of yer scandalous email, Mr. Salzenberg..."
		-- Bruce Becker

warsaw@nlm.nih.gov (Barry A. Warsaw) (12/26/90)

	barmar> The problem is that C++ has restrictions on casting from
	barmar> void* into <type>*.

Specifically, what are these restrictions?

	barmar> -- Barry Margolin, Thinking Machines Corp.

-Barry
NAME:  Barry A. Warsaw         INET: warsaw@nlm.nih.gov
TELE:  (301) 496-1936          UUCP: uunet!nlm.nih.gov!warsaw

rmartin@clear.com (Bob Martin) (12/27/90)

In article <277640DD.4A70@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>[This discussion is relevant to both the C and C++ languages; hence
> the cross-post.  Please direct followups appropriately.]

	This followup involves only C++ and the newsgroup has been adjusted
	accordingly.
>
>Both qsort() and bsearch() require a pointer to a function which does
>*actually* take two parameters of type pointer-to-void.  The fact that
>the function in question will almost immediately cast those pointers
>into pointer-to-T is utterly irrelevant.

	I dissagree.  As Ron said in his article.  It is always possible
	to cast a pointer to anything to a pointer to void.  But it is
	not guaranteed that you can cast a pointer to void back to 
	its original.  Case in point is a class derived from multiple bases:

		class AB : public A, public B;

	Now take the following variable:
		AB *ab;

	The following two casts work just fine:  
		(A *)ab;  (B *)ab;

	But the following, though it may compile, is likely not to work properly
	because it miscalculates the address of at least one of the base classes.
		(A *)(void *)ab; or (B *)(void *)ab;

	Yet if qsort is to sort objects of type AB, then this is precicely what
	the void* implementation of compar would be asked to do!  Worse yet
	the compiler can't catch this error and will thus allow corrupted pointers
	loose in your program.  Amost certainly the program will crash.

>
>Remember that qsort() cannot know the real type of the objects it is
>sorting.  

	Again I agree with Ron.  qsort/bsearch should be templated functions.  
	Better yet they should be member functions embedded in some kind of
	container class.

-- 
+-Robert C. Martin-----+:RRR:::CCC:M:::::M:| Nobody is responsible for |
| rmartin@clear.com    |:R::R:C::::M:M:M:M:| my words but me.  I want  |
| uunet!clrcom!rmartin |:RRR::C::::M::M::M:| all the credit, and all   |
+----------------------+:R::R::CCC:M:::::M:| the blame.  So there.     |

barmar@think.com (Barry Margolin) (12/27/90)

In article <WARSAW.90Dec26104259@warsaw.nlm.nih.gov> warsaw@nlm.nih.gov writes:
>	barmar> The problem is that C++ has restrictions on casting from
>	barmar> void* into <type>*.
>Specifically, what are these restrictions?

A week or so ago someone mentioned in this list that there is a problem
such conversions and multiple inheritance.  However, I can't find any
mention of it in the "C++ Primer".

I think the problem he described was when a class D is derived from B1 and
B2, and you have the following declarations:

	B1* b1p;
	B2* b2p;
	D* dp;
	void* vp;

	vp = (void*) dp;// OK, explicit cast is optional
	dp = (D*) vp;	// OK, requires explicit cast
	b1p = (B1*) vp; // ??
	b2p = (B2*) vp; // ??

One of the last two conversions is questionable because it requires the
pointer to be offset to the beginning of the appropriate base portion of
the object.  However, the offset depends on the actual type of the object
that vp is pointing to.  Casting back to the type from which the void* was
cast is easy, but casting to a base class effectively requires a virtual
function call, and it's not clear that this is done automatically.

This comes up because the comparison function may be designed to compare
objects of a base class, so it will presumably first cast its void*
argument to a <base>* and then do whatever it does.

Actually, there *is* a solution.  The routine that casts to void* (the
caller of the qsort-like function) can first cast to the appropriate
<base>*, e.g.

	vp = (void*) ((B1*) dp);
	b1p = (B1*) vp;

This works because C++ always knows how to convert a derived pointer to any
of its base class pointers.  However, it's really inconvenient for the
kinds of functions we're discussing (qsort() and bsearch()) because it
doesn't allow arrays/trees to be operated on in place; the caller must
build a second array/tree containing void* pointers whose values come from
the above double cast, pass this to the function, and then cast the results
back to the original types.

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

barmar@think.com (Barry Margolin) (01/03/91)

In article <1990Dec26.160636.15566@clear.com> rmartin@clear.com (Bob Martin) writes:
>	I dissagree.  As Ron said in his article.  It is always possible
>	to cast a pointer to anything to a pointer to void.  But it is
>	not guaranteed that you can cast a pointer to void back to 
>	its original.  

This was a point I made in my article a week or two ago, and I was
corrected.  You can always cast a pointer to void back to its original
type.  However, if you cast T* to void*, you can't always cast the void* to
<parent-of-T>*.

>		       Case in point is a class derived from multiple bases:
>
>		class AB : public A, public B;
>
>	Now take the following variable:
>		AB *ab;
>
>	The following two casts work just fine:  
>		(A *)ab;  (B *)ab;
>
>	But the following, though it may compile, is likely not to work properly
>	because it miscalculates the address of at least one of the base classes.
>		(A *)(void *)ab; or (B *)(void *)ab;

But the following do work:

	(A *) (AB *) (void *) ab;
	(B *) (AB *) (void *) ab;

>	Yet if qsort is to sort objects of type AB, then this is precicely what
>	the void* implementation of compar would be asked to do!  

No, that is only true if you are using a compar function intended for
sorting As or Bs.  If it is expecting ABs then it will work fine.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

mat@mole-end.UUCP (Mark A Terribile) (01/03/91)

> >Both qsort() and bsearch() require a pointer to a function which [takes]
> >two parameters of type pointer-to-void.  ... that the function ... will
> >almost immediately cast [them to] pointer-to-T is utterly irrelevant.
 
> 	...  It is always possible to cast a pointer to anything to a pointer
>	to void.  But it is not guaranteed that you can cast a pointer to
>	void back to its original.  ...

Eh?  So long as it's not a pointer-to-function, it sure is guaranteed possible,
or I'm a monkey's uncle.  (Which I ain't.  So there.)
 
> >Remember that qsort() cannot know the real type of the objects it is
> >sorting.  
 
> 	Again I agree with Ron.  qsort/bsearch should be templated functions.  
> 	Better yet they should be member functions embedded in some kind of
> 	container class.

I think that we've been railroaded up a type system branch line and out onto
a siding with no end-of-track warning and no bumber.

Fust of all, the ``array'' 'pon which qsort() and it's ilk work must be a
true homogeneous array.  In other words, everything's gotta be the same size
so the exchanges will work and everything's gotta have enough the same
structure that the comparison functions will work.  This may mean that qsort()
or its buddies is to be given an array of pointers.

Second, the real conflict isn't between  void*  and  XYZ*  and it has nothing
to do with MI.  NOTHING, NOTHING, NOTHING, NOTHING.  NO THING.  NOT A THING IN
THIS WORLD.  NOTHING.  The MI problem occurs when you have pointers that are
stored as  derived*  and interpreted as base* .  That problem *COULD* occur
here, but it's not an essential part of the problem.  In other words, it's a
herring that has gone beyond red and is well on its way to scarlet.

The problem is that  qsort()  wants to use a generic pointer  ( void* )  to
address its `stuff' and so it wants to call a compare function that looks
like

		int compare( void*, void* )

while the function will necessarily be of type

		int compare( const Stuff*, const Stuff* )

(or  const Stuff**, const Stuff** , or whatever).

Thus, qsort takes (or SHOULD take) as a parameter an argument of the form

		int (*cmpr_fun)( void*, void* )

and we want to pass it an argument of the form

		int (*cmpr_fun)( const Stuff*, const Stuff* )

This means that we must coerce (by cast) a pointer-to-function of one type
to the type of the other pointer-to-function.

I want to believe that this is harmless; I am inclined to believe that on
most machines it is, so long as we supply the compare function and it knows
what of what  Stuff  we are asking  qsort()  to speak.  (What if it's not
harmless?  More a little later ...)

What is probably DISASTROUS is to use the ellipsis in declaring  qsort() .
If  qsort() 's argument is of the form   int (*cmpr_fun)( ... )  the function
may get called (and under some MS-DOS compilers, *WILL* get called) with an
entirely different calling sequence, such that the function will misunderstand
its arguments and even its call/return sequence.

What compilers and why?  The Glockenspiel port of  cfront , at least, and
because ordinary member functions of fixed signature can be called using the
Pascal calling sequence, which is smaller and faster than the C calling
sequence (because the size of the argument list to be released in the Pascal
calling sequence is known definitely to the callED function, which can
specify in the return statement that the stack is to be so adjusted--and this
is not possible in the C calling sequence, in which the callING function must
strip the arguments).  (Unfortunately, if the Pascal calling sequence is used,
the function is marked as Pascal in the object file and the frotzenglarken
linker insists on ignoring case when comparing function names.)

In other words, Sun Microsystems has indeed made a horrible mistake.  You've
just got to trust the user to pun the types correctly here, at least until
we get a better solution.  The ellipsis is dead wrong.

	Once again, for those who came late:

		The ellipsis in the prototype of the compare function
		pointer in the prototype for  qsort()  (or any other generic
		sorting or searching function) is ABSOLUTELY, POSITIVELY,
		IRRETREVIEVABLY, D-E-A-D DEAD WRONG.

But isn't this a matter for the ANSI committee, as well as for Sun?  I wish
them well, I sure do ...

Templates can provide a better solution, provided that it's not necessary to
either (a) compile in a whole new version of the  qsort()  code for each type
to be handled, or (b) introduce additional out-of-line function calls into
each internal step of the sort.

But templates don't answer the problem completely.  What happens if you
have something like a database engine that has to handle whatever types and
sizes are thrown at it without recompilation?  Then you need the flexibility
provided by the current  qsort() .

There's another problem, and a it's real killer.  What if  void*  is different
in size and even in how (e.g., in what register or stack) it must be passed
from  Stuff* ?  Then you CAN'T pass an
int (*cmpr_fun)( const Stuff*, const Stuff* )  to  qsort()  (which,
remember, wants an  int (*cmpr_fun)( void*, void* ) .  This means that if
your compare function is  int strcmp( const char*, const char* ) , you really
DO have to wrap it in a function like

	int str_as_void_cmp( const void* s1, const void* s2 )
	{
		return strcmp( (const char*) s1, (const char*) s2 );
	}

Furthermore, it won't do any good to declare it inline, because a REAL pointer
and REAL operations are needed.

Again, templates can help.

It would also be nice (and may become important in the world of C++ and
templates) if global optimizers realized that all  F()  does is call  G()
with exactly the same (to the machine) parameters, and replace the call to
F()  outright  with the call to  G() .  This requires optimization AFTER
symbol resolution but before relocation--in other words, optimization
BETWEEN THE LINKING PHASES.  Such may be a real need in C++ with templates.
Of course, if the function  F()  is declared inline, the code generator has
information sufficient to perform the optimization, IF its optimizer can make
use of the information.  Again, it may become important to do so soon.
(Instead of removing the call to  F() , it may be possible to turn  F() into
a jump into a point inside G() and perhaps just after G()'s prologue--if the
appropriate label is available.)

Such optimization would allow us to write the `correct' code (correct in that
we explicitly type-pun and take responsibility for the consequences) while
allowing the code to run without extra cost on machines on which that is
indeed possible.
-- 

 (This man's opinions are his own.)
 From mole-end				Mark Terribile

Bruce.Hoult@bbs.actrix.gen.nz (01/05/91)

Barry Margolin writes:
>Bob Martin writes:
>>		       Case in point is a class derived from multiple bases:
>>
>>		class AB : public A, public B;
>>
>>	Now take the following variable:
>>		AB *ab;
>>
>>	The following two casts work just fine:  
>>		(A *)ab;  (B *)ab;
>>
>>	But the following, though it may compile, is likely not to work properly
>>	because it miscalculates the address of at least one of the base classes.
>>		(A *)(void *)ab; or (B *)(void *)ab;
> 
>But the following do work:
>
>	(A *) (AB *) (void *) ab;
>	(B *) (AB *) (void *) ab;

That doesn't solve the problem in question, since the compare function will
most likely know that it is comparing A's or B's, but not that they are
embedded in AB's.  After all, if you're writing an A or B compare function
that *does* know this, then you might as well write a special one for AB's
anyway.

What *will* work is...

    (A*) (void*)(A*) ab;    or
    (B*) (void*)(B*) ab;
-- 
Bruce.Hoult@bbs.actrix.gen.nz   Twisted pair: +64 4 772 116
BIX: brucehoult                 Last Resort:  PO Box 4145 Wellington, NZ

hansen@pegasus.att.com (Tony L. Hansen) (01/06/91)

Lest everyone gets lost in the discussions on what's good and bad with
qsort()'s (and bsearch()'s) definition, I'm going to iterate here the proper
way to use qsort() which avoids all problems on all machine types and works
everywhere:

    o	First off, the proper prototypes are:

	void *bsearch(const void*key, const void *base,
	    size_t nmemb, size_t size,
	    int (*compar)(const void*, const void*));

	void qsort(void *base, size_t nmemb, size_t size,
	    int (*compar)(const void*, const void*));

	These functions are required to be declared within <stdlib.h>.
	This is from Section 4.10.5.1 and 4.10.5.2 of the ANSI C standard,
	which is also a base document for the draft ANSI C++.

    o	For a given type T of which an array is being searched or sorted,
	the comparison function must be declared:

	int Tcomparison(const void *a, const void *b)
	{
	    const T *Ta = (const T*)a;
	    const T *Tb = (const T*)b;
	    // now do whatever you want to compare *Ta and *Tb
	}

    o	Note that the type cast MUST be to type T and not to any base class
	of T. (This is particularly a requirement when T is multiply
	inherited.) After casting to T*, you may THEN cast to a base class of
	T if you wish, but you can't go directly to that base class from the
	void*.

Once templates become available, it will be possible to create a template
version of qsort() which looks something like this:

	template <class T>
	void qsort(T *base, size_t nmemb, int (*compar)(const T*, const T*));

(It should even be possible to implement the template qsort() using the void*
version.)

					Tony Hansen
				att!pegasus!hansen, attmail!tony
				    hansen@pegasus.att.com
				      tony@attmail.com

hansen@pegasus.att.com (Tony L. Hansen) (01/06/91)

<< From: chip@tct.uucp (Chip Salzenberg)
<< Both qsort() and bsearch() require a pointer to a function which does
<< *actually* take two parameters of type pointer-to-void.  The fact that
<< the function in question will almost immediately cast those pointers
<< into pointer-to-T is utterly irrelevant.

< From: rmartin@clear.com (Bob Martin)
< I dissagree.  As Ron said in his article.  It is always possible to
< cast a pointer to anything to a pointer to void.  But it is not
< guaranteed that you can cast a pointer to void back to its original.
< Case in point is a class derived from multiple bases:
<	class AB : public A, public B;
< Now take the following variable:
<	AB *ab;
< The following two casts work just fine:  
<	(A *)ab;  (B *)ab;
< But the following, though it may compile, is likely not to work
< properly because it miscalculates the address of at least one of the
< base classes.
<	(A *)(void *)ab; or (B *)(void *)ab;
< Yet if qsort is to sort objects of type AB, then this is precicely what
< the void* implementation of compar would be asked to do!  Worse yet the
< compiler can't catch this error and will thus allow corrupted pointers
< loose in your program.  Amost certainly the program will crash.

Bob, you're seriously mixing apples and oranges, and getting caught in the
middle. You're right, you cannot take a void* created from an AB* and cast it
directly to a B* and expect things to work out right. SO DON'T DO IT! No
comparison function passed to qsort() should ever do that! You're passing an
AB* cast to a void*, so cast the void* back to an AB*!  If you choose to go
on from there to an A* or a B*, fine. But don't go directly from the void* to
an A* or B*. In other words, declare your comparison function like this:

	int comparison(const void *a, const void *b)
	{
	    const AB *ABa = (const AB*)a;
	    const AB *ABb = (const AB*)b;
	    // compare *ABa with *ABb
	}

If you do so, everything works out just fine.

See another article of mine on a brief description of a template definition
of qsort().

					Tony Hansen
				att!pegasus!hansen, attmail!tony
				    hansen@pegasus.att.com
				      tony@attmail.com

jimad@microsoft.UUCP (Jim ADCOCK) (01/08/91)

ARM  pg 36 states that a pointer to any non-const, non-volatile object type
can be converted to a void*

ARM pg 69 states that a void* is considered a pointer to object type.

ARM pg 69 states that a B* can be converted to a D* if an unambiguous 
conversion exists and B is not a virtual base.

ARM pg 67 states that any conversion not mentioned on pgs 67 to 71 is
an error.

ARM pg 68 states that if you haven't yet defined a class, you can use
it in a pointer cast, in which case "no assumptions are made about class
lattices."  I leave it to the reader how a compiler performs such a cast 
without making any assumptions -- perhaps a random number is to be generated 
for the results of such a cast?

ARM pgs 67-71 do not otherwise cover possible explicit conversions from
a void* to a pointer to an object of some class.  Therefor, I claim
either:

1) Casts from a void* to a pointer to an object of some class already 
defined are in error.

or

2) The definitions given by ARM pgs 67-71 are in error.

and either:

3) No assumptions may be made about casts from a void* to a pointer to an
object of a class not yet defined.

or

4) The definitions given by ARM in these regards are in error.

chip@tct.uucp (Chip Salzenberg) (01/09/91)

According to jimad@microsoft.UUCP (Jim ADCOCK):
>I claim either:
>
>1) Casts from a void* to a pointer to an object of some class already 
>defined are in error.
>or
>2) The definitions given by ARM pgs 67-71 are in error.

From the ARM, page 36 [commentary]:
"ANSI C allows an IMPLICIT conversion from a pointer to |void| to a
pointer to another object type (but not to a pointer to function type
...); in C++ a |void*| cannot be assigned to an object of any type
other than |void*| WITHOUT AN EXPLICIT CAST."  (Emphasis mine.)

So if you _do_ use an explicit cast, you may convert a |void*| to an
object of some class already defined.

If the text of the ARM does not explicitly support this point, then it
would seem there is an oversight in the text.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
       "If Usenet exists, then what is its mailing address?"  -- me
             "c/o The Daily Planet, Metropolis."  -- Jeff Daiell

rfg@NCD.COM (Ron Guilmette) (01/13/91)

In article <465@mole-end.UUCP> mat@mole-end.UUCP (Mark A Terribile) writes:
+> >Both qsort() and bsearch() require a pointer to a function which [takes]
+> >two parameters of type pointer-to-void.  ... that the function ... will
+> >almost immediately cast [them to] pointer-to-T is utterly irrelevant.
+ 
...
+Second, the real conflict isn't between  void*  and  XYZ*  and it has nothing
+to do with MI.  NOTHING, NOTHING, NOTHING, NOTHING.  NO THING.  NOT A THING IN
+THIS WORLD.  NOTHING.  The MI problem occurs when you have pointers that are
+stored as  derived*  and interpreted as base* .  That problem *COULD* occur
+here, but it's not an essential part of the problem.  In other words, it's a
+herring that has gone beyond red and is well on its way to scarlet.

As you say, that problem can occur in the context under discussion, and
I think that it is a real problem and not a red herring.

+What is probably DISASTROUS is to use the ellipsis in declaring  qsort() .
+If  qsort() 's argument is of the form   int (*cmpr_fun)( ... )  the function
+may get called (and under some MS-DOS compilers, *WILL* get called) with an
+entirely different calling sequence, such that the function will misunderstand
+its arguments and even its call/return sequence.
...
+In other words, Sun Microsystems has indeed made a horrible mistake.

OK.  We are in agreement about that.  Doug L. ?  Where are you where we need
you?

Will this be fixed in a future release?

+Templates can provide a better solution...

Again, we agree.

+But templates don't answer the problem completely.  What happens if you
+have something like a database engine that has to handle whatever types and
+sizes are thrown at it without recompilation? ...

Perhaps such things are bad candidates for being coded in a quasi-
strongly-typed language like C++.

-- 

// Ron Guilmette  -  C++ Entomologist
// Internet: rfg@ncd.com      uucp: ...uunet!lupine!rfg
// Motto:  If it sticks, force it.  If it breaks, it needed replacing anyway.

rfg@NCD.COM (Ron Guilmette) (01/13/91)

In article <60328@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:
>ARM pg 69 states that a void* is considered a pointer to object type.

I see no such statements anywhere in section 5.4.

Which ARM are you reading?

Anyway, if it did say that, it would not be entirely accurate.  A void*
type is actually a pointer to an *incomplete* object type.

-- 

// Ron Guilmette  -  C++ Entomologist
// Internet: rfg@ncd.com      uucp: ...uunet!lupine!rfg
// Motto:  If it sticks, force it.  If it breaks, it needed replacing anyway.

jimad@microsoft.UUCP (Jim ADCOCK) (01/15/91)

In article <278A3278.1241@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
|According to jimad@microsoft.UUCP (Jim ADCOCK):
|>I claim either:
|>
|>1) Casts from a void* to a pointer to an object of some class already 
|>defined are in error.
|>or
|>2) The definitions given by ARM pgs 67-71 are in error.
|
|From the ARM, page 36 [commentary]:
|"ANSI C allows an IMPLICIT conversion from a pointer to |void| to a
|pointer to another object type (but not to a pointer to function type
|...); in C++ a |void*| cannot be assigned to an object of any type
|other than |void*| WITHOUT AN EXPLICIT CAST."  (Emphasis mine.)
|
|So if you _do_ use an explicit cast, you may convert a |void*| to an
|object of some class already defined.

I disagree.  1) the text you quote is in the form of a prohibition.  
A prohibition cannot create permissions that do not otherwise 
exist.  2) the text you quote is a commentary, and thus not binding
anyway.

The commentary implies that one _might_ have some rights in some cases
to cast from void using explicit cast -- but does not imply where and
when and with what restrictions those rights might exist.

|If the text of the ARM does not explicitly support this point, then it
|would seem there is an oversight in the text.

Agreed that this is the most probable conclusion -- but, this conclusion
doesn't clarify what the oversight was, nor how the ansification 
[isofication?] committee will fix it.

mat@mole-end.UUCP (Mark A Terribile) (01/21/91)

> +Second, the real conflict isn't between  void*  and  XYZ*  and it has
> +nothing to do with MI. ...  ... it's a herring that has gone beyond red ...
 
> As you say, that problem can occur in the context under discussion, and
> I think that it is a real problem and not a red herring.

A real problem, yes.  To the question at hand?  A distraction, a false trail,
a wrong tree, a red herring.  The question at hand is interesting on its own.

 
> +But templates don't answer the problem completely.  What happens if you
> +have something like a database engine that has to handle whatever types and
> +sizes are thrown at it without recompilation? ...
> 
> Perhaps such things are bad candidates for being coded in a quasi-
> strongly-typed language like C++.

I think that *their datatypes* are poor candidates for incorporation into
the program.  But if you can write code to deal with arbitrary binary data
of abitrary size, and then can take the risk of it being, or not being,
valid object-bits when you look at it as an object, you can certainly code
it.

Really, the only language for which handling arbitrary blocks of data (and
sometimes trusting them to be a valid pointer or object or whatever) is not
a messy case to handle PROPERLY is assembler, and that's because everything
in assembler is messy; no single case is messy on its own.  OK, LISP will
do.  But LISP is an assembler too; an assembler for a hypothetical machine,
'tis true, but an assembler nonetheless.
-- 

 (This man's opinions are his own.)
 From mole-end				Mark Terribile

jimad@microsoft.UUCP (Jim ADCOCK) (01/29/91)

In article <1991Jan21.022720.20707@cbnewsk.att.com> hansen@pegasus.att.com (Tony L. Hansen) writes:
>I'll repeat:  The key to all of this is that whenever calling qsort(), you
>always have to pass it a function which casts from the const void* back to
>the true pointer that was passed in.  And once you do that, you can then cast
>to any base pointer you want!

I don't doubt that this should work on just about any sane compiler out
there today, _but_ I see nothing in ARM that gives one permission to 
explicitly cast back from a void* back to the original object* type.
What C++ does permit is casting back from a pointer to a   
smaller object.  I don't see anywhere ARM defines that the effective size
of a void object, were such allowed to exist, is smaller than any "real" object.
However, most all systems would make a char the smallest object, so one
_should_ be able to cast to/from a char*.  Thus, I claim if one needs
a generic object pointer, one should use a char*, not a void*.  If
you want to represent a generic address, but not imply an object is 
referred to, then feel free to use a void*.  Since section 5.4 does not
explicitly allow casting back from a void*, but most all compilers allow it,
I claim a cast back from void* is an un-constrained error -- IE a programming
error compilers are not required to diagnose.

jimad@microsoft.UUCP (Jim ADCOCK) (02/05/91)

In article <1991Feb3.195156.151@cbnewsk.att.com| hansen@pegasus.att.com (Tony L. Hansen) writes:
|Then it's a good thing that the ANSI C++ committee chose to use both the
|cfront 2.1 reference manual (which is essentially E&S) AND the ANSI C
|standard as their base documents.  Obviously there are a few things that are
|missing from the cfront 2.1 reference manual. If you use both documents,
|you'll find the necessary constraints. In fact, ANSI C (C S3.2.2.3) states
|explicitly:
|
|    A pointer to any incomplete or object type may be converted to a pointer
|    to void and back again; the result shall compare equal to the original
|    pointer.
|
|Another place (C S3.1.2.5) makes the guarantee that a void* and char* must
|have the same respresentation and alignment requirements.  Nothing in E&S
|eliminates these requirements.  E&S does explicitly add (C++ S5.4) that a
|void* is an object pointer.
|
|I think this should calm down anyone's fears that you can't use a void* as
|the "generic pointer". However, the ANSI C++ committee definitely needs to
|combine both sets of constraints into the standards document which eventually
|comes out.
|
|Anyone implementing a C++ compiler cannot just look at E&S. It's the
|combination of that and ANSI C which will be the eventual standard, not just
|one.

Hanson's position is that one can assume that any permissions granted by
ANSI-C not granted by ARM represents an omission from ARM.  I disagree.
The ANSI-C++ committee's stated position is that the ARM is the primary
source, and the ANSI-C document is secondary.  This raises the question
of whether differences between ARM and the ANSI-C document are intentional,
or accidental.  These differences will hopefully, someday, be resolved by
the ANSI-C++ committee.  Until then, someone wanting to write portable
C++ code would do well to only write code to the explicit permissions
contained in ARM.

There are already, well documented, well thought out, differences between
ANSI-C and C++ in regards to void pointers.  I see no reason to assume then,
in this case, that ANSI-C and C++ will be the same.  Again, I encourage
people to consider ARM and the working papers of the ANSI-C++ committee
to be the defacto definition of C++ until we get the final ANSI-C++ document
some years hence.

rmartin@clear.com (Bob Martin) (02/07/91)

From: jimad@microsoft.UUCP (Jim ADCOCK)
>< [... some stuff deleted ...]
>< I see nothing in ARM that gives one permission to
>< explicitly cast back from a void* back to the original object* type.
>

If you refer to ARM section 4.6 there is an peculiar statement which reads:

	"in C++ a void* cannot be assigned to an object of any type other than
	a void* without an explicit cast."

If we remove the double negative it becomes:

	"in C++ a void* can be assigned to an object of any type other than
	a void* only with an explicit cast."

This is roundabout, but I think it refutes Jim's statement.

In article <1991Feb3.195156.151@cbnewsk.att.com> hansen@pegasus.att.com 
(Tony L. Hansen) writes:
[ ... lots of stuff about how ANSI C allows conversion of void* to any
other pointer type...  followed by this quote from ANSI C...]
>
>    A pointer to any incomplete or object type may be converted to a pointer
>    to void and back again; the result shall compare equal to the original
>    pointer.
>

Again in ARM 4.6 Stroustrup and Ellis argue against allowing this in C++
without explicit casts on the basis that it opens up a hole in the type
system.  

-- 
+-Robert C. Martin-----+:RRR:::CCC:M:::::M:| Nobody is responsible for |
| rmartin@clear.com    |:R::R:C::::M:M:M:M:| my words but me.  I want  |
| uunet!clrcom!rmartin |:RRR::C::::M::M::M:| all the credit, and all   |
+----------------------+:R::R::CCC:M:::::M:| the blame.  So there.     |

chip@tct.uucp (Chip Salzenberg) (02/13/91)

According to jimad@microsoft.UUCP (Jim ADCOCK):
>There are already, well documented, well thought out, differences between
>ANSI-C and C++ in regards to void pointers.  I see no reason to assume then,
>in this case, that ANSI-C and C++ will be the same.

But can anyone imagine a reasonable implementation on which the
conversion from |T*| to |void*| cannot be reversed?
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
 "I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz

jimad@microsoft.UUCP (Jim ADCOCK) (02/15/91)

In article <1991Feb6.222244.23132@clear.com> rmartin@clear.com (Bob Martin) writes:
|From: jimad@microsoft.UUCP (Jim ADCOCK)
|>< [... some stuff deleted ...]
|>< I see nothing in ARM that gives one permission to
|>< explicitly cast back from a void* back to the original object* type.
|>
|
|If you refer to ARM section 4.6 there is an peculiar statement which reads:
|
|	"in C++ a void* cannot be assigned to an object of any type other than
|	a void* without an explicit cast."
|
|If we remove the double negative it becomes:
|
|	"in C++ a void* can be assigned to an object of any type other than
|	a void* only with an explicit cast."
|
|This is roundabout, but I think it refutes Jim's statement.

I disagree.  Usually language references and standards consist of
explicit permissions and denials.  Only by taking all the explicit
permissions and denials in a language reference or standard can one
make sense of the authors intent.  When you invert the double negative,
you are attempting to turn an explicit denial into an implicit permission.
In this, I disagree.  Explicit denials cannot imply permissions -- they
always only restrict what a programmer is allowed to do.  If one could
imply implicit permissions from denial statements, you'd find the 
manual highly contradictory!

The original statement says that you cannot convert a pointer from a 
void* without an explicit cast -- it says nothing about where or when
such explicit casts are allowed.

Section 5.4 does explain what casts are permitted.  It also makes an
extrodinarily strong statement that any type conversion not mentioned
in that section and not explicitly defined by the user is an error!

Now the second paragraph on page 68 says that a pointer to one object
type may be explicitly converted to a pointer to another object type
[subject to other restrictions in section 5.4]  It also explicitly states
that a void* is a pointer to object type.  It also says that a pointer 
can be converted to an object of the same or smaller size and back again
without change.  Now a char* represents a pointer to the smallest object,
so a char* would seem to be preferred for such back casting.  Where are 
the "void" size and alignments specified?

The question that remains in my mind is exactly what is to be interpreted
from the "[subject to other restrictions in section 5.4]" statement --
given that it is already stated that anything not explicitly permitted
in 5.4 is denied, and given therefore that there *aren't* other restrictions
in section 5.4 -- since everything else is explicitly denied, the wording
of section 5.4 only consists of permissions.  What then, is the meaning of the
"[subject to other restrictions ....]" statement  ????

The bottom line is eventually one has to admit that the ARM is indeed
a reference manual, and not a standards specification.  Hopefully, all
these little details will be nailed down by the ansification effort.

hansen@pegasus.att.com (Tony L. Hansen) (02/16/91)

< From: jimad@microsoft.UUCP (Jim ADCOCK), Message-ID: <70450@microsoft.UUCP>
< Hansen's position is that one can assume that any permissions granted
< by ANSI-C not granted by ARM represents an omission from ARM.  I
< disagree.  The ANSI-C++ committee's stated position is that the ARM is
< the primary source, and the ANSI-C document is secondary.  This raises
< the question of whether differences between ARM and the ANSI-C document
< are intentional, or accidental.  These differences will hopefully,
< someday, be resolved by the ANSI-C++ committee.  Until then, someone
< wanting to write portable C++ code would do well to only write code to
< the explicit permissions contained in ARM.
<
< There are already, well documented, well thought out, differences
< between ANSI-C and C++ in regards to void pointers.  I see no reason to
< assume then, in this case, that ANSI-C and C++ will be the same.
< Again, I encourage people to consider ARM and the working papers of the
< ANSI-C++ committee to be the defacto definition of C++ until we get the
< final ANSI-C++ document some years hence.

< From: jimad@microsoft.UUCP (Jim ADCOCK), Message-ID: <70670@microsoft.UUCP>
< The bottom line is eventually one has to admit that the ARM is indeed a
< reference manual, and not a standards specification.  Hopefully, all
< these little details will be nailed down by the ansification effort.

There are definitely places where the ARM (and the cfront 2.1 reference
manual) is obviously not a a full standards specification. My understanding
of the ANSI C++ stand on ANSI C is that where the cfront 2.1 reference manual
(not the ARM) doesn't cover something, the ANSI C rule shines through. You
and I disagree as to whether the ANSI C rule should shine through in this
case. We definitely agree that the cfront 2.1 reference manual is
insufficient and that the final ANSI C++ standard must resolve the issue. All
current C++ compilers and common practice follow the ANSI C rule in this one
area. I hope that the next ANSI C++ draft (due out soon) will address this
issue and resolve this question.

(The following may be stating the obvious to some of you.) Note, as time goes
on, the ARM will inevitably become out of date. Reprints include corrections,
but pretty soon, the only true document which can be considered as any
statement of the current ANSI C++ position will be the current draft
standard.  Whether the ARM will continue to try to reflect the "current
draft" is a subject that Peggy and Bjarne will have to address. If it IS kept
up, then we'll be in the position of continually asking "which printing of
the ARM are you looking at?" If not, then we'll be in the same position that
we were in during ANSI C's development: people will continually have problems
gaining access to the current draft.

					Tony Hansen
				att!pegasus!hansen, attmail!tony
				    hansen@pegasus.att.com
				      tony@attmail.com

jimad@microsoft.UUCP (Jim ADCOCK) (02/20/91)

In article <27B94945.5977@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
|According to jimad@microsoft.UUCP (Jim ADCOCK):
|>There are already, well documented, well thought out, differences between
|>ANSI-C and C++ in regards to void pointers.  I see no reason to assume then,
|>in this case, that ANSI-C and C++ will be the same.
|
|But can anyone imagine a reasonable implementation on which the
|conversion from |T*| to |void*| cannot be reversed?

Yes, I can imagine such implementations.  If C++ is implemented on top
of "object-oriented" architectures, it may well be that pointers to 
primitive types have a totally different representation that pointers 
to "Object" types.  A void* might then be considered roughly equivalent
to a char*, or alternativelu a void* might be implemented simply as a mapping 
from a given pointer to a unique value -- it might be almost impossible to 
invert the mapping.

Systems using a Modula3-like approach might have T* be a traced pointer,
converting to a void* makes it untraced, converting back to a traced
pointer doesn't necessarily leave you in a happy state again!  Better to
not allow the inverse conversion then, lest the programmer falsely believe
the program is going to work....

So, I'd like to see the pointer conversions allowed on primitive types
be kept separate from the pointer conversions allowed on "Object" types.
This would not mean that C++ compilers on Un*x-like machines could not
support conversion to/from void* willy-nilly.  It would just mean that
portable code could not rely on such conversions if it is to run someday
on a non-Un*x-like machine.

Again, Rekusive is one example of a non-Un*x-like machine, that is designed
with the intent of supporting "Object Oriented" programming.  If people
read about it, perhaps we can get away from the Un*x-like prejudice that the
*only* way one can implement an architecture is as a very long array of
bytes.  The Cray and 80x86 machines offer two other "successful"
counter-examples.  Again, I claim that the very-long-array-of-bytes model
of CPU architectures doesn't have much to do with "object oriented programming."
Let's try to keep from being unnecessarily tied to overly-restrictive
CPU models.

chip@tct.uucp (Chip Salzenberg) (02/26/91)

According to jimad@microsoft.UUCP (Jim ADCOCK):
>If C++ is implemented on top of "object-oriented" architectures,
>it may well be that pointers to primitive types have a totally
>different representation that pointers to "Object" types.

This kind of distinction, while conceptually possible, would seem to
rule out a user's writing "operator new" and "operator delete", since
they generate or use pointers of type |void*| which subsequently will
be or previously have been used as "Object" addresses.

>A void* might then be considered roughly equivalent to a char* ...

Exactly equivalent, if compatibility with C libraries matters.

>...perhaps we can get away from the Un*x-like prejudice that the
>*only* way one can implement an architecture is as a very long array
>of bytes.

Please note that I am a long-time user of the '286 in large model, and
I am quite aware that a successful implementation of C or C++ does not
require a "very long array of bytes."  That issue, however, has almost
zero relevance to the convertability of a |void*|.

jimad@microsoft.UUCP (Jim ADCOCK) (02/28/91)

In article <27C95508.15D0@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
|According to jimad@microsoft.UUCP (Jim ADCOCK):
|>If C++ is implemented on top of "object-oriented" architectures,
|>it may well be that pointers to primitive types have a totally
|>different representation that pointers to "Object" types.
|
|This kind of distinction, while conceptually possible, would seem to
|rule out a user's writing "operator new" and "operator delete", since
|they generate or use pointers of type |void*| which subsequently will
|be or previously have been used as "Object" addresses.

No.  A C++ compiler in such a situation can accept definitions of
operator new and operator delete with the void*, but in generating
actual code return the true pointer type -- in actual application,
new returns a pointer to an actual object type, not a void*.  In the 
worse case schenerio, a C++ compiler might actually have to effectively
turn operator new into a template family of functions returning the
correct pointer types.  Or more likely, the compiler might have to generate two
different kinds of operator new -- one that returns a primitive pointer,
and one that returns an "Object" pointer.  But this is all doable, and
in no way violates the language.

|>A void* might then be considered roughly equivalent to a char* ...
|
|Exactly equivalent, if compatibility with C libraries matters.

Since when did C libraries use operator new and delete ???

|>...perhaps we can get away from the Un*x-like prejudice that the
|>*only* way one can implement an architecture is as a very long array
|>of bytes.
|
|Please note that I am a long-time user of the '286 in large model, and
|I am quite aware that a successful implementation of C or C++ does not
|require a "very long array of bytes."  That issue, however, has almost
|zero relevance to the convertability of a |void*|.

I disagree.  Somewhat similar to large model, an "object oriented"
CPU might go with a object#:member-inside-object# pointer representation.
Which is somewhat like a segment:offset representation.  A void* might
represent a conversion of such a object:member pair into an underlying
"hardwired" address of such a machine's random access memory.  Inverting
such an address back into object:member pair might be most prohibitive.

So, I believe people who support free-wheeling pointer-type hacking have
a Un*x-like processor model in mind.  They can't understand why some pointer
-type hacks *ought* to be prohibited -- since such pointer-type hacking is
*obviously* mearly conceptual.  *Obviously* it doesn't change any "bits"
in the pointer rep.

Its not *obvious* to me at all that we aren't signing C++ death warrent some
years down the line if "object-oriented" CPUs start catching on [like they
*ought* to :-]

Thus, I'd like to see C++ be pretty conservative about what kinds of 
type-hacking conforming compilers are *required* to support.  This in 
no way keeps un*x-like machines from supporting a greater variety of
type-hacking, if they so desire.