[comp.lang.c++] char* vs void*

ark@alice.att.com (Andrew Koenig) (12/26/90)

In article <277640DD.4A70@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:

> This statement is false.  It is entirely possible to do so, thanks to
> ANSI's guarantee that pointer-to-char and pointer-to-void will always
> have identical representations.

How's that again?  Why do you think that char* and void* have to have
identical representations?
-- 
				--Andrew Koenig
				  ark@europa.att.com

chip@tct.uucp (Chip Salzenberg) (12/26/90)

According to ark@alice.UUCP ():
>In article <277640DD.4A70@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>> It is entirely possible to [write a portable implementation of qsort()],
>> thanks to ANSI's guarantee that pointer-to-char and pointer-to-void will
>> always have identical representations.
>
>How's that again?  Why do you think that char* and void* have to have
>identical representations?

The ANSI C standard requires that |char *| and |void *| have identical
representations.  This requirement bows to existing practice.  After
all, before ANSI came along, the return type of malloc() was |char *|.
So pre-ANSI C's generic pointer type was, of necessity, |char *|.

The section of E&S discussing differences between C and C++ makes no
mention of any change to this requirement; thus it is part of C++ too.

This equivalence is important because pointers of type |void *| do not
support the pointer arithmetic required to access and rearrange the
elements of the array to be sorted.  Such pointer arithmetic requires
the use of |char *| values.  The equivalence guarantee means that a
portable qsort() may convert pointers from |void *| to |char *| and
back without fear of loss of information.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>
"Please don't send me any more of yer scandalous email, Mr. Salzenberg..."
		-- Bruce Becker

dbrooks@penge.osf.org (David Brooks) (12/28/90)

In article <2778A795.6E71@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>
>The ANSI C standard requires that |char *| and |void *| have identical
>representations.  This requirement bows to existing practice.  After
>all, before ANSI came along, the return type of malloc() was |char *|.
>So pre-ANSI C's generic pointer type was, of necessity, |char *|.

How's that again, again?

I don't understand.  In the case of void*, what is being represented?
I mean, what semantic commonality is there between the object pointed
at by a char* and the unobject pointed at by a void* that allows you
to make this statement?

Do you mean "can both be converted to the same integer type and back
without loss of information?"  Do you mean "can be converted to each
other and back without loss of information?"

Damn: my question-mark key just wore out.
-- 
David Brooks				dbrooks@osf.org
Systems Engineering, OSF		uunet!osf.org!dbrooks
In Memoriam: Chris Naughton, aged 16, killed by a drunk driver Dec 22, 1990

wirzeniu@cs.Helsinki.FI (Lars Wirzenius) (12/29/90)

In article <17590@paperboy.OSF.ORG> dbrooks@osf.org (David Brooks) writes:
>In article <2778A795.6E71@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>>
>>The ANSI C standard requires that |char *| and |void *| have identical
>>representations.  This requirement bows to existing practice.  After
>> [deleted]
>
>How's that again, again?
>
>I don't understand.  In the case of void*, what is being represented?

Pointers of type |char *| and |void *| are both capable of pointing to
*any* data object, and the way the compiler stores those two types of
pointer (their representation, that is) is the same.  Given two pointer
variables,

	void *void_ptr;
	char *char_ptr;

and some data object, say data_object, if you assign the address of
that object to both pointers, their bit patterns will be exactly the
same:

	void_ptr = &data_object;
	char_ptr = (char *) &data_object;
	/* prints 0 if the two stored bit patterns are identical */
	printf("%d\n", memcmp(&void_ptr, &char_ptr, sizeof(void *)));

(the sizes of the two pointers are the same).
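
A self-contained version of the fragment above might look like the
following.  This is only a sketch: the names same_bits() and report()
are made up for illustration, and same_bits() returns 1 on a match,
whereas the raw memcmp() result would be 0.  It also assumes an
implementation where each address has a single bit pattern, which is
the usual case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a void * and a char * holding the address of the same
 * object have byte-for-byte identical stored values, 0 otherwise.
 * (Hypothetical helper, not part of any library.) */
int same_bits(void *object)
{
    void *void_ptr = object;
    char *char_ptr = (char *) object;

    /* Identical representation implies identical size. */
    assert(sizeof void_ptr == sizeof char_ptr);

    return memcmp(&void_ptr, &char_ptr, sizeof void_ptr) == 0;
}

/* Hypothetical driver: prints 1 when the stored bytes match. */
void report(void)
{
    int data_object = 42;

    printf("%d\n", same_bits(&data_object));
}
```

Comparing the pointers with == would not show the same thing, since
that compares converted values rather than the bytes as stored.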

The difference between |char *| and |void *| is that you can mix 
|void *| with pointers to other types without having to use casts. With
|char *|, and any other pointer type except |void *|, you need the casts
to make sure everything works on all machines.

Lars Wirzenius    wirzeniu@cs.helsinki.fi    wirzenius@cc.helsinki.fi

mcdaniel@adi.com (Tim McDaniel) (12/29/90)

In the ANSI C standard, the concept of "representation" is not related
to semantics.  "Representation" refers to bit-level layout.

ANSI C standard, section 3.1.2.5, "Types" (page 25, lines 25-28 in
the ANSI edition):

   A pointer to void shall have the same representation and alignment
   requirements as a pointer to a character type.  Similarly, pointers
   to qualified or unqualified versions of compatible types shall have
   the same representation and alignment requirements.{16}  Pointers
   to other types need not have the same representation or alignment
   requirements.

Footnote {16} (page 24; the introduction says that "the footnotes ...
are not part of the standard"):

   16. The same representation and alignment requirements are meant
       to imply interchangeability as arguments to functions, return
       values from functions, and members of unions.

(page 24, lines 15-16):

   The three types char, signed char, and unsigned char are
   collectively called the _character_types_.

The "Types" section also requires identical representation and
alignment for certain other types.  "Compatible" is defined elsewhere
in the standard.
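
The union case mentioned in footnote 16 can be sketched roughly as
follows.  The union generic_ptr and the function store_void_read_char()
are hypothetical names invented for this illustration.  Reading a union
member other than the one last stored relies on the identical
representation quoted above; how later revisions of the standard word
that rule is outside what this sketch tries to show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union holding either flavour of generic pointer. */
union generic_ptr {
    void *as_void;
    char *as_char;
};

/* Store through the void * member, read back through the char * member. */
char *store_void_read_char(void *p)
{
    union generic_ptr u;

    /* Same representation means the two members have the same size. */
    assert(sizeof u.as_void == sizeof u.as_char);

    u.as_void = p;
    return u.as_char;
}
```

With any other pair of pointer types, say |int *| and |char *|, the
same trick would carry no such guarantee.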

--
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300                        Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel

chip@tct.uucp (Chip Salzenberg) (12/29/90)

According to dbrooks@osf.org (David Brooks):
>In article <2778A795.6E71@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>>The ANSI C standard requires that |char *| and |void *| have identical
>>representations.
>
>I don't understand.  In the case of void*, what is being represented?

The address of an object.  ("Object" explicitly excludes functions.)

>I mean, what semantic commonality is there between the object pointed
>at by a char* and the unobject pointed at by a void* that allows you
>to make this statement?

Pointers of types |char *| and |void *| must both be able to hold the
address of any object.  The reason why this is so differs for the two
types.  For |char *| it's existing practice, and for |void *| it's the
decree of ANSI.

Existing practice for |char *| requires it to be able to hold the
address of any object, since any object type may be allocated with
malloc(), and pre-ANSI malloc() returned a |char *|.  Were ANSI C a
brand new language, |char *| might not have had that property, but
given existing practice, it does.  We can't afford to break all those
dusty decks.

When ANSI invented |void *|, they made it capable of holding any
pointer.  Since |void *| can hold any pointer, and |char *| can hold
any pointer, there would be no point (:-)) in an implementation giving
them different representations.  So ANSI went that final step and
_required_ that their representations be identical.

>Do you mean "can be converted to each other and back without loss of
>information?"

Yes, and that's the important point for this discussion.  A portable
qsort() can cast its |void *| array parameter to |char *| to do
pointer arithmetic, then cast the results of the pointer arithmetic
back to |void *| to generate the comparison function's arguments.
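
To make that concrete, here is a minimal sketch of a routine with the
same interface as qsort().  It is an insertion sort chosen for brevity,
not the algorithm any real library uses, and the names sketch_sort(),
swap_bytes() and compare_ints() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swap_bytes(char *a, char *b, size_t size)
{
    while (size-- > 0) {
        char tmp = *a;
        *a++ = *b;
        *b++ = tmp;
    }
}

/* Insertion sort taking the same arguments as qsort(). */
void sketch_sort(void *base, size_t nmemb, size_t size,
                 int (*compar)(const void *, const void *))
{
    char *bytes = (char *) base;   /* void * -> char * for arithmetic */
    size_t i, j;

    assert(size > 0);

    for (i = 1; i < nmemb; i++) {
        for (j = i; j > 0; j--) {
            char *cur  = bytes + j * size;   /* element j     */
            char *prev = cur - size;         /* element j - 1 */

            /* char * -> void * conversion happens at the call. */
            if (compar(prev, cur) <= 0)
                break;
            swap_bytes(prev, cur, size);
        }
    }
}

/* Hypothetical comparison function for arrays of int. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}
```

A call such as sketch_sort(v, n, sizeof v[0], compare_ints) sorts an
int array v of n elements in ascending order.  All the address
arithmetic is done on |char *|, in units of bytes, with the element
size supplied by the caller.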

henry@zoo.toronto.edu (Henry Spencer) (12/30/90)

In article <17590@paperboy.OSF.ORG> dbrooks@osf.org (David Brooks) writes:
>>The ANSI C standard requires that |char *| and |void *| have identical
>>representations...
>
>Do you mean "can both be converted to the same integer type and back
>without loss of information?"  Do you mean "can be converted to each
>other and back without loss of information?"

He means "if you take any `char *' and convert it to a `void *', or vice
versa, and then inspect the bits of both in memory, they must be identical".
The intent is specifically that code expecting one can be handed the other,
across an interface without complete type-checking (e.g. a function call
without a prototype in scope), and everything will still work fine provided
that other constraints of the code are satisfied.
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

bobj@glinj.gli.com (Robert Jacobs) (01/03/91)

>Pointers of types |char *| and |void *| must both be able to hold the
>address of any object.  The reason why this is so differs for the two
>types.  For |char *| it's existing practice, and for |void *| it's the
>decree of ANSI.

>When ANSI invented |void *|, they made it capable of holding any
>pointer.  Since |void *| can hold any pointer, and |char *| can hold
>any pointer, there would be no point (:-)) in an implementation giving
>them different representations.  So ANSI went that final step and
>_required_ that their representations be identical.

>>Do you mean "can be converted to each other and back without loss of
>>information?"

>Yes, and that's the important point for this discussion.  A portable
>qsort() can cast its |void *| array parameter to |char *| to do
>pointer arithmetic, then cast the results of the pointer arithmetic
>back to |void *| to generate the comparison function's arguments.

What I can't understand is that ANSI decided that one cannot do pointer
arithmetic on a void*, like a char*. Because of this, new work is forced
to use char* as the generic pointer that can do pointer arithmetic.
This continues the existing practice that char* is special. Void*
should be allowed to do pointer arithmetic so that char* does not
have to be used for new work.

But I guess my opinion is too late to be in any version of ANSI C
or C++. Too bad.

Robert Jacobs                                   | 
General Logistics International Inc.            |
bobj@gli.com                                    |

donald@cae780.csi.com (Donald Maffly) (01/04/91)

In article <1991Jan2.171429.11566@glinj.gli.com> bobj@glinj.gli.com (Robert Jacobs) writes:
>
>>Pointers of types |char *| and |void *| must both be able to hold the
>>address of any object.  The reason why this is so differs for the two
>>types.  For |char *| it's existing practice, and for |void *| it's the
>>decree of ANSI.
>>
>> [ ... ]
>>
>
>What I can't understand is that ANSI decided that one cannot do pointer
>arithmetic on a void*, like a char*. Because of this, new work is forced
>to use char* as the generic pointer that can do pointer arithmetic.
>This continues the existing practice that char* is special. Void*
>should be allowed to do pointer arithmetic so that char* does not
>have to be used for new work.


Why did ANSI decide that one cannot do pointer arithmetic on a void*
pointer, you ask.

Well, BY DEFINITION, void* means that we are dealing with a pointer to
an object of unknown type and size.  In pointer arithmetic, how could a
C compiler possibly know how much to decrement or increment a pointer by
if it didn't know the size of the object it was pointing to?  (That is a
rhetorical question.)

For example, if we had a dynamically allocated array of objects of
unknown size, and "ptr" of type (void*) points to some object in the array,
what sense would "ptr++" make?  Surely, the intent is to reset the
pointer to point to the next object in the array.  But without the
contextual knowledge of the size of each object, the expression "ptr++" 
is not well defined.

I haven't been following this char* vs. void* discussion very closely,
so excuse me if I have missed something crucial and I am way off base.

Donald Maffly

barmar@think.com (Barry Margolin) (01/04/91)

In article <11223@cae780.csi.com> donald@cae780.UUCP (Donald Maffly) writes:
>Why did ANSI decide that one cannot do pointer arithmetic on a void* 
>pointer, you ask.
>
>Well, BY DEFINITION, void* means that 
>we are dealing with a pointer to an object of unknown type and size.  
>In pointer arithmetic, how could a C compiler possibly know how much 
>to decrement or increment a pointer by if it didn't know the size of 
>the object it was pointing to? 

Well, they could have defined it in terms of the units that sizeof measures
in.  For instance, if you have

	typedef ... foo_struct;
	foo_struct foo[N];
	void *vp;
	foo_struct *fp;

	fp = &foo[0];
	vp = (void *) fp;
	vp += sizeof(foo_struct);	/* not valid ANSI C; the hypothetical rule */
	fp = (foo_struct *) vp;

At this point, fp would equal &foo[1].

This currently works if you replace "void" with "char", because sizeof(char)
is required to be 1.  They could have specified that sizeof(void) must
also be 1, and then all the uses of "char" as the generic basic memory
object could be abandoned.  This would pave the way for future revisions of
the C standard to allow sizeof(char) > 1 (e.g. to get rid of the char/wchar_t
distinction).
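
For comparison, the "char" variant that works today might be written as
below.  This is a sketch only: the members of foo_struct, the value of
N and the function name char_step_reaches_next() are all made up for
the illustration, since the fragment above leaves them unspecified.

```c
#include <assert.h>
#include <stddef.h>

#define N 4   /* hypothetical array length */

/* Hypothetical element type; any object type would do. */
typedef struct {
    int    id;
    double value;
} foo_struct;

/* Returns nonzero if stepping a char * by sizeof(foo_struct)
 * from &foo[0] lands exactly on &foo[1]. */
int char_step_reaches_next(void)
{
    foo_struct foo[N];
    foo_struct *fp;
    char *cp;

    /* char * arithmetic counts in the same units that sizeof reports. */
    assert(sizeof(char) == 1);

    fp = &foo[0];
    cp = (char *) fp;
    cp += sizeof(foo_struct);   /* advance by one element, in bytes */
    fp = (foo_struct *) cp;

    return fp == &foo[1];
}
```

The step is sizeof(foo_struct) bytes, which includes any padding, so
the result is correctly aligned for a foo_struct.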
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

chip@tct.uucp (Chip Salzenberg) (01/04/91)

According to bobj@glinj.gli.com (Robert Jacobs):
>What I can't understand is that ANSI decided that one cannot do pointer
>arithmetic on a void*, like a char*. Because of this, new work is forced
>to use char* as the generic pointer that can do pointer arithmetic.
>This continues the existing practice that char* is special.

Perhaps ANSI figured that since |char *| already serves as the generic
pointer that supports pointer arithmetic, making |void *| do the same
would render |void *| redundant and therefore useless.

As the language stands now, your choice of |char *| or |void *|
communicates to the compiler and to human readers your intentions
as to pointer arithmetic (|char *| if yes, |void *| if no).