[net.lang.c] uses of void

tps@sdchem.UUCP (Tom Stockfisch) (08/11/86)

I want to be able to use "void *" in a generic sorting function and I tried
something like

	sort( (void *)&foo[0], (void *)&foo[20], sizeof(foo), compare_foos );
	.
	.
	.
	void
	sort( beg, end, size, compare )
		void	*beg, *end;
		int	size;
		int	(*compare)();
	{
		int	nelements =	(end - beg) / size;
		void	*p;

		for ( p = beg;  p < end;  p += size )
			...;
		...
	}

My compiler (4.2BSD running on a Celerity) went into an infinite loop trying
to parse

		(end - beg)

I (naively?) thought that pointer arithmetic would work with "void *", and
that it would work in the same abstract units as "sizeof()", so that all
previous generic "char *" kludges could be replaced by "void *".

What are the (proposed and current) legal operations for "void *" besides
assignment, casting, and passing as a parameter?

-- Tom Stockfisch, UCSD Chemistry

guy@sun.UUCP (08/11/86)

> I want to be able to use "void *" in a generic sorting function and I tried
> something like
> 	.
> 	.
> 	.
> My compiler (4.2BSD running on a Celerity) went into an infinite loop trying
> to parse
> 
> 		(end - beg)

Unfortunately, PCC doesn't really implement "void *".  Other compilers may
not do so either (I don't know if Celerity's compiler is PCC-based or not).
PCC doesn't catch it as illegal, but it doesn't handle it properly either
(the internal coding for the type "void *" is also used as a special
internal indication; since, at the time, the language didn't say "void *"
was OK, *obviously* nobody was going to use it so *obviously* there was no
need to make sure nobody did, right? :-().

> I (naively?) thought that pointer arithmetic would work with "void *", and
> that it would work in the same abstract units as "sizeof()", so that all
> previous generic "char *" kludges could be replaced by "void *".
> 
> What are the (proposed and current) legal operations for "void *" besides
> assignment, casting, and passing as a parameter?

There are no currently legal operations involving "void *" since K&R doesn't
describe "void *" and since PCC, at least, doesn't really implement it.
(It's not impossible to implement in PCC, but you do have to shuffle the
type coding around to allow another bit for the base type or find some other
value to use for what the encoding of "void *" is used for now.)  The ANSI C
draft of August 11, 1985 doesn't directly say anything about subtracting two
pointers of type "void *".  What it *does* say is

	If two pointers that do not point to members of the same
	array object are subtracted, the behavior is undefined.

and since you can't define an array of "void", no two pointers to "void" can
point to members of the same array object (note: "X points to Y" is
different from "X contains the address of Y", since if X points to Y X must
have the type "pointer to type of Y") so the meaning of the difference
between two pointers to "void" is undefined.

If you use

	int	nelements = ((long)end - (long)beg) / size;

it might work, although it's not guaranteed by the language (things might
get messy if you try raw arithmetic on an implementation where pointers
contain data other than a machine address, like type fields, ring numbers,
etc., etc.).  Note that "qsort" takes a count of elements, rather than a
pointer to the last element; this interface is easier to describe without
stepping outside the bounds of C.
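For comparison, a count-based generic sort in the "qsort" style can be
written without ever subtracting pointers to "void"; all address arithmetic
happens on a "char *" cursor.  A minimal selection-sort sketch (the name
"gsort" and everything in it are illustrative, not any real library routine):

```c
#include <stddef.h>

/* Byte-by-byte swap of two elements of the given width. */
static void swap_bytes(char *a, char *b, size_t width)
{
	size_t i;

	for (i = 0; i < width; i++) {
		char t = a[i];
		a[i] = b[i];
		b[i] = t;
	}
}

/* Selection sort over nel elements of width bytes each.  Like
 * qsort(), it takes a count rather than an end pointer, so no
 * "void *" arithmetic is ever needed.
 */
void gsort(void *base, size_t nel, size_t width,
	   int (*compare)(const void *, const void *))
{
	char	*p = base;
	size_t	i, j, min;

	for (i = 0; i + 1 < nel; i++) {
		min = i;
		for (j = i + 1; j < nel; j++)
			if (compare(p + j * width, p + min * width) < 0)
				min = j;
		if (min != i)
			swap_bytes(p + i * width, p + min * width, width);
	}
}

/* Example comparator for int elements. */
int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}
```

Sorting an array of ints is then gsort(v, n, sizeof v[0], cmp_int); the
caller supplies the count, so the question of subtracting "void *" values
never arises.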
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

karl@haddock (08/16/86)

sdchem!tps writes:
>sort( (void *)&foo[0], (void *)&foo[20], sizeof(foo), compare_foos );

First, I think you want sizeof(foo_type) or sizeof(foo[0]).

>void *beg, *end;
>int   nelements = (end - beg) / size;

>I (naively?) thought that pointer arithmetic would work with "void *", and
>that it would work in the same abstract units as "sizeof()", so that all
>previous generic "char *" kludges could be replaced by "void *".

The "abstract units" of sizeof() are "char" by definition.  (Too many users
have been assuming sizeof(char)==1, so it's official in C++ and ANSI C.  I
wish it had been measured in BITS in the first place!)  Since "void *" means
"pointer to object of unknown type", you can't do pointer arithmetic with it.

What you really want is "((char *)end - (char *)beg) / size".  (Don't use int
or long!)
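Karl's expression can be wrapped in a small helper; a minimal sketch (the
name "nelements" is mine, not from any library), assuming a compiler that
accepts "void *" parameters at all:

```c
#include <stddef.h>

/* Count the elements between beg (inclusive) and end (exclusive).
 * The casts to char * make the subtraction well-defined, and since
 * sizeof() measures in units of char, dividing by size yields a
 * count of elements.
 */
size_t nelements(void *beg, void *end, size_t size)
{
	return (size_t)((char *)end - (char *)beg) / size;
}
```

For an "int foo[20]", nelements(&foo[0], &foo[20], sizeof foo[0]) gives 20;
&foo[20] is legal as a one-past-the-end pointer.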

Karl W. Z. Heuer (ihnp4!ima!haddock!karl), The Walking Lint

gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/18/86)

In article <86900012@haddock> karl@haddock writes:
>The "abstract units" of sizeof() are "char" by definition.  (Too many users
>have been assuming sizeof(char)==1, so it's official in C++ and ANSI C.

Don't be too sure that this won't change by the time of the final X3J11
standard.  The internationalization issue has yet to be resolved, and
my sense is that the cleanest way to do that will be to distinguish
between the basic data chunk size and a datum that can hold a character
representation.  I'm sure there will be a lot of sentiment for using
chars in both rôles, even though that would force an incredible
amount of kludgery in many places in the practical use of the language.

guy@sun.uucp (Guy Harris) (08/19/86)

> >The "abstract units" of sizeof() are "char" by definition.  (Too many users
> >have been assuming sizeof(char)==1, so it's official in C++ and ANSI C.
> 
> Don't be too sure that this won't change by the time of the final X3J11
> standard.  The internationalization issue has yet to be resolved, and
> my sense is that the cleanest way to do that will be to distinguish
> between the basic data chunk size and a datum that can hold a character
> representation.

Not clear.  Widening "char" to support Kanji would be disruptive and cut
performance in applications that need not deal with individual Kanji symbols
as such (rather than treating a 16-bit Kanji symbol as a pair of 8-bit
"characters", as, for instance, the C compiler could in most instances).
The best proposals I've seen are the ones from AT&T-IS that propose a
separate "long char" data type for, well, long characters.  If adding a new
data type is distasteful, one could "typedef" "short" into such a data type,
although you might have to engage in some jiggery-pokery to initialize a
"string" of "short"s from a Kanji string.

You can still distinguish between the unit of storage and the datum used to
hold character representations of, say, Kanji; however, you don't have to do
this by saying "char" will hold those representations, but can provide a new
data type for this.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

sam@think.COM (Sam Kendall) (08/19/86)

In article <3121@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB)) writes:
>In article <86900012@haddock> karl@haddock writes:
>>The "abstract units" of sizeof() are "char" by definition.  (Too many users
>>have been assuming sizeof(char)==1, so it's official in C++ and ANSI C.
>
>Don't be too sure that this won't change by the time of the final X3J11
>standard.

No way.  Too much code would break if sizeof (char) weren't 1.  Countless
instances of this idiom would have to be fixed:

	malloc(strlen(s)+1)

Also plenty of uses of sizeof to count chars in a char array, and
other things like this.
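The idiom Sam cites, wrapped as a function for concreteness (a sketch;
"dupstr" is a made-up name, similar in spirit to the common strdup()):

```c
#include <stdlib.h>
#include <string.h>

/* Copy s into freshly allocated storage.  The "+ 1" for the NUL
 * terminator allocates the right amount only because
 * sizeof(char) == 1: strlen() counts and malloc() allocates in
 * the same units.
 */
char *dupstr(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (p != NULL)
		strcpy(p, s);
	return p;
}
```

If sizeof (char) were ever made greater than 1, every such call would
silently allocate too little (or the wrong amount of) storage.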

---
Sam Kendall			sam@godot.think.com
Thinking Machines Corp.		{harvard, ihnp4, seismo}!think!sam

gwyn@BRL.ARPA (VLD/VMB) (08/21/86)

The idea is not to insist that all C implementations support 16-bit
character codes, but to permit that as a deliberate implementation
decision.  I would hope that the systems I use continue to have
8-bit chars, although I do have to say that quite often I need
to know more about a "letter" than just its class name ("A").  For
example, these days I work a lot with typesetting, bitmap graphics,
and so forth, and sometimes the size and font style of a character
are about as important as its class name.

Of course, it is possible to come up with any number of kludges to
cope with non-traditional DP ideas of characters, and many people
have done so.  The desire, I think, in specifying the C language is
to not have such kludges intrude into implementations where they
are neither wanted nor needed.  As an example of the problems, the
AT&T 16-bit proposal requires strcpy() to handle "escapes", whereas
what one really ought to insist on is that strcpy() handles chars as
in the following simple semantic definition of its function:

char *strcpy( char *dest, char *src )
{
	char	*retval = dest;

	while ( (*dest++ = *src++) != '\0' )
		;

	return retval;
}

If one adopts a kludge approach, then either strcpy() can no longer
be used to copy a string of text characters, or strcpy() no longer
has such a simple implementation.  If "char" is able to hold 16 bits,
then the semantics of strcpy() can continue to be the simple model
shown above, and strcpy() can copy strings of text characters (this
assumes that one would always use 16 bits per character, even if the
7-bit ASCII code subset could be used for some of them).

The worst example I have heard so far about international character
set kludgery is the assertion that strcmp() should be useful for
sorting native-language text into "dictionary order".  Anyone who
knows much about dictionaries (particularly oriental ones) should
appreciate how naïve that approach is.

One vendor would really like to see a requirement to support in
effect multiple 8-bit translation tables, because that vendor has
already taken that particular approach and would have a competitive
edge if it were made mandatory.  I've found that most major UNIX
system vendors are currently grappling with internationalization
issues, and all have taken different approaches.  So far they all
smack of kludgery to me, although some are better than others.

What concerns me is, if some simple, clean sufficient solution to the
extended code set issue is not made part of the official C language
specification, demands from ISO will lead to an officially-required
kludge approach that will adversely impact even simple ASCII-based
implementations.