[comp.std.c] switch

bvs@light.uucp (Bakul Shah) (07/13/88)

dpANS says the expression in ``switch (expression)''
must be an integer valued expression.  Any chance of
getting this changed to ``must be an integer or ptr valued
expression''?  The current restriction forces one to use
the messier and longer sequence of ``if..then..else if
...'' for pointers.  That is, instead of

	switch (ptr)
	{
	case PTR1:
		foo; bar1; break;
	case PTR2: case PTR3:
		foo; bar2; break;
	default:
		foo; barx; break;
	}

the following must be used:

	if (ptr == PTR1) {
		foo; bar1;
	} else if (ptr == PTR2 || ptr == PTR3) {
		foo; bar2;
	} else {
		foo; barx;
	}

The latter can get quite unwieldy when a ptr has to be
compared with a lot of constants and each case is
non-simple.  Worse, if you wanted to fall through from one
case to another after doing some work, you'd have to use a
spaghetti of gotos -- hopefully, no one does this.

Note that the switch stmt can be used by casting the ptr
to a long or an int, but I don't know if this is safe on
all architectures -- (casts) should be avoided where
possible.

The change in specification should be to state that a
switch expression must be either an int or a ptr, and
the case const expression must be compatible with it.

Except for the stupid default fallthrough, a switch stmt
can be thought of as equivalent to a sequence of
``if..then..else if ... '', where each if-expr is an
equality comparison and anything that can be compared
for equality should be allowed as a switch expression.

I noticed this when a dpANS compiler we use recently
tightened its type system.  As a result, some switch code
to deal with SIG_DFL, SIG_IGN etc. wouldn't compile
anymore (and I hate to change other people's sensible
and clear code).
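
A minimal sketch of what such code looks like once it is rewritten as
equality tests, so that no pointer-to-integer cast is needed.  It assumes
an ANSI-style <signal.h> where handlers have type void (*)(int); the
names disposition, classify_handler and on_signal are hypothetical and
are not taken from the code mentioned above.

```c
#include <signal.h>

/* Hypothetical labels for the three kinds of signal disposition. */
enum disposition { DISP_DEFAULT, DISP_IGNORE, DISP_HANDLER };

/* Classify a handler value using equality tests only. */
enum disposition classify_handler(void (*handler)(int))
{
	if (handler == SIG_DFL)
		return DISP_DEFAULT;
	else if (handler == SIG_IGN)
		return DISP_IGNORE;
	else
		return DISP_HANDLER;
}

/* Hypothetical user handler, present only to exercise the last branch. */
void on_signal(int signo)
{
	(void)signo;
}
```

The result is an ordinary enum, so the caller can still use a switch on
it if the per-case code is long.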

Could it be that even though K&R1 forbade mixing ints
with ptrs and specified that only integer expression be
used in a switch expression, all old compilers continued
to treat the switch expression `sensibly' AND _somehow_
the committee overlooked this?

-- Bakul Shah <..!{ucbvax,sun,uunet}!amdcad!light!bvs>

sullivan@vsi.UUCP (Michael T Sullivan) (07/13/88)

In article <1988Jul12.105547.13268@light.uucp>, bvs@light.uucp (Bakul Shah) writes:
> dpANS says the expression in ``switch (expression)''
> must be an integer valued expression.  Any chance of
> getting this changed to ``must be an integer or ptr valued
> expression''?  The current restriction forces one to use
> the messier and longer sequence of ``if..then..else if
> ...'' for pointers...
> 
> Note that the switch stmt can be used by casting the ptr
> to a long or an int, but I don't know if this is safe on
> all architectures -- (casts) should be avoided where
> possible.

I don't know about casting should be avoided.  I'm looking at shmop(2) manual
page for our 3B2 (shared memory operation) and it says:

	"Shmat {which is a char* function -mts} returns the data segment
	start address of the attached shared memory segment...

	Otherwise, a value of -1 is returned and errno is set to indicate
	the error."

Somebody at AT&T must think casting a pointer to an int (or a long) isn't
such a bad idea, unless I'm missing something (it has been known to happen).

-- 
Michael Sullivan			{uunet|attmail}!vsi!sullivan
V-Systems, Inc.  Santa Ana, CA		sullivan@vsi.com

haahr@phoenix.Princeton.EDU (Paul Gluckauf Haahr) (07/13/88)

in article <1988Jul12.105547.13268@light.uucp>,
	bvs@light.UUCP (Bakul Shah) writes:
> dpANS says the expression in ``switch (expression)''
> must be an integer valued expression.  Any chance of
> getting this changed to ``must be an integer or ptr valued
> expression''?

the problem with pointer valued constants is that they are generally
not constant until link time.  at compile time, most pointer constants
are of the form ``&extern_identifier[constant_expression]'' or some
such, which are relocatable symbols.  therefore, the compiler would not
know about the value of a case expression.  one could in theory have
the compiler generate the cascading if...else if...else... chain from
the switch code, but there would be no way for the compiler to recognize
duplicate pointer values in the switch table until link time.
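
a sketch of one workaround, under the assumption that the addresses of
interest are known when the file is compiled: address constants are
accepted in static initializers (the linker fills them in), so a table
can map each address to a small integer, and that integer can drive an
ordinary switch.  the names a, b, c, ptr_case and ptr_to_code are
hypothetical.

```c
#include <stddef.h>

int a, b, c;	/* hypothetical objects whose addresses act as the "cases" */

struct ptr_case {
	const int *addr;	/* address constant: fine in a static initializer */
	int code;		/* small integer usable as a real case label */
};

static const struct ptr_case cases[] = {
	{ &a, 1 },
	{ &b, 2 },
	{ &c, 3 }
};

/* Return the code for p, or 0 if p matches none of the addresses. */
int ptr_to_code(const int *p)
{
	size_t i;

	for (i = 0; i < sizeof cases / sizeof cases[0]; i++)
		if (cases[i].addr == p)
			return cases[i].code;
	return 0;
}
```

the search is linear, which is the same cost as the cascading if chain;
the gain is only that the case bodies can live in a normal switch on
ptr_to_code(ptr).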

the only other form of portable constant pointer expression is the
null pointer, which the compiler will know the value of (0).  so,
portably, i believe that the only valid form would be
	switch (pointer-expression) {
	case 0:		... ; break;
	default:	... ; break;
	}
but that's just an if statement.

unix uses, in the signal code, ``(int (*)()) small-integer constant'',
but that's not really all that clean.  (is this well-defined and unique
in all ansi compilers?  i'm not sure that it is for all architectures.)
to handle this sort of code in a switch statement, cast everything to
int (or, more accurately, a wide enough integral type).  this is one of
very few standard cases i can think of with more than one compile time
integer constant.

one could of course do the hard work of generating the switch table
at link time rather than at compile time, but that would be a lot of
work, and require compiler-writers to write linkers also.  one more
argument to start doing code generation at link time.

paul haahr
princeton!haahr or haahr@princeton.edu

Paul_L_Schauble@cup.portal.com (07/14/88)

No, there's no chance of getting the standard changed. Lobby your favorite
compiler writer.

And while we're at it, seems to me as though the expression in
  switch (Type)
should allow any Type for which the == operator is defined.

    Paul

guy@gorodish.Sun.COM (Guy Harris) (07/14/88)

> I don't know about casting should be avoided.  I'm looking at shmop(2) manual
> page for our 3B2 (shared memory operation) and it says:

SHMOP(2), hell, look at BRK(2):

	Upon successful completion, "brk" returns a value of 0 and
	"sbrk" {which is a char * function} returns the old break value.
	Otherwise, a value of -1 is returned...

> Somebody at AT&T must think casting a pointer to an int (or a long) isn't
> such a bad idea, unless I'm missing something (it has been known to happen).

It probably wasn't as bad an idea back when UNIX was first being done in C;
pointers and integers both fit into 16 bits on a PDP-11, so everybody "knew"
this was safe.  It was still a bad idea, though; "sbrk" should have returned a
null pointer instead.

Unfortunately, we're stuck with the sins of the past, such as system calls
returning -1 even when they're returning pointers and special signal handler
values such as "(int (*)())1" (changed to "(void (*)...", which still doesn't
fix the problem in question), so C implementors on UNIX systems are obliged to
ensure that you can cast -1, 1, and maybe some other integral values to
pointers and have it "work right".

All the January 11, 1988 C draft appears to say is

3.3.4 Cast operators

	...

	A pointer may be converted to an integral type.  The size of integer
	required and the result are implementation-defined.  If the space
	provided is not long enough, the behavior is undefined.

	An arbitrary integer may be converted to a pointer.  The result is
	implementation-defined.

I don't see the guarantee from K&R that "The mapping (from integer to pointer)
always carries an integer converted from a pointer back to the same pointer"
anywhere in the places I looked in the draft.

As such, I would avoid all pointer casts except the unavoidable ones (such as
the -1 to "result of 'sbrk' type" casts).

tps@chem.ucsd.edu (Tom Stockfisch) (07/14/88)

In article <1988Jul12.105547.13268@light.uucp> bvs@light.UUCP (Bakul Shah) writes:
>dpANS says the expression in ``switch (expression)''
>must be an integer valued expression.  Any chance of
>getting this changed to ``must be an integer or ptr valued
>expression''?  The current restriction forces one to use
>the messier and longer sequence of ``if..then..else if
>...'' for pointers.  That is, instead of
>
>	switch (ptr)
>	{
>	case PTR1:
>		foo; bar1; break;
>	case PTR2: case PTR3:
>		foo; bar2; break;
>	default:
>		foo; barx; break;
>	}

The problem is not the ptr expression, it's the PTR* constants.
In the original K&R, this was disallowed.  There are different classes
of constants in C --
some are more "constant" than others.  Initialization constants can be
addresses of static or extern variables, but case constants must be 
arithmetic constants, possibly involving non-cast operators.  See Appendix
A, sec 15.  The reason for the constraint is that usually the linker
assigns the actual addresses to variables:  the compiler can't know
their value, and the situation is just as if you tried to do

	int	i;
	switch( expr )
	{
	case i:
	...
	}

>the following must be used:
>
>	if (ptr == PTR1) {
>		foo; bar1;
>	} else if (ptr == PTR2 || ptr == PTR3) {
>		foo; bar2;
>	} else {
>		foo; barx;
>	}

Them's the breaks.

>Note that the switch stmt can be used by casting the ptr
>to a long or an int, but I don't know if this is safe on
>all architectures -- (casts) should be avoided where
>possible.

You can cast "ptr" just fine, but neither the constant addresses
nor casts can appear in the case constructs.

>Could it be that even though K&R1 forbade mixing ints
>with ptrs and specified that only integer expression be
>used in a switch expression, all old compilers continued
>to treat the switch expression `sensibly' AND _somehow_
>the committee overlooked this?
>-- Bakul Shah <..!{ucbvax,sun,uunet}!amdcad!light!bvs>

Our Berkeley compiler would not compile this, and gave the interesting
diagnostic "duplicate case in switch" for the following code:

	extern int	*ptr, a, b, c;

	switch( ptr )
	{
	case	a:
		...
	case	b:
		...
	case	c:
		...
	default:
		...
	}

The explanation for the diagnostic is that the compiler inserted the null
pointer as a place holder for all unresolved extern address references,
so that all three cases wound up being the same (zero) when the compiler
went to create the jump table.

-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (07/14/88)

In article <59881@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> As such, I would avoid all pointer casts except the unavoidable ones (such as
> the -1 to "result of 'sbrk' type" casts).

To anyone that is documenting such functions:

Please be explicit about how to test the return value.

The man pages almost always say "returns -1 ...",
and too often I've seen code that tests the value with
something like:       if ((int)function() == -1) ...

This happens to work sometimes, but it is still wrong.
It should be:         if (function() == (type*)-1) ...

The man page should explicitly mention this cast,
and not assume that it is obvious to everyone.

(To see that it should be obvious, consider the source for
 the function.  It must "return (type*)-1;", so you want to
 compare the result with (type*)-1.  Doing it the other way
 compares -1 with (int)(type*)-1, and there is no guarantee
 that the double cast will result in -1.)
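
A minimal sketch of the comparison described above.  The function
get_region is hypothetical; it merely imitates the "returns -1 on error"
convention for a char * function.  Converting -1 to a pointer is
implementation-defined, so this shows the form of the test rather than
a portability guarantee.

```c
static char buffer[16];

/*
 * Hypothetical function following the manual-page convention:
 * a valid pointer on success, (char *)-1 on failure.
 */
char *get_region(int fail)
{
	if (fail)
		return (char *)-1;
	return buffer;
}

/* Compare in the pointer type, exactly as the callee produced the value. */
int region_failed(char *p)
{
	return p == (char *)-1;
}
```

The caller would write "if (region_failed(get_region(...)))", or the
equivalent inline comparison, and never cast the result to int.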

eao@anumb.UUCP (e.a.olson) (07/14/88)

In article <755@vsi.UUCP> sullivan@vsi.UUCP (Michael T Sullivan) writes:
>In article <1988Jul12.105547.13268@light.uucp>, bvs@light.uucp (Bakul Shah) writes:
>I don't know about casting should be avoided.  I'm looking at shmop(2) manual
>page for our 3B2 (shared memory operation) and it says:
...
>	Otherwise, a value of -1 is returned and errno is set to indicate
>	the error."
>
>Somebody at AT&T must think casting a pointer to an int (or a long) isn't
>such a bad idea, unless I'm missing something (it has been known to happen).

    I think that this might be due more to the convention that all
    section 2 calls return a -1 for error (and the fact that
    shmop feels that zero is fine to return) than to anything to
    do with how AT&T feels about casting a pointer to an int.

    In any case, as long as the system call and the caller share the
    same machine, and both the system call and the caller cast from
    a pointer to an int (or a long), then whatever -1 happens to be,
    both will agree on its value.

chris@mimsy.UUCP (Chris Torek) (07/15/88)

In article <7329@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
>And while we're at it, seems to me as though the expression in
>  switch (Type)
>should allow any Type for which the == operator is defined.

The reason switch works the way it does is that it `wants to'
compile into a computed goto:

	switch (i) {
	case 1: ... break;
	case 2: ... break;
	   .
	   .
	   .
	case 37: ... break;
	default: ... break;
	}

compiles to

	reg = i-1
	(unsigned)(reg - 36)
	>= 0 ? goto default
	goto cases[reg]

A more general version appears in Mesa:

    SELECT expression FROM
    { expression => statements }*
    END

(syntax very approximate).  All `expression's are arbitrary; the
language is defined such that the first `expr =>' that matches the
selected expression takes effect.  It is up to the optimiser to notice
that all the `expr =>'s are constant (if indeed they are) and turn this
into a computed goto; if some of the `expr's are non-constant, the
whole thing must usually be compiled as a series of if/else tests.

The construct does have some advantages.  For instance, instead of

	IF      bool-exp1 THEN stmts1
	ELSE IF bool-exp2 THEN stmts2
	ELSE IF bool-exp3 THEN stmts3
	   .
	   .
	   .
	ELSE		       stmtsn

one can write

	SELECT TRUE FROM
		bool-exp1 => stmts1
		bool-exp2 => stmts2
		bool-exp3 => stmts3
		   .
		   .
		   .
		TRUE =>	     stmtsn
	END

and of course a series of `if not bool-exp' statements can be written
even more simply as `SELECT FALSE FROM bool-exp ...' (although then
the `default' must be written `FALSE => ...').
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

g-rh@cca.CCA.COM (Richard Harter) (07/15/88)

In article <12484@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

>A more general version appears in Mesa:

>    SELECT expression FROM
>    { expression => statements }*
>    END

>(syntax very approximate).  All `expression's are arbitrary; the
>language is defined such that the first `expr =>' that matches the
>selected expression takes effect.  It is up to the optimiser to notice
>that all the `expr =>'s are constant (if indeed they are) and turn this
>into a computed goto; if some of the `expr's are non-constant, the
>whole thing must usually be compiled as a series of if/else tests.

Turning a list of constants into a form suitable for a computed goto
is non trivial if the constants are widely scattered in value.  A smart
compiler could construct a hash table to map the constants into a tight
integer range.  If the compiler does do this, then the SELECT-FROM-END
construct will produce code superior to that likely to be produced by
a human being when you have a long list.
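
A sketch of the idea in C, written by hand for one specific set of
constants.  The constants 3, 170, 4096 and 65001, the modulus 7, and the
names slot and select_target are all assumptions chosen for
illustration; a compiler would have to search for a suitable hash
itself.

```c
struct slot {
	int used;		/* nonzero if this slot holds a case constant */
	unsigned key;		/* the case constant stored in the slot */
	int target;		/* which case body to run; 0 means default */
};

/* Under k % 7 the constants 4096, 170, 3, 65001 land in slots 1, 2, 3, 6. */
static const struct slot slots[7] = {
	{ 0, 0, 0 },
	{ 1, 4096, 3 },
	{ 1, 170, 2 },
	{ 1, 3, 1 },
	{ 0, 0, 0 },
	{ 0, 0, 0 },
	{ 1, 65001, 4 }
};

/* Hash into the table, then confirm the key; anything else is default. */
int select_target(unsigned k)
{
	const struct slot *s = &slots[k % 7];

	if (s->used && s->key == k)
		return s->target;
	return 0;
}
```

The returned target is a dense integer in 0..4, which is the kind of
value a computed goto (or a C switch) handles well.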
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

chris@mimsy.UUCP (Chris Torek) (07/15/88)

In article <12484@mimsy.UUCP> I wrote some pseudo-assembly:
>	reg = i-1
>	(unsigned)(reg - 36)
>	>= 0 ? goto default
>	goto cases[reg]

I have no idea why I wrote that.  That should be:

	reg = i-1
	if ((unsigned)reg > 36) goto default
	goto cases[reg]
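
The same range check and table jump can be sketched in C itself, with a
table of function pointers standing in for the jump table.  This version
assumes only three cases (1 through 3) to stay short; the names case1,
case2, case3, case_default and dispatch are hypothetical.

```c
static int case1(void) { return 10; }
static int case2(void) { return 20; }
static int case3(void) { return 30; }
static int case_default(void) { return -1; }

/* Stand-in for the jump table: entry n handles "case n+1". */
static int (*const cases[])(void) = { case1, case2, case3 };

int dispatch(int i)
{
	/*
	 * Unsigned subtraction maps 1..3 to 0..2.  Zero and negative
	 * values wrap to very large numbers, so one comparison rejects
	 * everything outside the table.
	 */
	unsigned reg = (unsigned)i - 1u;

	if (reg > 2u)
		return case_default();
	return cases[reg]();
}
```

The subtraction is done in unsigned arithmetic so that no signed
overflow can occur for any value of i.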
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

bvs@light.uucp (Bakul Shah) (07/15/88)

In article <253@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>The problem is not the ptr expression, it's the PTR* constants.
>In the original
>K&R, this was disallowed.  There are different classes of constants in C --

Use of link time constants as case expression would be nice,
but I was only thinking of true ptr constants, such as

	(int_f)0x1234 or (int_f)1
where
	int_f is a ptr to a function returning int.

Look up the definition of SIG_DFL etc in signal.h for an example.
kern_sig.c in 4.3 BSD uses ``switch (u->u_signal)'' (I don't
recall what V7 did -- probably the cast was done somewhere
else).  I mention the 4.3 use to merely point out that old
PCC compilers accept switch (ptr).

As long as one can create true constants for use as case
labels, any object that can be compared for equality should
be legal in switch(expr).  I see not allowing ptrs in
switch(expr) as an unnecessary restriction.  Relaxing it
will legitimize what many compilers already do.

More importantly, `switch(expr) { case ... }' is easier to
read/write than a string of `if .. then else if ...'.  If
the main function of a language is to help simplify
programming, it should _do_ so; especially in this case where
the benefit comes free.

-- Bakul Shah <..!{ucbvax,sun,uunet}!amdcad!light!bvs>

PS: casting a ptr back to a long is not safe because on some
machines ptrs are longer than 4 bytes and long ints are not,
so you'd lose some information.

ron@topaz.rutgers.edu (Ron Natalie) (07/15/88)

Brk and sbrk are not part of C.  They shouldn't even be part
of UNIX.  You can't write portable code with them.  The concept
of a single linear address space for data and another for subroutine
linkage is not a universal concept.  Use malloc.

-Ron

g-rh@cca.CCA.COM (Richard Harter) (07/15/88)

In article <Jul.15.08.40.53.1988.20506@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>Brk and sbrk are not part of C.  They shouldn't even be part
>of UNIX.  You can't write portable code with them.  The concept
>of a single linear address space for data and another for subroutine
>linkage is not a universal concept.  Use malloc.

	And, as noted in ~.lang.c by others, if you don't like malloc,
and want to write your own, use malloc as your primitive to get space 
from the system. 


-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

smryan@garth.UUCP (Steven Ryan) (07/16/88)

>I have no idea why I wrote that.  That should be:
>
>	reg = i-1
>	if ((unsigned)reg > 36) goto default

We're professionals--let some programming aide drudge sweat the details.

Actually, there's a cute way to fold default cases to a specific index. I
don't remember and I don't feel like working it out, but it's another example
of replacing a simple, pipe-smashing jump with a complicated inline sequence.

another fine specification from
  s m ryan

lenoil@Apple.COM (Robert Lenoil) (07/16/88)

More basic than pointers, how about allowing other arithmetic types in switch
statements?  It's really inefficient to do int comparisons when the actual
thing you're CASEing on is a byte.  I think that the type of the case labels
should be whatever the type of the switch expression is; thus if I say "switch
(foo)" where foo is a char, all the case statements are cast to char.

On a similar note, it annoys me that enums can only be int-valued, when 99%
of the time they'll fit in a char.  On memory-tight code, this causes me to
use #defines instead of enums, so that I can use a byte instead of an int.
The same goes for bitfields.  I'd change the enum syntax to "enum type {...}",
where type is optional (and defaults to int), but can be any numeric type.
Similarly, I should be able to use any integer type in a bitfield definition.

Robert Lenoil
Apple Computer, Inc.

smryan@garth.UUCP (Steven Ryan) (07/17/88)

>More basic than pointers, how about allowing other arithmetic types in switch
>statements?  It's really inefficient to do int comparisons when the actual
>thing you're CASEing on is a byte.  I think that the type of the case labels
>should be whatever the type of the switch expression; thus if I say "switch
>(foo)" where foo is a char, all the case statements are cast to char.

[Not sure, but why not butt in anyway?]

I didn't think C had char expressions.  I thought every such expression
becomes some kind of int.

karl@haddock.ISC.COM (Karl Heuer) (07/18/88)

In article <14036@apple.Apple.COM> lenoil@apple.apple.com.UUCP (Robert Lenoil) writes:
>It's really inefficient to do int comparisons when the actual thing you're
>CASEing on is a byte.  I think that the type of the case labels should be
>whatever the type of the switch expression...

This is a Quality of Implementation issue.  A good optimizer will notice the
size of the switch parameter, and take appropriate action.

>On a similar note, it annoys me that enums can only be int-valued, when 99%
>of the time they'll fit in a char.

"Each enumerated type shall be compatible with an integer type; the choice of
type is implementation-defined" [Jan88 dpANS, 3.5.2.2].  I believe this allows
the implementation to use bytes for byte-sized enumerations.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

henry@utzoo.uucp (Henry Spencer) (07/23/88)

In article <5255@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>In article <14036@apple.Apple.COM> lenoil@apple.apple.com.UUCP (Robert Lenoil) writes:
>>It's really inefficient to do int comparisons when the actual thing you're
>>CASEing on is a byte.  I think that the type of the case labels should be
>>whatever the type of the switch expression...
>
>This is a Quality of Implementation issue...

Moreover, it is a machine-specific optimization.  Some machines are very
clumsy at byte handling; it is not at all impossible that on some of them
byte comparisons are considerably *less* efficient than int comparisons.

gwyn@brl-smoke.ARPA (Doug Gwyn) (07/29/88)

In article <755@vsi.UUCP> sullivan@vsi.UUCP (Michael T Sullivan) writes:
>I don't know about casting should be avoided.  I'm looking at shmop(2) manual
>page for our 3B2 (shared memory operation) and it says:
>	"Shmat {which is a char* function -mts} returns the data segment
>	start address of the attached shared memory segment...
>	Otherwise, a value of -1 is returned and errno is set to indicate
>	the error."
>Somebody at AT&T must think casting a pointer to an int (or a long) isn't
>such a bad idea, unless I'm missing something (it has been known to happen).

That's a well-known serious design botch on AT&T's part.
At the time of the UNIX/RT-UNIX/TS merge, when the IPC
stuff first was wedged into USG UNIX 3.0 to make 4.0,
the problem with this practice should have been evident
(unlike the situation with brk(2)).  Therefore it's really
inexcusable.

gwyn@brl-smoke.ARPA (Doug Gwyn) (07/29/88)

In article <19876@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>and too often I've seen code that tests the value with
>something like:       if ((int)function() == -1) ...
>This happens to work sometimes, but it is still wrong.
>It should be:         if (function() == (type*)-1) ...

Nope, neither of these is necessarily going to work.
That's why it's a poor idea in the first place.

The -1 value is really set up by the common library
error module AS AN INTEGER return value.  There is
no guaranteed portable way to pick this up without
damaging it or losing the ability to correctly capture
the non-error return value.  You can try using a union
for this but I still doubt that it will always work.