[comp.lang.c] forward references in typedefs

george@giza.cis.ohio-state.edu (George M. Jones) (07/20/89)

I have noticed what I consider anomolous behavior when using forward references
in structs & typedefs and would would like to know whether things behave as
they do because (a) that's just the way the compilers choose to implement
a fuzzy point or (b) there is some definition (ansi, K&R) that say this is
[in]correct behavior.


The following code compiles (Sun, GCC, Green Hills) just fine as expected:

  struct one
    {
      struct two_t *foo;
    } one_t;
  
  struct two
    {
      struct one_t *bar;
    } two_t;


However all the compilers I tried appear to be choaking on the forward 
reference to two_t in the following chunk of code

  typedef struct one
    {
      two_t *foo;
    } one_t;
  
  typedef struct two
    {
      one_t *bar;
    } two_t;
  4:35pm> cc -c bad.c
  "bad.c", line 3: syntax error at or near variable name "two_t"
  "bad.c", line 4: zero sized structure

Any definitive answers from the knowledgable clientel of this newsgroup ?

Thanks,
---George Jones



-=-
OSU Computer & Inf. Science 2036 Neil Ave.,Columbus,Ohio 43210. 614-292-7325
george@cis.ohio-state.edu or ...!osu-cis!george
What would Christmas be without friends like you !

henry@utzoo.uucp (Henry Spencer) (07/20/89)

In article <55480@tut.cis.ohio-state.edu> George M. Jones <george@cis.ohio-state.edu> writes:
>I have noticed what I consider anomolous behavior when using forward references
>in structs & typedefs and would would like to know whether things behave as
>they do because (a) that's just the way the compilers choose to implement
>a fuzzy point or (b) there is some definition (ansi, K&R) that say this is
>[in]correct behavior.

The latter.  The language permits forward references with structure tags
(specifically, "struct foo *x;" when foo has not yet been seen) but not
the equivalent with typedefs.  That's just the way it is.

(Oh, you want to know *why*?  Basically, because typedefs are a royal pain
to parse at the best of times, and forward references just complicate the
issue hopelessly.  That keyword "struct" on the front simplifies things
enormously.  It also constrains the forward reference to be to a struct,
which makes things easier because although different types of pointers
sometimes have different representations [notably, the representation of
"char *" can differ from other pointers], it's very unusual for different
struct pointers to have different representations, so the compiler knows
how big a pointer it's got even if it's not quite sure what it points to.)
-- 
$10 million equals 18 PM       |     Henry Spencer at U of Toronto Zoology
(Pentagon-Minutes). -Tom Neff  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

sritacco@hpdml93.HP.COM (Steve Ritacco) (07/20/89)

I've fought with this many times myself.

It appears that this is a fact of life when dealing
with K&R C.  Every compiler I've ever used complains
about that code.  You simply need to define the structs
and then typedef them, at least that is the way I have
always gotten around this.

walter@hpclwjm.HP.COM (Walter Murray) (07/21/89)

George M. Jones writes:

>   typedef struct one
>     {
>       two_t *foo;
>     } one_t;
>   typedef struct two
>     {
>       one_t *bar;
>     } two_t;

This code is incorrect by the syntax rules of ANSI C.  The identifier
'two_t' by itself is not a valid type-specifier in the declaration
of 'foo', unless it has previously been defined as a typedef name.

This is probably what you want:

  typedef struct one one_t;
  typedef struct two two_t;
  struct one
    {
      two_t *foo;
    } ;
  struct two
    {
      one_t *bar;
    } ;

Walter Murray
---
Not speaking for Hewlett-Packard or X3J11
-------------

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/21/89)

In article <55480@tut.cis.ohio-state.edu> George M. Jones <george@cis.ohio-state.edu> writes:
>The following code compiles (Sun, GCC, Green Hills) just fine as expected:
>  struct one
>    {
>      struct two_t *foo;
>    } one_t;
>  struct two
>    {
>      struct one_t *bar;
>    } two_t;

If you were to change the member types to point to "struct two" and
"struct one" types, it would be correct.  The lack of a diagnostic at
the end of the translation unit ("struct two_t and struct one_t missing
declarations") can be considered a compiler deficiency.

C deliberately allows forward struct references in such contexts.
It may not have been completely clear in K&R 1st Edition, but it's
officially specified in the proposed Standard.

>However all the compilers I tried appear to be choaking on the forward 
>reference to two_t in the following chunk of code
>  typedef struct one
>    {
>      two_t *foo;
>    } one_t;

Sure; "two_t" is neither a C keyword nor a defined type at the point
that it is first used.  The compiler cannot know that it is eventually
going to be a structure type, unless it were to scan ahead (which is
contrary to the intent of C).

You might consider writing something like:
	typedef struct two two_t;	/* incomplete type */
	typedef struct one {
		two_t *foo;
	) one_t;
	struct two {			/* completes the type */
		one_t *bar;
	};

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/21/89)

In article <1989Jul20.152935.14872@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>... it's very unusual for different
>struct pointers to have different representations, ...

Some of us think that the language rules effectively force all struct
pointers to be conformable.

dfp@cbnewsl.ATT.COM (david.f.prosser) (07/21/89)

In article <55480@tut.cis.ohio-state.edu> George M. Jones <george@cis.ohio-state.edu> writes:
>I have noticed what I consider anomolous behavior when using forward references
>in structs & typedefs and would would like to know whether things behave as
>they do because (a) that's just the way the compilers choose to implement
>a fuzzy point or (b) there is some definition (ansi, K&R) that say this is
>[in]correct behavior.
>
>
>The following code compiles (Sun, GCC, Green Hills) just fine as expected:
>
>  struct one
>    {
>      struct two_t *foo;
>    } one_t;
>  
>  struct two
>    {
>      struct one_t *bar;
>    } two_t;
>

There is either a pair of typos or a misunderstanding in the above.
Structure tags are in a different namespace from regular identifiers.
The structures "struct one_t" and "struct two_t" have never been
defined.  Moreover, since a "_t" ending is usually used for typedef
names, my guess is that typedefs were intended for these, instead of
regular objects.

	typedef struct one { struct two *foo; } one_t;
	typedef struct two { struct one *bar; } two_t;

>
>However all the compilers I tried appear to be choaking on the forward 
>reference to two_t in the following chunk of code
>
>  typedef struct one
>    {
>      two_t *foo;
>    } one_t;
>  
>  typedef struct two
>    {
>      one_t *bar;
>    } two_t;
>  4:35pm> cc -c bad.c
>  "bad.c", line 3: syntax error at or near variable name "two_t"
>  "bad.c", line 4: zero sized structure
>
>Any definitive answers from the knowledgable clientel of this newsgroup ?

Since "two_t" in the above is completely unknown, the compiler has no
idea what is intended.  At least the "shape" of the type being described
by the typedef name must be declared prior to use of a typedef name.
If I were writing this sort of code, I'd probably use typedef names for
what ANSI C calls "incomplete types":

	typedef struct one one_t;
	typedef struct two two_t;

At this point, sizeof(one_t) and sizeof(two_t) are not known, but
sizeof(one_t *) and sizeof(two_t *) are known.  Note that this implies
that all structure (and union) pointers must be the same size.  As this
pointer, the following declarations are valid:

	struct one { two_t *foo; };
	struct two { one_t *bar; };

The main advantage gained by separating the typedef declaration from the
structure is that a single common header file can contain the typedefs
while headers for particular modules can contain the complete structure.
Only those parts of the program that need to know the members of the
structure must include the detailed header.

Dave Prosser	dfp@attunix.att.com

wen-king@cit-vax.Caltech.Edu (King Su) (07/23/89)

In article <1196@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:

>The main advantage gained by separating the typedef declaration from the
<structure is that a single common header file can contain the typedefs
>while headers for particular modules can contain the complete structure.
<Only those parts of the program that need to know the members of the
>structure must include the detailed header.

I have often wondered about this.  Is it legal?  Since pointers for
different data objects are not required to have the same binary format
or even storage size, how would a compiler allocate space and initialize
a pointer of an unknown type?  Consider:

master_header.h:

    typedef struct s1 S1;
    typedef struct s2 S2;
    typedef struct s3 S3;

one_program_file.c:

    #include "master_header.h"

    struct s3 { S1 *pointer_s1;
		S2 *pointer_s2;
		int  a;
		....            } s;

When compiling "one_program_file.c", how does the compiler figure out
the size of 's' and the offset of 'a', given that 'pointer_s1' and
'pointer_s1' are of unknown size.  How does the compiler initialize
's' (assuming static variable) if the format for the pointers are
unknown.  Or is it true that pointers to structures are always
of the same size and format?

-- 
/*------------------------------------------------------------------------*\
| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/23/89)

In article <11341@cit-vax.Caltech.Edu> wen-king@cit-vax.UUCP (Wen-King Su) writes:
-I have often wondered about this.  Is it legal?  Since pointers for
-different data objects are not required to have the same binary format
-or even storage size, how would a compiler allocate space and initialize
-a pointer of an unknown type?

All pointers-to-structures have the same size, that's how.

cowan@marob.masa.com (John Cowan) (07/25/89)

In article <1989Jul20.152935.14872@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>...it's very unusual for different
>struct pointers to have different representations, so the compiler knows
>how big a pointer it's got even if it's not quite sure what it points to.)

For "very unusual" read "illegal".  As someone (Chris Torek?) said earlier
in this group, "all 'struct' pointers must 'smell' the same."  If this
behavior is not guaranteed, forward references to structs would be 
unimplementable in one pass.

Ufcawss, if you talk to the people who wrote VMS C, they'll tell you that
all one-pass implementations of C are unacceptable!  (This has something to
do with generating good code for the 'switch' statement.)
-- 
Internet/Smail: cowan@marob.masa.com	Dumb: uunet!hombre!marob!cowan
Fidonet:  JOHN COWAN of 1:107/711	Magpie: JOHN COWAN, (212) 420-0527
		Charles li reis, nostre emperesdre magnes
		Set anz toz pleins at estet in Espagne.

chris@mimsy.UUCP (Chris Torek) (07/25/89)

In article <24CB9E07.9547@marob.masa.com> cowan@marob.masa.com (John Cowan)
writes:
>... As someone (Chris Torek?) said earlier in this group, "all 'struct'
>pointers must 'smell' the same."  If this behavior is not guaranteed,
>forward references to structs would be unimplementable in one pass.

It was Doug Gwyn, but this is otherwise correct.

>Ufcawss, if you talk to the people who wrote VMS C, they'll tell you that
>all one-pass implementations of C are unacceptable!  (This has something to
>do with generating good code for the 'switch' statement.)

They are both right and wrong.  One-pass code generation means there
are few opportunities for optimsiation; but generating good switches
is easy.  Simply emit a branch at the top of the switch, code at each
of the labels, a branch at the bottom, and then generate the code
that actually implements the switch.  E.g., given

	switch (foo) {
	case 1: case1(); break;
	case 2: case2(); break;
	case 3: case3(); break;
	case 999: case999(); break;
	}
	onward();

one generates code of the form:

		jump	L1
	L3:	call	case1_
		jump	L2
	L4:	call	case2_
		jump	L2
	L5:	call	case3_
		jump	L2
	L6:	call	case999_
		jump	L2
	L1:	move	foo,r0
		compare	r0,#3
		jeq	L5
		jgt	L7
		compare	r0,#1
		jeq	L3
		jgt	L4	# here is the clever part
		jump	L2
	L7:	compare	r0,#999
		jeq	L6
	L2:	call	onward_

This sort of code is exactly what one gets from a one-pass compiler,
except that they tend not to have a clever part. :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

wjr@ftp.COM (Bill Rust) (07/25/89)

In article <18729@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <24CB9E07.9547@marob.masa.com> cowan@marob.masa.com (John Cowan) >writes:
||... As someone (Chris Torek?) said earlier in this group, "all 'struct'
||pointers must 'smell' the same."  If this behavior is not guaranteed,
||forward references to structs would be unimplementable in one pass.
|
|It was Doug Gwyn, but this is otherwise correct.
|
||Ufcawss, if you talk to the people who wrote VMS C, they'll tell you that
||all one-pass implementations of C are unacceptable!  (This has something to
||do with generating good code for the 'switch' statement.)
|
|They are both right and wrong.  One-pass code generation means there
|are few opportunities for optimsiation; but generating good switches
|is easy.  Simply emit a branch at the top of the switch, code at each
|of the labels, a branch at the bottom, and then generate the code
|that actually implements the switch.  E.g., given
|
| [Clever example deleted]


Excuse me. I am not a compiler writer but even I know that there are
two ways to implement switches: jump tables and compare and execute
when equal.  Jump tables (ie jmp switch_list[ix] where ix is some easy
transformation of the switch variable) are much more efficient when
the range of values that the switch variable can assume is limited (ie
switch variable in range of 1 to 20 and it assumes 75% of values). The
only way to tell if a jump table is better than compare and jump is to
see what the range of the switch variable is and how many values it
actually assumes. This is very difficult to do in a one pass compiler.
The previous correspondent completely ignored this in his response.

Bill Rust (wjr@ftp.com)

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/26/89)

In article <686@ftp.COM> wjr@ftp.UUCP (Bill Rust) writes:
>The only way to tell if a jump table is better than compare and jump is to
>see what the range of the switch variable is and how many values it
>actually assumes. This is very difficult to do in a one pass compiler.

I don't see that it's particularly hard to do.  The compiler generates
a preamble that branches to a postamble to be generated later (after the
entire switch has been seen), and for each case as it is encountered the
case value is stashed in a compiler internal table (only 257 labels need
be accommodated) and an entry point label output followed by the case code
then a jump to a second deferred label which will follow the postamble.
The postamble can be a jump table, or a sequence of comparisons (hopefully
ordered as a binary tree instead of linear), or any other algorithm the
compiler decided was best for the particular set of case labels.

karzes@mfci.UUCP (Tom Karzes) (07/27/89)

In article <686@ftp.COM> wjr@ftp.UUCP (Bill Rust) writes:
>Excuse me. I am not a compiler writer but even I know that there are
>two ways to implement switches: jump tables and compare and execute
>when equal.

There are more ways to implement switch statements than this.  For example,
if you have a sparse but large set of case values, it may make sense to
use an optimized hash table to look up the branch targets.  Even for the
"compare and execute when equal" case there are several possibilities.
If there are only a couple of values, it makes sense to simply check them
individually and execute the default case when there's no match.  For a
larger set of values, it might make sense to do a binary search (which
of course is log(n)), assuming you didn't want to use an optimized hash
table.  I'm sure there are other possibilites as well, although these
are the ones you're most likely to actually see implemented.

dg@lakart.UUCP (David Goodenough) (07/27/89)

henry@utzoo.uucp (Henry Spencer) sez:
> The latter.  The language permits forward references with structure tags
> (specifically, "struct foo *x;" when foo has not yet been seen) but not
> the equivalent with typedefs.  That's just the way it is.
> 
> (Oh, you want to know *why*?  Basically, because typedefs are a royal pain
> to parse at the best of times, and forward references just complicate the
> issue hopelessly.

Also, have you considered the following:

typedef	a *b;

typedef b *a;

There, now your compiler is _REALLY_ confused :-)
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

dg@lakart.UUCP (David Goodenough) (07/28/89)

wjr@ftp.COM (Bill Rust) sez:
> Excuse me. I am not a compiler writer but even I know that there are
> two ways to implement switches: jump tables and compare and execute
> when equal.

That makes three - see below, plus the hashing technique used in the V6
unix cc, on the PDP11 I used at university makes 4. Plus the concept
(when switching on a character value) of generating a string of recognised
characters, calling index, turning the pointer returned back into and index
and using _THAT_ to index into a table (which is in fact a variation on
my method #3 below) Of course, compiler purists will say that I have to
do the comparisons as integer values, but don't knock it, because it does
work.

> Jump tables (ie jmp switch_list[ix] where ix is some easy
> transformation of the switch variable) are much more efficient when
> the range of values that the switch variable can assume is limited (ie
> switch variable in range of 1 to 20 and it assumes 75% of values). The
> only way to tell if a jump table is better than compare and jump is to
> see what the range of the switch variable is and how many values it
> actually assumes. This is very difficult to do in a one pass compiler.
> The previous correspondent completely ignored this in his response.

Hold on a minute. Doing the necessary work is very _EASY_ in a one pass
compiler. Just like Chris Torek said, you jump at the top, build your
switch code with labels, creating the table of values as you go. Now, at
the bottom you have a table full of values. You can do anything you like
with it - simple index lookup (jump table[ix]), tabled compare (method 3):

	dw	label1, value1
	dw	label2, value2

with a search comparing value1 to the expression, or anything else that
takes your fancy. All in one pass.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

mccaugh@s.cs.uiuc.edu (08/10/89)

 I couldn't help but side with the suggestion of "quadratic" even if it was
 misguided, since:

 * a 'quadrille' is a "square" dance, but consists of 5 or 6 rhythms
 * a 'quadrangle' and a 'quadrilateral' are both 4-sided
 * a 'quadrennium' is a period of 4 years

 Need I go on? THe point being that 'quadratic' means 'square', and it is
 an inference that the latter means "to the power 2".