[comp.lang.c] Problems with GCC and/or VAX LINK

ccdn@levels.sait.edu.au (DAVID NEWALL) (03/08/89)

I recently tried compiling the VAX LZCMP compression program on our
VAX.  The VAX is running VMS 5.0 and the C compiler is GNU C (don't
know what version).  I encountered a few problems:

1.  GCC doesn't allow mismatched quotation marks.  For example:
        /* this isn't going to work */
    doesn't compile because the comment has a single quotation mark.
    Interestingly, the following does compile:
        /* this is going to work, even though the quotes (')
           aren't matched on the _same_ line */
    I think this behaviour is poor -- quotes shouldn't need to be
    matched in comments.  Neither should they need matching in code
    that is #ifdef'd out.

    I've seen arguments over this behaviour before (some people claim
    it is the one true way; others disagree).  For the record, I think
    that comments are just that:  comments.  And so it shouldn't matter
    how many quotation marks they contain.  Code which is #ifdef'd out
    should be treated similarly.  I'd like this problem fixed.  (Please).

2.  The LZCMP program includes a DCL command table, which is generated
    by the VMS command "$ SET COMMAND/OBJECT".  This command table is
    linked with the C program, and referenced (in the program) via a
    "globalref" variable; the declaration looks like this:
        globalref dcl_table;    /* this is the DCL command table */
    I assumed that "globalref" meant the same as "extern".  This turns
    out not to be the case.  It seems that VMS has a "global" class for
    symbols, and that extern variables aren't "global".  It also turns
    out that extern functions _are_ global -- what I am saying is that
    "extern dcl_table" didn't work (dcl_table didn't point to the right
    place), but "extern dcl_table()" did!

    How do I reference (VAX) global symbols from a C program, assuming
    I want to compile with GCC?

3.  My investigations into "globalref" highlighted a problem with either
    the VMS linker, or with both GCC and VAX C.  Essentially, I can compile,
    link and execute the following program:
        extern v1;
        int v2;
        main() {
                printf("&v1=%d\n&v2=%d\n", &v1, &v2);
                exit(1);
        }
    Compiling with GCC, I get &v1 == &v2.  Compiling with VAX C I get
    &v1 + 4 == &v2.  In either case, I think it's wrong.  I think that
    I should get a linker error complaining about an undefined external
    variable (v1).

    Aren't I right?  Shouldn't the above program generate an error?  In
    which case, is this a bug in GCC (and VAX C) or in the VAX linker?
    (I think it's a fault of the linker).

David Newall                     Phone:  +61 8 343 3160
Unix Systems Programmer          Fax:    +61 8 349 6939
Academic Computing Service       E-mail: ccdn@levels.sait.oz.au
SA Institute of Technology       Post:   The Levels, South Australia, 5095

bobmon@iuvax.cs.indiana.edu (RAMontante) (03/10/89)

ccdn@levels.sait.edu.au (DAVID NEWALL) <1680@levels.sait.edu.au> :
-
-1.  GCC doesn't allow mismatched quotation marks.  For example:
-        /* this isn't going to work */
-    doesn't compile because the comment has a single quotation mark.

gcc v1.34 under ULTRIX has no problems with a single quote in a comment.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/11/89)

In article <1680@levels.sait.edu.au> ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
-1.  GCC doesn't allow mismatched quotation marks.  For example:
-        /* this isn't going to work */
-    I've seen arguments over this behaviour before ...

I don't see any grounds for argument.  C has always permitted arbitrary
text in a comment, terminated when "*/" first occurs.  Syntax checking
of the contents of a comment is simply wrong.

-3.  My investigations into "globalref" highlighted a problem with either
-    the VMS linker, or with both GCC and VAX C.  Essentially, I can compile,
-    link and execute the following program:
-        extern v1;
-        int v2;
-        main() {
-                printf("&v1=%d\n&v2=%d\n", &v1, &v2);
-                exit(1);
-        }
-    Compiling with GCC, I get &v1 == &v2.  Compiling with VAX C I get
-    &v1 + 4 == &v2.  In either case, I think it's wrong.  I think that
-    I should get a linker error complaining about an undefined external
-    variable (v1).

Technically that's correct.  No storage has been allocated for `v1'.
A common extension found in many C implementations (especially on UNIX)
is to treat "extern" in such a context like Fortran COMMON, allocating
at link time the largest storage thus referenced, if no explicit
storage definition has been provided.  It sounds like VAX C follows that
COMMON model.  I don't know of any way that the reported GCC behavior
could be considered correct.

guy@auspex.UUCP (Guy Harris) (03/12/89)

>    I've seen arguments over this behaviour before (some people claim
>    it is the one true way; others disagree).

I cannot imagine why there should be any arguments over this behavior
any more.  The December 7, 1988 dpANS (is that draft now the pANS?) says,
quite explicitly:

	3.1.9 Comments

	  ... The contents of a comment are examined only to identify
	multibyte characters and to find the characters */ that
	terminate it.

So, with regard to ANSI C, there is only one side to that debate, not
two.
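
A minimal sketch of what that rule permits, assuming any compiler that
follows the quoted wording.  The function name count_apostrophes is made
up for the illustration; the point is only that the comments may hold
unmatched quotes while the code itself may not.

```c
#include <assert.h>
#include <stddef.h>

/* This isn't a problem: an apostrophe in a comment is just text. */
/* Neither is a lone " double quote, or a quote (') that stays
   unmatched across several lines. */

/* Counts apostrophes in s.  The name count_apostrophes is made up for
   this sketch. */
size_t count_apostrophes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '\'')     /* in code, unlike in comments, quotes must pair up */
            n++;
    }
    return n;
}
```

Calling count_apostrophes("isn't") should give 1; the comments above it
contribute nothing to the translation.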

K&R I doesn't explicitly indicate that the compiler shouldn't match
quotation marks in comments, so if an implementer wants to be a real
asshole he or she can insist on matched quotation marks in comments
without violating the letter of K&R I, I guess. 

However, I'm *extremely* surprised that GCC cares; I don't even think
twice about using apostrophes in comments, and I've seen lots of code
that uses them.

schmidt@siam.ics.uci.edu (Doug Schmidt) (03/12/89)

In article <1153@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
++ However, I'm *extremely* surprised that GCC cares; I don't even think
++ twice about using apostrophes in comments, and I've seen lots of code
++ that uses them.

Lest GCC (cccp actually) acquire the totally undeserved reputation as
``the compiler that can't grok apostrophes in comments'' let me remark
that the current version (1.34) certainly does not have this problem,
nor have I *ever* seen this problem with GCC.  I've been collecting
bug reports since version 1.18 (which appeared over a year ago), and
have not seen this problem mentioned before.  I would be interested
in hearing which release of GCC this purported problem occurred in.

Running ``gcc -v'' will provide the relevant information.

thanks,

   Doug
--
schmidt@ics.uci.edu (ARPA) |   Per me si va nella citta' dolente.
office: (714) 856-4043     |   Per me si va nell'eterno dolore.
                           |   Per me si va tra la perduta gente.
                           |   Lasciate ogni speranza o voi ch'entrate.

jfc@athena.mit.edu (John F Carr) (03/12/89)

>	3.1.9 Comments
>	  ... The contents of a comment are examined only to identify
>	multibyte characters ...                          ^^^^^^^^^^^
        ^^^^^^^^^^^^^^^^^^^

Why?

--
   John Carr             "When they turn the pages of history,
   jfc@Athena.mit.edu     When these days have passed long ago,
   bloom-beacon!          Will they read of us with sadness
   athena.mit.edu!jfc     For the seeds that we let grow?"  --Neil Peart

arndt@ZYX.SE (Arndt Jonasson) (03/13/89)

In article <9776@bloom-beacon.MIT.EDU> jfc@athena.mit.edu (John F Carr) writes:
>>	3.1.9 Comments
>>	  ... The contents of a comment are examined only to identify
>>	multibyte characters ...                          ^^^^^^^^^^^
>        ^^^^^^^^^^^^^^^^^^^
>
>Why?

So that the sequence '*' '/' isn't mistaken for the end of the comment
if it should occur "out of phase", so to speak. If there are multibyte
characters in the comment (the usual scheme I am aware of is to let
the first byte of a multi-byte character have its high bit set), the
character '*' might occur as the second component of such a character.
Then, if the next character happens to be an ordinary one-byte '/',
you will appear to have reached an end-of-comment, unless you keep
track of multi-byte characters.

All this amounts to in practice (at least for the purpose of parsing
comments) is to use a slightly more sophisticated character reader,
which always returns a multi-byte character, with the usual mono-byte
characters as a special case.
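
A rough sketch of such a reader, under a purely hypothetical encoding:
a byte with the high bit set starts a two-byte character whose second
byte may be anything.  The names is_lead_byte and find_comment_end are
invented here, and a real implementation would follow whatever encoding
the compiler actually supports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding, assumed only for this sketch: a byte with the
   high bit set starts a two-byte character, and its second byte may have
   any value, including the codes for '*' and '/'. */
static int is_lead_byte(unsigned char c)
{
    return (c & 0x80) != 0;
}

/* Scans the text that follows a comment opener.  Returns the offset just
   past the terminating star-slash pair, or -1 if the comment never ends. */
long find_comment_end(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len) {
        if (is_lead_byte(s[i])) {
            i += 2;             /* skip both bytes of a multibyte character */
        } else if (s[i] == '*' && i + 1 < len && s[i + 1] == '/') {
            return (long)(i + 2);
        } else {
            i++;
        }
    }
    return -1;
}
```

With the bytes 0x81 '*' '/' 'x' '*' '/', a byte-at-a-time scan would stop
after the third byte, whereas this reader treats 0x81 '*' as one
character and reports the real terminator at offset 6.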
-- 
Arndt Jonasson, ZYX Sweden AB, Styrmansgatan 6, 114 54 Stockholm, Sweden
email address:   arndt@zyx.SE   or      <backbone>!mcvax!enea!zyx!arndt

henry@utzoo.uucp (Henry Spencer) (03/14/89)

In article <9776@bloom-beacon.MIT.EDU> jfc@athena.mit.edu (John F Carr) writes:
>>	3.1.9 Comments
>>	  ... The contents of a comment are examined only to identify
>>	multibyte characters ...                          ^^^^^^^^^^^
>        ^^^^^^^^^^^^^^^^^^^
>
>Why?

Because the ASCII code for '*' or '/' might possibly appear as part of
a multibyte character, so the compiler cannot get comment delimiters
right without knowing about multibyte characters.
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

ath@helios.prosys.se (Anders Thulin) (03/14/89)

In article <1989Mar13.163731.21908@utzoo.uucp> henry@utzoo.uucp (Henry Spencer)
says (paraphrased):

...that the contents of a comment is examined to identify multibyte
...characters only in order to recognize '*/' as multibyte characters

This seems to imply that there is a standard multibyte character set,
recognized by conforming compilers. Is there?
-- 
Anders Thulin			INET : ath@prosys.se
ProgramSystem AB		UUCP : ...!{uunet,mcvax}!enea!prosys!ath
Teknikringen 2A			PHONE: +46 (0)13 21 40 40
S-583 30 Linkoping, Sweden	FAX  : +46 (0)13 21 36 35

ccdn@levels.sait.edu.au (DAVID NEWALL) (03/15/89)

ccdn@levels.sait.edu.au (DAVID NEWALL) <1680@levels.sait.edu.au> :
> 1.  GCC doesn't allow mismatched quotation marks.  For example:
>         /* this isn't going to work */
>     doesn't compile because the comment has a single quotation mark.

***** FLAME ON *****

GCC does NOT complain about mismatched quotation marks in *comments*.
It does complain about them in `#ifdef 0'd code.  The example should
have been:

#ifdef 0
    This isn't going to work
#endif

So there!

***** FLAME OFF *****

I got confused.  Sorry.

Mind you, this is still a problem.  A program section which is
conditionally  _not_ compiled should be treated like a comment -- it
shouldn't matter what's inside the #ifdef/#endif.



henry@utzoo.uucp (Henry Spencer) (03/16/89)

In article <399@helios.prosys.se> ath@prosys.se writes:
>...that the contents of a comment is examined to identify multibyte
>...characters only in order to recognize '*/' as multibyte characters
>
>This seems to imply that there is a standard multibyte character set,
>recognized by conforming compilers. Is there?

No.  However, ANSI C provides for the possibility that a specific compiler
might implement such a thing.  The standard imposes a few minor constraints
on it (notably, a '\0' cannot appear within a multibyte character), but
does not dictate precisely how it works.

scs@adam.pika.mit.edu (Steve Summit) (03/16/89)

Quotes and comments have been discussed to death, but I haven't
seen any discussion of the globalref issue.  (Perhaps it was
confined to gnu.gcc or comp.os.vms, where I wouldn't have seen
it.  Apologies for any redundancy.)

In article <1680@levels.sait.edu.au> ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
>I recently tried compiling the VAX LZCMP compression program on our
>VAX.  The VAX is running VMS 5.0 and the C compiler is GNU C (don't
>know what version).  I encountered a few problems:
>
>2.  The LZCMP program includes a DCL command table, which is...
>    referenced (in the program) via a
>    "globalref" variable; the declaration looks like this:
>        globalref dcl_table;    /* this is the DCL command table */
>    I assumed that "globalref" meant the same as "extern".  This turns
>    out not to be the case.  It seems that VMS has a "global" class for
>    symbols, and that extern variables aren't "global".  It also turns
>    out that extern functions _are_ global -- what I am saying is that
>    "extern dcl_table" didn't work (dcl_table didn't point to the right
>    place), but "extern dcl_table()" did!

External linkage from C under VMS is a real can of worms.  Given
some fixed historical precedents, the globalref/globaldef stuff
(and the resulting workarounds under compilers which don't have
it) is probably necessary, albeit messy and nonstandard.

First, you need to know that VMS object files, and the VMS
linker, deal with multiple Program SECTionS, or psects.  Unix
uses two or three analogous segments ("text," "data," and "bss"),
but VMS typically deals with many.  Psects have a number of
attributes: whether they're executable, whether they're writable,
what their alignment is, and -- here's an interesting one --
whether multiple psects of the same name, contributed from
different object files, concatenate or overlay each other.
Psects that overlay each other might sound useless at best or
dangerous at worst, but it turns out they're just what you want
for, say, Fortran COMMON blocks.

When you say

	extern int x;
or
	int x = 3;

under DEC's VMS C compiler, you don't get a conventional defined
or undefined global symbol in a data psect.  You get a new psect,
named "X", of size sizeof(int), marked with the overlay
attribute.  All global variables named "x" in all modules
therefore end up sharing the same storage, as expected.  I don't
know for certain why this somewhat unusual and unexpected
implementation of C global variables was chosen, but I suspect it
had to do either with

     1.	An attempt to maintain compatibility with one of the
	weaker models for C external linkage, suggested but not
	required by K&R, but which many existing C programs
	assume.  (The "strong" model is "exactly one defining
	instance;" i.e. all but one declaration of a global
	variable must use the word "extern."  Unfortunately,
	various "common" models are, er, common, such as
	programs that say

		int x = 3;

	in one module and

		int x;

	in another.)

     2.	An attempt to make linking of C and Fortran modules easy,
	by mapping C externs to Fortran COMMON blocks.

Given the "common psect" implementation for conventional C
externs (and, for better or worse, that is the implementation),
if what you want is a regular defined global symbol in a data
psect, you've got to use globaldef (or globalref to reference
it), for that is exactly what globaldef and globalref do.
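
For contrast, here is a minimal portable sketch of the "strong" model
described above: exactly one defining instance, with every other mention
being a referencing declaration.  It is squeezed into one file for
brevity, the function names are invented for the illustration, and it
says nothing about how any particular VMS compiler lays out psects.

```c
#include <assert.h>

int x = 3;                      /* the single defining instance */

/* Stands in for a second module that only refers to x. */
int *x_seen_by_user(void)
{
    extern int x;               /* referencing declaration: no storage */
    return &x;
}

int *x_seen_by_definer(void)
{
    return &x;
}

void check_single_instance(void)
{
    assert(x_seen_by_user() == x_seen_by_definer());   /* same object */
    assert(*x_seen_by_user() == 3);
}
```

Both functions hand back the address of the same object, which is what
either linkage model has to guarantee; the models differ only in what
happens when the defining instance is missing or duplicated.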

I doubt it would be easy to add these to gcc, since they show up
in the grammar.  gcc probably had to go with the common psect
model for regular externs for compatibility with VAX11C.

The reason that

	extern dcl_table();

worked is that functions do deal with conventional defined
symbols (as opposed to named psects).  Since all you did with
dcl_table was (I presume) pass its address back to the CLI
routines, the C compiler never had to generate any code other
than to push the address, so the fact that it was (incorrectly)
declared as a function didn't matter.  This is a nice workaround,
which I hadn't seen before (Did you invent it?  Congratulations!)
and it is probably the correct thing to use.

>3.  My investigations into "globalref" highlighted a problem with either
>    the VMS linker, or with both GCC and VAX C.  Essentially, I can compile,
>    link and execute the following program:
>        extern v1;
>        int v2;
>    ...
>                printf("&v1=%d\n&v2=%d\n", &v1, &v2);
>
>    Compiling with GCC, I get &v1 == &v2.  Compiling with VAX C I get
>    &v1 + 4 == &v2.  In either case, I think it's wrong.  I think that
>    I should get a linker error complaining about an undefined external
>    variable (v1).

VAX C worked because of the way overlaid psects work -- each
declaration of (in this case) v1, whether a "defining instance"
or not, generates a reference to a psect named "V1", so even if
there never is a defining instance, the psect gets created.  This
is mildly surprising, but no more so than some of the screwball
things the Unix compilers and linkers have always let you get
away with.  (The other day I discovered that Ritchie's pdp11
compiler accepts

	extern int x = 3;

although I don't know what it means.)

gcc probably ended up with &v1==&v2 because of a misunderstanding
or bug in its implementation of the named psect nonsense.

The big problem with implementing C externs as named psects is
that the linker won't then search for undefined externals (if it
did, the "expected" error for an undefined v1 would have resulted
from the above example).  Instead, undefined externals spring
into existence, as noted, without (here is the killer) being
loaded from libraries.  (This issue would have qualified for a
"frequently asked questions" list on comp.os.vms when last I
followed it.)  That is, if you have

	extern int x;

in an explicitly-loaded object file, and an object in a library
containing only

	int x = 3;

that library member won't get loaded, and x will remain 0.  The
solutions are either to request the library member explicitly, or
to use globalref/globaldef, or to add to the library member a
definition of some other required symbol (such as a function,
which links conventionally) to force the member to be loaded.

At one point I heard that a future version of the VMS linker
would be able to search for psects, perhaps to solve this
problem; that may have been implemented by now.

If you think globalref and globaldef are weird, have you looked
at globalvalue?  A totally unfamiliar concept to C programmers,
though useful in a VMS environment.  If you say

	globalvalue int SS$_NORMAL;
	int retval = SS$_NORMAL;

you'll end up with something like

	movl $1, _retval

rather than

	.extern _SS$_NORMAL
	movl _SS$_NORMAL, _retval	; no $, no immediate constant,
					; SS$_NORMAL is here an address

That is, the compiler will generate code not to dereference a
location whose address the linker will fill in, but to use an
absolute value (which the linker will fill in).  Under Unix,
predefined magic constants are typically implemented with
#defines in standard header files; under VMS the linker will fill
them in from the standard libraries.  It turns out that you can
simulate globalvalue with the same kind of trick as for globalref --
you could say

	extern int SS$_NORMAL();
	int x = SS$_NORMAL;

and presto (ignoring type clash warnings) x would be set to 1.
(This does not mean that globalref and globalvalue are equivalent
and therefore redundant; the globalref workaround replaced
something like

	globalref dcl_table;
	cli$xxx(..., &dcl_table, ...);

with

	extern dcl_table();
	cli$xxx(..., dcl_table, ...);

Note the ampersand; globalref and globalvalue differ by a level
of indirection.)

                                            Steve Summit
                                            scs@adam.pika.mit.edu

henry@utzoo.uucp (Henry Spencer) (03/17/89)

In article <1732@levels.sait.edu.au> ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
>... A program section which is
>conditionally  _not_ compiled should be treated like a comment -- it
>shouldn't matter what's inside the #ifdef/#endif.

ANSI C disagrees with you, for moderately good reasons:  scanning for
#endif is a non-trivial process and demands some small degree of structure
in ifdefed-out material.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/17/89)

In article <1732@levels.sait.edu.au> ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
>Mind you, this is still a problem.  A program section which is
>conditionally  _not_ compiled should be treated like a comment -- it
>shouldn't matter what's inside the #ifdef/#endif.

More precisely, it should be scanned for preprocessor # directives
so it can properly tell where the matching #else or #endif is.
The only other form of syntax checking performed on a skipped group
is the parsing into "preprocessing tokens".  The pANS explicitly
states that an unmatched ' or " produces "undefined behavior", so
GCC is allowed to complain about it (or accept it as just a character).

This tokenizing behavior is considerably more structured than many
existing C preprocessors support (notably the Reiser cpp).  One is
not allowed to have arbitrary garbage #if 0'ed out; it must consist
of a sequence of valid preprocessing tokens.
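
A small sketch of a skipped group that stays within those rules.  The
names WHICH_BRANCH and which_branch are invented for the illustration.
Every quote in the skipped code has its partner, and the apostrophe in
the comment is harmless because comments are replaced before the
preprocessor looks at directives.

```c
#include <assert.h>

#if 0
/* Skipped group: never compiled, but still split into preprocessing
   tokens, so each quote below has its partner. */
static const char *note = "can't be seen";   /* apostrophe inside a string */
/* An apostrophe in a comment isn't a problem even here. */
#define WHICH_BRANCH 0
#else
#define WHICH_BRANCH 1
#endif

/* which_branch and WHICH_BRANCH are names invented for this sketch. */
int which_branch(void)
{
    return WHICH_BRANCH;
}

void check_which_branch(void)
{
    assert(which_branch() == 1);    /* the #else group was the one compiled */
}
```

Replacing the skipped string with bare text such as "this isn't code"
(without the double quotes) is the case the pANS leaves undefined.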

karl@haddock.ima.isc.com (Karl Heuer) (03/17/89)

In article <1732@levels.sait.edu.au> ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
>A program section which is conditionally _not_ compiled should be treated
>like a comment -- it shouldn't matter what's inside the #ifdef/#endif.

The following is legal C code:
	char *x="\
	#endif" /*
	#ifdef foo */
It has always been legal to enclose such code in an #if/#endif.  Your
suggestion would break this.
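
A compilable variant of that fragment, offered as a sketch only: a
semicolon and const have been added, and the leading tabs dropped so the
string is exactly "#endif".  It appears once in a skipped group and once
in a compiled one, to show that the preprocessor finds the right #endif
in both cases.

```c
#include <assert.h>
#include <string.h>

#if 0
const char *skipped = "\
#endif";  /*
#ifdef foo */
#endif

#if 1
const char *x = "\
#endif";  /*
#ifdef foo */
#endif

void check_x(void)
{
    assert(strcmp(x, "#endif") == 0);
}
```

Line splicing joins the backslash-terminated line to the next one before
directives are recognized, so neither "#endif" inside the strings nor
"#ifdef foo" inside the comments is ever seen as a directive.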

If you want to prevent sections of code from being compiled, use #if/#endif.
If you want to include descriptive text, use a comment.

Btw, you said `#ifdef 0'; this should be `#if 0' (or that godawful pdp11ism
`#ifdef notdef')

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

rpd@cs.cmu.edu (Richard Draves) (03/17/89)

Could someone help with the following questions about the proposed ANSI C
standard?  I've checked K&R2, with inconclusive results.

First, I'd like to know if a #if in one file can be matched by a #else or #endif
in another file.

Second, I'd like to know if when one is skipping over tokens looking for a #else
or #endif, one is entitled to skip #include directives without substituting the
file.

Thanks,
Rich

w-colinp@microsoft.UUCP (Colin Plumb) (03/17/89)

ccdn@levels.sait.edu.au (DAVID NEWALL) wrote:
> Mind you, this is still a problem.  A program section which is
> conditionally  _not_ compiled should be treated like a comment -- it
> shouldn't matter what's inside the #ifdef/#endif.

Yes, it should.  In particular, consider the string:
"This\n\
is a test\n\
of\n\
#endif\n"

Even though the characters "#endif" appear at the beginning of a line,
they should not be treated as a preprocessor directive.
-- 
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me.  I never do." - The Doctor

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/18/89)

In article <kY89tBy00hYPNFGX1Q@cs.cmu.edu> rpd@cs.cmu.edu (Richard Draves) writes:
>First, I'd like to know if a #if in one file can be matched by a #else or #endif
>in another file.

The preprocessing grammar implies not.  As I recall the debate on this,
generally it was felt that any possible use of such behavior would be
outweighed by the much greater chance of it contributing to mysterious
bugs.  The grammar is also much simpler this way.

>Second, I'd like to know if when one is skipping over tokens looking for a #else
>or #endif, one is entitled to skip #include directives without substituting the
>file.

Not only is one entitled to, one is required to.

henry@utzoo.uucp (Henry Spencer) (03/19/89)

In article <kY89tBy00hYPNFGX1Q@cs.cmu.edu> rpd@cs.cmu.edu (Richard Draves) writes:
>First, I'd like to know if a #if in one file can be matched by a #else or #endif
>in another file.

If you read the ANSI stuff carefully (Oct draft), the BNF that defines the
#if-#endif matching is a BNF for a *single* file.  So the answer is no.

>Second, I'd like to know if when one is skipping over tokens looking for a #else
>or #endif, one is entitled to skip #include directives without substituting the
>file.

This behavior is not only permissible, it is required.  The rule is that
preprocessor directives in a skipped section are examined only up to the
keyword following the "#", to keep track of #if nesting.  As I read it,
the rest of an ignored directive doesn't even need to be syntactically legal,
although it does have to consist of legal preprocessing tokens.  If you
think about it, this is highly desirable:  one reason for #if-ing out a
piece of code is that it uses implementation-specific features that
don't exist in the implementation it's currently being compiled on.
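
A minimal sketch of that situation.  The header names, VENDOR_FEATURE
and VENDOR_MAGIC are all hypothetical and exist nowhere; the point is
that nothing inside the skipped group is opened or evaluated, while the
nested #ifdef/#endif pair is still counted.

```c
#include <assert.h>

#if 0
#include <no_such_vendor_header.h>   /* hypothetical name; never opened */
#ifdef VENDOR_FEATURE                /* nesting is still tracked */
#include "also_missing.h"
#endif                               /* closes the nested #ifdef only */
int vendor_only = VENDOR_MAGIC;
#endif

int portable_value(void)
{
    return 42;
}

void check_portable_value(void)
{
    assert(portable_value() == 42);
}
```

If the skipped #include lines were processed, this file could not be
compiled on a system lacking those headers.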