[net.lang.c] Lattice/UNIX incompatibility

g-frank@gumby.UUCP (12/26/84)

> I looked at a Lattice C manual a few days
> ago.  The list of differences between Lattice C and K+R C seemed large,
> and then in browsing through the reference section I discovered further
> incompatibilities between Lattice and K+R.  I don't have similar
> reservations about Mark Williams compiler.
> 
> Kaare Christian

Which version of Lattice C did you look at?  As of 2.14, the "UNIX" interface
is much improved, with many more functions, and a bit more sense to the ones
there are.  Lattice does have its own naming conventions for stuff, but also
offers the UNIX names, as far as I can tell.  The only big problem is the need
to use a "b" in the opentype string when opening files for "binary" (that is,
no translation of CR/LF pairs, no recognition of ^Z as EOF, etc.).  This is
100% UNIX incompatible, of course, but it's hard to see how else to resolve
it, given PC-DOS.
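
As a rough illustration of the "b" flag (a sketch, not taken from the Lattice
manual): the hypothetical helper roundtrip_binary below writes a buffer and
reads it back with "wb" and "rb", so that CR, LF and ^Z bytes pass through
untouched.  The helper name and the file handling are assumptions made for
the example, and it assumes a library that accepts the "b" letter, as ANSI C
libraries do.

```c
#include <assert.h>
#include <stdio.h>

/* Write n bytes to path, then read them back, both in binary mode.
   Returns the number of bytes read back, or -1 on error.
   roundtrip_binary is a hypothetical helper, not a library function. */
long roundtrip_binary(const char *path, const unsigned char *data,
                      size_t n, unsigned char *out)
{
    FILE *fp;
    size_t got;

    assert(path != NULL && data != NULL && out != NULL);

    fp = fopen(path, "wb");          /* "b": no newline or ^Z translation */
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, n, fp) != n) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    got = fread(out, 1, n, fp);      /* raw bytes, exactly as stored */
    fclose(fp);
    return (long)got;
}
```

Opening the same file with "r" instead of "rb" on a compiler that translates
text files would be expected to shorten the data at the CR/LF pair and stop
at the ^Z.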

ark@alice.UUCP (Andrew Koenig) (12/26/84)

Every version of the Lattice compiler I have seen has four
non-standard things:

	1. Comments nest.

	2. If you try to pass a structure to a function
	   it quietly passes a pointer to the structure.

	3. Every declaration of an external variable but
	   one must say 'extern'.

	4. Case is ignored in external variables.

This makes it quite a nuisance to port Unix applications.

g-frank@gumby.UUCP (12/28/84)

> Every version of the Lattice compiler I have seen has four
> non-standard things:
> 
> 	1. Comments nest.
> 
> 	2. If you try to pass a structure to a function
> 	   it quietly passes a pointer to the structure.
> 
> 	3. Every declaration of an external variable but
> 	   one must say 'extern'.
> 
> 	4. Case is ignored in external variables.
> 
> This makes it quite a nuisance to port Unix applications.

1. Comment nesting can be turned off with a compiler option.

2. A pointer is passed, but not silently - a warning message is generated.

3. I'm not sure I understand this one.  If I declare a variable and don't
   use the extern qualifier, storage is allocated.  What do you mean?

4. This is an operating system problem.  Obviously it will cause some
   problems.  It is possible for non-extern idents to have long names
   (I think they are case-sensitive in any case), but the Microsoft linker
   and object file format don't accept long or case-sensitive names.  What
   does the proposed ANSI standard say about the issue?

henry@utzoo.UUCP (Henry Spencer) (12/29/84)

> Every version of the Lattice compiler I have seen has four
> non-standard things:
> 
> ...
> 
> 	3. Every declaration of an external variable but
> 	   one must say 'extern'.
> 
> 	4. Case is ignored in external variables.

These two are actually legitimate C.  In fact, if you look carefully
at K&R, it would appear to *require* #3, although in fact many of the
real implementations are looser.  (This is about the way the ANSI C
people are treating it, too.)  #3 is often necessary in non-Unix systems
because the linker insists that an occurrence of an external variable
is either (a) a reference to something declared elsewhere, or (b) a
(unique) declaration, and you *must* specify which.  So you cannot just
treat all occurrences as equivalent, the way the Unix setup does; one
of them (or all but one of them) must be specially marked.

#4 is likewise a legitimate variation when coping with stupid linkers.

Whether either of these is actually *necessary* in the environment the
Lattice compiler is running in, I can't say.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (12/30/84)

>  ... but the Microsoft linker
>  and object file format don't accept long or case-sensitive names.  What
>  does the proposed ANSI standard say about the issue?

The current draft says that the length limit (if any) and treatment
of case in external identifiers are "implementation-defined", which
means that implementors can do things as they wish but must document
their decisions.  Also, the length limit may not be shorter than 6.
There is a definite implication that arbitrary-length names are nice,
but it is politically necessary for conforming implementations to be
possible even on machines with old linker formats.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

Barry Gold <lcc.barry@UCLA-LOCUS.ARPA> (12/30/84)

>>	3. Every declaration of an external variable but
>>	   one must say 'extern'.
>
>...#3 is often necessary in non-Unix systems because 
>the linker insists that an occurrence of an external variable 
>is either (a) a reference to something declared elsewhere, or (b) a
>(unique) declaration, and you *must* specify which.  So you cannot just
>treat all occurrences as equivalent, the way the Unix setup does; one
>of them (or all but one of them) must be specially marked.

For any linker that supports FORTRAN, this restriction shows a lack of
imagination on the part of the c compiler writers.  Don't get me started
on this; it's a favorite flame.

To get the unix multiple declaration effect, you need only make the
declarations look like labelled COMMON to the linker.  This reserves space,
with all the areas overlapping and the *largest* determining the amount of
space allocated by the linker.

If you want to initialize such a variable, the (unique) occurrence with an
initializer should look like an ordinary data section (BLOCK DATA) to the
linker.  It *IS* true that at most one of the occurrences of an external
variable may have an initializer.  The other restriction shows stupidity
on the part of the compiler writer or a brain-damaged linker that doesn't
even support ANSI FORTRAN.

barry gold
lcc!barry@ucla-cs.ARPA
ucbvax!ucla-cs!lcc!barry

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/31/84)

> To get the unix multiple declaration effect, you need only make the
> declarations look like labelled COMMON to the linker.  This reserves space,
> with all the areas overlapping and the *largest* determining the amount of
> space allocated by the linker.

Many such linkers impose unacceptable restrictions on labeled COMMON, such
as: only a small number of them allowed; aligned on 4Kb boundaries; etc.

Although Ritchie favors the COMMON model, due to lack of universality the
ANSI C standards committee has settled on the DEF/REF model, with the COMMON
model relegated to the "Common extensions" appendix.

oacb2@ut-ngp.UUCP (oacb2) (12/31/84)

>>	3. Every declaration of an external variable but
>>	   one must say 'extern'.

> For any linker that supports FORTRAN, this restriction shows a lack of
> imagination on the part of the c compiler writers.  Don't get me started
> on this; it's a favorite flame.

Or, perhaps, just a desire for C to work correctly.  K & R specify this
behavior and doing otherwise invites nonportable code.
-- 

	Mike Rubenstein, OACB, UT Medical Branch, Galveston TX 77550

g-frank@gumby.UUCP (01/01/85)

> >>	3. Every declaration of an external variable but
> >>	   one must say 'extern'.
> 
> > For any linker that supports FORTRAN, this restriction shows a lack of
> > imagination on the part of the c compiler writers.  Don't get me started
> > on this; it's a favorite flame.
> 
> Or, perhaps, just a desire for C to work correctly.  K & R specify this
> behavior and doing otherwise invites nonportable code.
> -- 

I realize that it's normal under UNIX to be able to declare the same
variable multiple times and count on the linker to resolve it to a single
location.  I also realize that it's convenient to do this because one might
want to include such a declaration in a header file, to be read by all
compilands of a particular program.

Nonetheless, my common sense rebels.  A particular variable "belongs" to a
particular module, and is "external" to all other modules.  Real languages
enforce this restriction in a meaningful way (as does C++, I believe).
There IS semantic meaning to where a variable is declared.

ark@alice.UUCP (Andrew Koenig) (01/02/85)

I recently posted a note complaining that Lattice C
enforces the "single external declaration rule,"
which makes it harder to port programs from UNIX systems.

I did not expect the flood of messages that would ensue.

Let me correct some misconceptions.

First, the problem.  K&R [page 206] states:

	...in a multi-file program, an external data definition
	without the extern specifier must appear in exactly
	one of the files.  Any other files which wish to
	give an external definition for the identifier must
	include the extern in the definition.

Thus, on the surface, it would appear that there is no issue.
However, most UNIX systems permit extern to be omitted entirely
in all the files.  In effect, the linker permits an external
variable to be multiply defined.  This extension is so pervasive
that its use is also pervasive, to the extent that porting
almost any multi-file program to a compiler that enforces the
restriction is made much harder as a result.

On the surface, then, it would appear that I am merely campaigning
for my pet feature.  However, there is more to it than that:

	1. The 'feature' is actually the way things were
	at first, and still are in most UNIX systems.

	2. The restriction was thrown in as a sop to some
	non-UNIX operating systems, whose linkers make it
	tough to implement C without the restriction
	(although PL/I has exactly the same semantics for
	its external variables as "unrestricted" C, which
	makes it hard for me to understand why C can't
	do it on IBM systems).

	3. Common sense indicates that the definition of
	an external variable should appear only once in the
	program text, so that its type can be changed by
	only altering one thing.  The logical place
	for such a single definition is in an include file.
	However, under the restrictive definition of C,
	this is impossible: the program breaks whether the
	include file says "extern" or not.

	4. C++ enforces its own version of the restriction
	because it has to be able to generate C for compilers
	that also enforce the restriction.

I do not enjoy using programming systems that force me
to change things to make life easier for the computer.
I believe that machines should be my servants, not vice versa.

lauren@vortex.UUCP (Lauren Weinstein) (01/02/85)

I use C86 here.  It also enforces the external declarations in a manner
that, on the surface, makes include files shared by multiple source files
difficult to manage.  However, there is a simple solution, which, while not
fancy, is fully compatible with all other compilers that I know of.

In the .h file to be included among various files, I add:

EXTERN

before each declaration.

Then, in one (usually the first) of my source files, I put

#define EXTERN

at the top, then

#define EXTERN extern

at the top of all the other source files.  This effectively takes
care of the problem, and lets me manage my .h files with a minimum
of hassle, under the circumstances.
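
The scheme above can be sketched in one compilable block.  To keep it
self-contained, the shared header is stood in for by a macro, SHARED_DECLS,
and the two kinds of source file are simulated by expanding it twice with
different definitions of EXTERN.  The variable names and bump_line_count are
hypothetical, chosen only for the illustration.

```c
/* Stand-in for the shared .h file: every declaration is prefixed by EXTERN.
   (In a real program this text would live in a header, not in a macro.) */
#define SHARED_DECLS \
    EXTERN int line_count; \
    EXTERN int error_count;

/* What an ordinary source file sees: declarations only, no storage. */
#define EXTERN extern
SHARED_DECLS
#undef EXTERN

/* What the one "owning" source file sees: EXTERN expands to nothing,
   so the same lines now define the variables and reserve storage. */
#define EXTERN
SHARED_DECLS
#undef EXTERN

/* Hypothetical user of a shared variable. */
int bump_line_count(void)
{
    return ++line_count;
}
```

The point of the design is that the type of each variable is written in
exactly one place, while only one compiland ever sees it without extern.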

--Lauren--

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/02/85)

> 	3. Common sense indicates that the definition of
> 	an external variable should appear only once in the
> 	program text, so that its type can be changed by
> 	only altering one thing.  The logical place
> 	for such a single definition is in an include file.
> 	However, under the restrictive definition of C,
> 	this is impossible: the program breaks whether the
> 	include file says "extern" or not.

No!  #include files should be used for extern DECLARATIONS.
The data/function DEFINITIONS, as observed, reasonably
appear in only one place, preferably in the source file
that handles the corresponding data.  (In repackaging of
older UNIX code such as the Bourne shell, people found it
more convenient to lump all extern data into a separate
file by itself, e.g. "data.c".)

Example (compressed):

	cc -o example main.c stack.c

contents of file "stack.h":

	extern int level;
	extern void push(double), reset(void);
	extern double pop(void);

contents of file "stack.c":

	#include "stack.h"
	#define MAXLEVEL 100
	int level = 0;	/* NOTE: This is NOT A CONFLICT. */
	static double stack[MAXLEVEL];
	void push(x) double x;	/* ANSI: is "double x;" necessary?? */
	{	if (level < MAXLEVEL) stack[level++] = x;	}
	void reset()
	{	level = 0;	}
	double pop()
	{	return level > 0 ? stack[--level] : 0.0;	}	/* 0.0 on underflow */

contents of file "main.c":

	#include "stack.h"
	/*ARGSUSED*/ main(argc, argv) char **argv;
	{	for (;;)
		{	push(3.14159);
			if (level % 3 == 0)
				(void)pop();
			if (level == 10)
				return 0;
		}	/*NOTREACHED*/
	}

kc@rna.UUCP (Kaare Christian) (01/03/85)

When I originally entered my note on csd, I didn't expect one
small paragraph near the end to generate so much discussion.  The
oft quoted paragraph started by praising Mark Williams C compiler
for its Unixisms, and then I made an offhand remark about Lattice
C and its seemingly lower level of Unixness.

My thanks to ark at Alice for his list of four Lattice non-standards. I
agree with him that all four of those problems make it hard to
port Unix software to the PC using Lattice C.  The third problem,
Lattice's requirement for the extern phrase in all but one global
declaration, has generated the most controversy.  Lattice may be close
to what is spelled out in K+R, but it is clearly not the way Unix C
works and it certainly makes it hard to port Unix code to the pc.

My experience using Lattice C is NIL.  I have only looked at the
manual.  I do know someone who tried porting a Venix
application to the PC using Lattice.  He encountered problems.
The same application ported effortlessly using Mark Williams
compiler, including the assembly language in-lines.

Here are some additional rough spots in Lattice C that I have gleaned
from the Lattice C manual.

	5. No struct/union assignments.  This isn't mentioned in
Lattice's list of differences.  Struct/unions became full fledged
data objects (unlike arrays) several years ago.

	6.  In their list of differences, Lattice notes that they
use separate name pools for the members of each struct/union.  I
know that this is different from K+R, but it has been an accepted
upgrade in the Unix/C world for some years.

	7. Identical strings are stored together.  (I doubt that
this would cause many problems.)

	8. If a is an array, Lattice construes &a to be a pointer
to an array, not a pointer to the first element in a.  Most Unix
compilers take &a to mean the same thing as a itself - a pointer
to a's first element, not a pointer to an array. The type is
wrong, but the value is correct.

	9. The currency symbol is allowed in identifiers.

	10. Preprocessor macros with arguments are limited to a
single line.  This is what the manual says. I don't know if you
can escape the newline as is routinely done in Unix versions of C.

	11. The open and fopen system calls must be told if you want to
treat the file as a binary file.  The default is to assume an ASCII
file, with disastrous consequences for many programs.  PCDOS knows
(exactly) the size of each file. Only a few programs use a Ctrl-z to
delimit the end of file - why bother.  There is at least one word proc
for the pc that delimits the text with a Ctrl-z and then places some
(binary) status info following the fake eof.  Ctrl-z as an eof for text is a
throwback to cpm/80 and it isn't necessary, or widely used in the pc.
I think the default for read (and probably for fread) should be
to give the programmer the data, raw and uninterpreted.

	12.  Multiple character constants (e.g. 'aa') (stored in ints or
longs) are allowed.  This feature probably shouldn't be used if
you want to port programs from Lattice to the real world, but I
doubt it would make it harder to port from real to Lattice.
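
Item 8 can be checked with pointer arithmetic.  What follows is a minimal
sketch with hypothetical function names; it assumes a compiler that, like
Lattice as described above (and like later ANSI C compilers), treats &a as a
pointer to the whole array.

```c
#include <stddef.h>

/* Bytes skipped by adding 1 to &a, a pointer to the whole array. */
size_t step_of_array_pointer(void)
{
    int a[10];
    int (*pa)[10] = &a;
    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Bytes skipped by adding 1 to a, a pointer to the first element. */
size_t step_of_element_pointer(void)
{
    int a[10];
    int *pe = a;
    return (size_t)((char *)(pe + 1) - (char *)pe);
}

/* The two pointers differ in type but hold the same address. */
int same_address(void)
{
    int a[10];
    return (void *)&a == (void *)a;
}
```

The first function steps over all ten ints and the second over one, which is
the practical consequence of the difference in type.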

And finally, let me note one peculiarity.  The modern Unix open
statement takes three arguments; the third is an access mode that
is used when the file is created.  Lattice's open doesn't accept
the third argument - this makes sense on pcdos because there
aren't any file access modes.  However, the creat system call does allow
the mode argument; its meaning is supposedly explained in the
citation for open.

Lattice C looks like a good product for developing software on the
pc.  It certainly has a better price than Mark Williams, and many
netlanders are very happy with it.  I'm not trying to argue that
it is junk, I'm only noting some of its limitations.

My Mark Williams C manual is at home.  I'll post its advertised
shortcomings in a later note.

Kaare Christian		uucp:cmcl2!rna!kc

cottrell@nbs-vms.ARPA (01/04/85)

/*
>	3. Every declaration of an external variable but
>	   one must say 'extern'.

this sentence no verb.

*/

ron@brl-tgr.ARPA (Ron Natalie <ron>) (01/04/85)

> 
> 	7. Identical strings are stored together.  (I doubt that
> this would cause many problems.)
> 
Boy does this cause problems.  My application overwrites the string.
When I wanted to do it twice I did something like
	char	*str1 = "../../..";
	char	*str2 = "../../..";

> 	12.  Multiple character constants (e.g. 'aa') (stored in ints or
> longs) are allowed.  This feature probably shouldn't be used if
> you want to port programs from Lattice to the real world, but I
> doubt it would make it harder to port from real to Lattice.
> 
Character constants are INTS.  Most compilers allow you to use as many
chars in the constant as would fill out an int.

The next problem is that it doesn't allow functions returning char, but
it bitches when you try to return a char in an int function.

-Ron

ron@brl-tgr.ARPA (Ron Natalie <ron>) (01/04/85)

> >	3. Every declaration of an external variable but
> >	   one must say 'extern'.
> 
> this sentence no verb.
> 
> */

Every declaration (of an external variable, but one) must say 'extern'.

g-frank@gumby.UUCP (01/04/85)

> /*
> >	3. Every declaration of an external variable but
> >	   one must say 'extern'.
> 
> this sentence no verb.
> 
> */

I must disagree.

john@genrad.UUCP (John Nelson) (01/04/85)

>> 
>> 	7. Identical strings are stored together.  (I doubt that
>> this would cause many problems.)
>> 
>Boy does this cause problems.  My application overwrites the string.
>When I wanted to do it twice I did something like
>	char	*str1 = "../../..";
>	char	*str2 = "../../..";
>

defining these two strings as:

	char	str1[] = "../../..";
	char	str2[] = "../../..";

avoids the problem.  This defines two arrays (separate storage) initialized
to be the value of the string, rather than two pointers to string storage.

I think that it is rather chancy to overwrite string literals, in any case.
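
A small sketch of the difference, with a hypothetical function name: because
str1 and str2 are arrays, each gets its own copy of the characters, so
overwriting one cannot disturb the other, whether or not the compiler pools
identical literals.

```c
#include <string.h>

/* Two arrays, each initialized with its own copy of the text. */
static char str1[] = "../../..";
static char str2[] = "../../..";

/* Overwrite the first character of str1 and report (1 or 0) whether
   str1 changed while str2 kept its original contents. */
int overwrite_first(void)
{
    str1[0] = 'X';
    return strcmp(str1, "X./../..") == 0 && strcmp(str2, "../../..") == 0;
}
```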

guido@boring.UUCP (01/04/85)

In article <345@rna.UUCP> kc@rna.UUCP (Kaare Christian) gives a list
of Lattice incompatibilities with unix (not with K&R!), which is basically
correct.  Some minor corrections:

>	10. Preprocessor macros with arguments are limited to a
>single line.  This is what the manual says. I don't know if you
>can escape the newline as is routinely done in Unix versions of C.

This can be done in the *definition* alright (though the whole of the
definition must be fairly short -- 150 chars in the version I'm used to),
but not in the *call*.

>	11. The open and fopen system calls must be told if you want to
>treat the file as a binary file.  The default is to assume an ASCII
>file, with disastrous consequences for many programs.
>I think the default for read (and probably for fread) should be
>to give the programmer the data, raw and uninterpreted.

The problem with this is that a newline on PC-DOS is CR LF, while most
UNIX programs expect to receive only an LF ('\n') at the end of a line.
This is a more important consequence of the binary/ascii mode given at
open time than the ^Z treatment.  Unfortunately the interface Lattice
chose was to set the mode at open time, and since their standard I/O
library uses their own read/write functions, one cannot specify that
the translation has to be done for getc calls but not for [f]read.
I would like it most if 'open' would have binary as default, and 'fopen'
ascii.

Regarding the ^Z treatment: this still makes some sense because the
PC-DOS COPY command, and probably others, places a ^Z after
each file copied in ascii mode (which is the default mode); naive C
programs would probably burp if they saw the ^Z.  We are, alas,
living with an OS whose main utilities don't know the power of the OS!
(RENAME can't move to another directory, etc., etc.)
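
What the ascii mode amounts to can be sketched as a filter over a buffer.
This is only an illustration of the two translations discussed above, not
Lattice's library code; text_mode_filter is a hypothetical name, and the
handling of a lone CR (kept as is) is an assumption.

```c
#include <stddef.h>

/* Translate buf in place the way an ascii-mode read might:
   each CR LF pair becomes a single LF, and the data ends at the first ^Z.
   Returns the new length.  A CR not followed by LF is left alone. */
size_t text_mode_filter(char *buf, size_t n)
{
    size_t in, out = 0;

    for (in = 0; in < n; in++) {
        if (buf[in] == 0x1A)            /* ^Z: treat as end of file */
            break;
        if (buf[in] == '\r' && in + 1 < n && buf[in + 1] == '\n')
            continue;                   /* drop the CR, keep the LF */
        buf[out++] = buf[in];
    }
    return out;
}
```

A binary-mode read would simply skip this step and hand back all n bytes.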

	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
	guido@mcvax.UUCP

g-frank@gumby.UUCP (01/05/85)

> We are, alas,
> living with an OS whose main utilities don't know the power of the OS!
> (RENAME can't move to another directory, etc., etc.)
> 
> 	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
> 	guido@mcvax.UUCP

How about a "Stamp Out MS-DOS Committee"?

-- 
"good news is just life's way of keeping you off balance."