[comp.lang.c] "Numerical Recipes in C" is nonport

mcdonald@uxe.cso.uiuc.edu (08/29/88)

>As I understand the draft standard, you may portably compute address
>of the location *after* the last element of the array, but not the
>location *before* the first element of the array.

>Here is at least one architecture that breaks: the 80X86.

>    char *foo = malloc(65635);

>    /* foo <- address 4a33:0000 */

>    char *bar = foo-1;

>Ok, now what is the value of "(4a33:0000)-1"?  Answer: there isn't
>one.  The draft standard doesn't say that nonconforming programs will
>break on all machines, just that they won't work on all machines.
Interesting. The documentation says that the argument of malloc
is an unsigned int, for which the maximum value is 65535. Heaven knows
what would happen if you actually tried this. But let's say we go to
huge model and use halloc(65635,1) which is legal but obviously
nonportable. OR you could say

      char foo[65635]; 
and compile in the huge model. In either case you get something like
   
      2000:0000 for bar = foo;
 and  1000:FFFF for  bar = foo; bar--;
  and if you do bar = foo; bar--; bar++; you get back 2000:0000.

Thus on the 8086 in huge model it indeed works (but, we all agree,
is nonportable).
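To make the huge-model arithmetic above concrete, here is a small C model of 8086 real-mode addressing. It is a sketch under stated assumptions, not Microsoft C's actual huge-pointer code. A far pointer is treated as a 16-bit segment plus a 16-bit offset, and the physical address is segment * 16 + offset. Results are rebuilt with the segment on a 64K boundary, because that is the convention that reproduces the 2000:0000 and 1000:FFFF values quoted. The names farptr, linear and huge_add are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an 8086 real-mode far pointer. */
typedef struct {
    uint16_t seg;   /* segment, counted in 16-byte paragraphs */
    uint16_t off;   /* offset within the 64K segment */
} farptr;

/* 20-bit physical address: segment * 16 + offset. */
static uint32_t linear(farptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Huge-model style arithmetic: add delta to the full 20-bit address,
   then rebuild a segment:offset pair whose segment lies on a 64K
   boundary (an assumed normalization that matches the example). */
static farptr huge_add(farptr p, int32_t delta)
{
    uint32_t lin = (linear(p) + (uint32_t)delta) & 0xFFFFFu;  /* wrap at 1 MB */
    farptr r;

    r.seg = (uint16_t)((lin >> 16) << 12);  /* 64K bank number -> segment */
    r.off = (uint16_t)(lin & 0xFFFFu);
    assert(linear(r) == lin);               /* still the same physical byte */
    return r;
}
```

With foo at 2000:0000, huge_add(foo, -1) gives 1000:FFFF (physical 0x1FFFF), and adding 1 to that gives back 2000:0000.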

Actually if you write

     char foo[65536]; 

in large model it works: the pointer arithmetic works on only the
offset portion of the address, so you get wraparounds, but as
long as you don't try to dereference the resulting pointers everything
works. If you DO try to dereference bar = foo; bar--; you of
course actually access foo[65535].
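Here is a similar sketch for the large-model case. It is again a hypothetical model, not compiler output. Only the 16-bit offset takes part in the arithmetic, so it wraps modulo 65536 and the segment never changes. Under this model, decrementing a pointer to 4a33:0000 gives 4a33:FFFF, which is the same address as foo + 65535. The names farptr and far_add are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a far pointer: 16-bit segment and offset. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} farptr;

/* Large-model style arithmetic as described above: only the offset
   changes, modulo 65536; the segment is never touched. */
static farptr far_add(farptr p, long delta)
{
    farptr r = p;

    /* Converting to an unsigned 16-bit type reduces modulo 65536. */
    r.off = (uint16_t)(p.off + delta);
    assert(r.seg == p.seg);
    return r;
}
```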

In small data model, the first few bytes of the data segment
are reserved for the word "Microsoft" so the worst you can do
there is mess with their name. The last bytes of the segment are the
stack, so if you DO dereference past the end of the legal data
area, disaster most certainly COULD occur. 

The fact that it may work doesn't mean that it is pretty, though.
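For contrast, a portable backward walk never forms the address before the first element. The sketch below uses a hypothetical function name, find_last. The pointer starts at the one-past-the-end address, which the draft standard lets you compute. It is decremented before each access, so its smallest value is the start of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the last element of a[0..n-1] equal to key, or NULL.
   The tempting loop "for (p = a + n - 1; p >= a; --p)" ends by computing
   a - 1, which the draft standard does not allow; this version never does. */
static const int *find_last(const int *a, size_t n, int key)
{
    const int *p;

    assert(a != NULL);
    p = a + n;            /* one past the end: may be formed, never dereferenced */
    while (p != a) {
        --p;              /* smallest value p ever takes is a itself */
        if (*p == key)
            return p;
    }
    return NULL;
}
```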

Doug McDonald
P.S. This is for Microsoft C.

mcdonald@uxe.cso.uiuc.edu (09/05/88)

>There exist machines whose protection philosophy is to prevent you from
>even thinking something illegal.  In particular, on the Unisys A-series,
>the compiler must implement all memory addressing protection--there is
>no kernel/user state protection on memory.*  A program cannot be allowed
>to form an invalid address, as there is nothing to stop it from using it,
>and nothing in the hardware to stop you from stomping on another user
>if you do.  Therefore, the compiler and the operating system would be
>written so as to cause an interrupt if computing 'b - 1' were attempted.

>Note that there is no C compiler for the A-series today, although one is
>rumored.
This seems logically inconsistent. You say that on the Unisys A-series
that the problem is in the compiler. But then you say there is no
C compiler. If the problem exists in other language compilers,
simply leave it out of C! Simply write the compiler so that it doesn't
check pointers. (If it did do that, wouldn't it be a horrendous time
penalty? Every time you said "pointer++" it would have to check
bounds, unless the pointer were declared the nonexistent "noalias".)

What about assembly language? What is to stop things from 
happening there with out-of-bounds pointers?

Doug McDonald

ok@quintus.uucp (Richard A. O'Keefe) (09/06/88)

In article <225800063@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>This seems logically inconsistent. You say that on the Unisys A-series
>that the problem is in the compiler. But then you say there is no
>C compiler. If the problem exists in other language compilers,
>simply leave it out of C! Simply write the compiler so that it doesn't
>check pointers. (If it did do that, wouldn't it be a horrendous time
>penalty? Every time you said "pointer++" it would have to check
>bounds, unless the pointer were declared the non-existant "noalias".)
>
>What about assembly language? What is to stop things from 
>happening there with out of bounds pointers?
>
>Doug McDonald

WHAT assembly language?  NEWP (an Algol-like language) is as close to
assembly language as one gets on the A-series.  The point is that the
machine was not designed to do pointer arithmetic (there isn't even a
single notion of "address"; Indirect Reference Words and Indexed
Descriptors have different tags and interpretations).  The operating
system has been able to rely on this.  If you generate code to do
"pointer arithmetic" on, say, Indexed Descriptors, you find that (a) you
just bypassed the Virtual Memory system, and (b) bye-bye system integrity!
The compilers don't generate special code to check pointers, so it isn't
something you can "leave out of" C.

The systems-programming languages do have things called pointers, which are
Indexed Descriptors, and can be adjusted.  But incrementing such a pointer
by N involves touching each word of storage referenced so that a boundary
word won't be missed (not a performance hit, because this is not the kind
of thing A-series machines are normally asked to do).

I know of two BCPL compilers for the A-series, one actual one and one that
was designed but not finished.  Neither of them was a pretty sight.  (The
PL/I compiler had similar troubles.)

The A-series is a "high-level" architecture for a particular set of
languages (Algol, Fortran, COBOL), and you can't expect languages
outside that set to map well onto it.

mcdonald@uxe.cso.uiuc.edu (09/09/88)

>What's to stop you from doing the following:

>	Generate code in an array.
>	Jump to the beginning of the array. *

>Now you've blown the protection. You can do anything. I hope this isn't a
>multiuser machine...
It is certainly possible to design machine/compiler combinations that
prevent this. I call them "totalitarian" or "Stalin" operating systems.
Apparently ANSI C does not prohibit this behaviour: a fatal flaw
in the ANSI standard. IF you can't do this, an entire class of programs
becomes absolutely impossible: incremental compilers. It would prohibit
a Turbo C or Quick C clone, for example. All of my programs I have designed
for teaching chemistry and physics wouldn't work.  It is even possible
to design an operating system so that it is impossible (inside it, of course)
to write compilers: there is some magic cookie necessary to make
an executable file, and no compiler or assembler allows setting such a
cookie.* VMS makes it rather difficult to set such a thing (but
possible). Does the Unisys A series REALLY make it all that impossible?
If so, maybe that is why no one has ever heard of them!

Doug McDonald

* I mean that the compiler can make an executable, but that you can't
write a program that will make an executable.

dricej@drilex.UUCP (Craig Jackson) (09/11/88)

In article <225800063@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:

[Although there is no attribution, I wrote the >> stuff.  CEJ]
>>There exist machines whose protection philosophy is to prevent you from
>>even thinking something illegal.  In particular, on the Unisys A-series,
>>the compiler must implement all memory addressing protection--there is
>>no kernel/user state protection on memory.*  A program cannot be allowed
>>to form an invalid address, as there is nothing to stop it from using it,
>>and nothing in the hardware to stop you from stomping on another user
>>if you do.  Therefore, the compiler and the operating system would be
>>written so as to cause an interrupt if computing 'b - 1' were attempted.
>
>>Note that there is no C compiler for the A-series today, although one is
>>rumored.
>This seems logically inconsistent. You say that on the Unisys A-series
>that the problem is in the compiler. But then you say there is no
>C compiler. If the problem exists in other language compilers,
>simply leave it out of C! Simply write the compiler so that it doesn't
>check pointers. (If it did do that, wouldn't it be a horrendous time
>penalty? Every time you said "pointer++" it would have to check
>bounds, unless the pointer were declared the non-existant "noalias".)

You don't completely understand.  The problem is not in the compiler,
the 'problem' is in the architecture that leaves things up to the compiler
to check.  Theoretically the system could be a little faster by not doing
as many security checks at run time; in reality, they don't save any logic,
I believe.

The upshot of this is that if you wrote a compiler that allowed undisciplined
pointer operations, the system would be about as safe as MS-DOS.

The nice thing about the hardware is that "pointer++" is checked by the
hardware, assuming that the arrays are set up in the normal manner.  There's
a special 'add to pointer' instruction, which checks the tags on memory.
There's another instruction to 'subtract from pointer', which is going
to be used for 'int b[10];int *bb = b - 1;'.  This instruction, in
attempting to move the pointer down from the beginning of the array,
would hit a word with an illegal tag and cause an interrupt.

>What about assembly language? What is to stop things from 
>happening there with out of bounds pointers?

There is no assembler for the A-series.  Normal programs are written in
ALGOL, COBOL, FORTRAN, PASCAL, or PL/I.  The operating system, and certain
operating systems extensions, are written in an extended ALGOL called NEWP.
NEWP cannot be used to write normal user programs, and NEWP libraries (which
are sort of an operating system extension) must be blessed by the operator
before they are executed.

>Doug McDonald

As a further note, I believe that one reason why A-series C might not
use the hardware stack and hardware pointers in a normal manner is
varargs.  What can be done about a system which *must* check argument
count & type before execution?
-- 
Craig Jackson
UUCP: {harvard!axiom,linus!axiom,ll-xn}!drilex!dricej
BIX:  cjackson

gwyn@smoke.ARPA (Doug Gwyn ) (09/11/88)

In article <225800065@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>>	Generate code in an array.
>>	Jump to the beginning of the array. *
>It is certainly possible to design machine\compiler combinations that
>prevent this. I call them "totalitarian " or "Stalin" operating systems.
>Apparently ANSI C does not prohibit this behaviour: a fatal flaw
>in the ANSI standard. IF you can't do this, an entire class of programs
>becomes absolutely impossible: incremental compilers. It would prohibit
>a Turbo C or Quick C clone, for example. All of my programs I have designed
>for teaching chemistry and physics wouldn't work.

I'm getting a bit tired of talk about "fatal flaws" in the proposed
ANSI C standard from people who don't understand the goals and
constraints under which such a standard is developed.  It is simply
NOT FEASIBLE for a global C standard to dictate characteristics of
an implementation environment such as the ability to (somehow) switch
the thread of execution into a process's data space.  The proposed C
standard does not prohibit an implementation from offering support
for such a feature, but it also does not require such support.
Any application that depends on such a feature, or on dynamic linking,
communication with coprocesses, or other specific techniques for
run-time creation and execution of machine instructions, is already
inherently nonportable.  It is not the job of a C standard to render
already nonportable code suddenly, magically portable.

Feel free to do anything that happens to work at the moment on your
particular system.  Just be aware that it may not work elsewhere or
elsewhen, and please have the good sense not to blame this on
people who have no direct control over that aspect of reality.

ldh@hcx1.SSD.HARRIS.COM (09/13/88)

This may have been specified before ... but I may have missed it.

1)      is "numerical recipes in C" PD, Shareware or $$$$$$$
2)      where do I get a copy of it
3)      I gather from the discussions that it will work on a PC, but which
	compiler is best suited to the games they play with the arrays? (TC1.5?)
4)      will it work (at all?) better with sysV or UCB compilers/libs ?

Thanks to all ...

Leo Hinds

*net:   ldh@hdw.harris.com      uunet!hcx1!hardy!ldh

mcdonald@uxe.cso.uiuc.edu (09/15/88)

In article <225800065@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu 
(that's me folks) writes:
>>	Generate code in an array.
>>	Jump to the beginning of the array. *
>It is certainly possible to design machine\compiler combinations that
>prevent this. I call them "totalitarian " or "Stalin" operating systems.
>Apparently ANSI C does not prohibit this behaviour: a fatal flaw
>in the ANSI standard. IF you can't do this, an entire class of programs
>becomes absolutely impossible: incremental compilers. It would prohibit
>a Turbo C or Quick C clone, for example. All of my programs I have designed
>for teaching chemistry and physics wouldn't work.

The usually sane Doug Gwyn replies:

>I'm getting a bit tired of talk about "fatal flaws" in the proposed
>ANSI C standard from people who don't understand the goals and
>constraints under which such a standard is developed.  It is simply
>NOT FEASIBLE for a global C standard to dictate characteristics of
>an implementation environment such as the ability to (somehow) switch
>the thread of execution into a process's data space.  The proposed C
>standard does not prohibit an implementation from offering support
>for such a feature, but it also does not require such support.
>Any application that depends on such a feature, or on dynamic linking,
>communication with coprocesses, or other specific techniques for
>run-time creation and execution of machine instructions, is already
>inherently nonportable.  It is not the job of a C standard to render
>already nonportable code suddenly, magically portable.

I don't care one whit about what the goals and constraints of X3J11
(or X3J3 for that matter) ARE. I care about what they OUGHT to do.
I don't see why being able to create code and execute it could
give the hardware of any machine fits. I can see how it might make
a compiler vendor have fits if a cast of a data pointer to a code pointer
wasn't simply a no-op, as it is on most sane machines. On the
vast majority of machines it IS either a no-op or, for example,
in OS/2, there is a simple system call that turns a data pointer to
a code pointer which you can call. The cast would simply have to call
the operating system. I can conceive of an architecture where it
is absolutely impossible to have code and data in the same address 
space: say a physically different memory. But even there it could
be done: somehow the system has to get code into the code memory,
perhaps the only way being to write it to disk and read it out. In
that case the run time library has to write out the data, and read
it back in. I don't accept the argument that "our operating
system doesn't allow user programs to do that". If it were in the
C language spec they would have to CHANGE THE OPERATING SYSTEM TO
MAKE IT WORK or else admit "our operating system is so broken that
we can't have a C compiler". I want it put in the language definition
so that systems that can't do it are made to say to all the world
"Look at me, I'm the big bright computer of the future, I'll tell
you how great this hotshot new protection scheme is, it's
so great that I'm terminally unable to offer a C compiler to my 
users (if there are any)." I want the C standard to essentially
force vendors to fix their machines.
   Dynamic linking, coprocessors, etc. really ARE operating system
issues, and outside C. I am less than happy over the raw-terminal-io
discussion going on in another comp.lang.c thread: I think that
a portable way to get raw io MIGHT be possible, and should be
thought about. But the issue there is PORTABILITY, not IMPOSSIBILITY.
 
   I find it quite interesting to compare X3J11 to X3J3. X3J3 has
been known to give the same argument that Gwyn uses, to wit, 
"it would discombobulate one vendor" to argue against adding features
to Fortran, when the very same features are ALREADY in C! Among
these are bit operations ( | & ^ in C) and external names longer than
6 (six) characters.  

Doug McDonald

rob@kaa.eng.ohio-state.edu (Rob Carriere) (09/16/88)

In article <44100012@hcx1> ldh@hcx1.SSD.HARRIS.COM writes:
>This may have been specified before ... but I may have missed it.
>1)      is "numerical recipes in C" PD, Shareware or $$$$$$$
None of the above.  It is a book, published by Cambridge University
Press for 40-some dollars.  The programs listed in it can be obtained
in machine readable form for another 20 or so.

>2)      where do I get a copy of it
See above.

>3)      I gather from the discussions that it will work on a PC, but which
>compiler is best suited to the games they play with the arrays? (TC1.5?)
I have no idea, but you can always eliminate the ``games'' at the cost
of a small amount of storage.
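Here is Rob's suggestion sketched in C. The "game" is to return a pointer one element before a malloc'd block so that v[1] is the first element, which forms an out-of-range pointer. Instead, allocate one extra element and never use v[0]. vector1 is a hypothetical helper, not the routine from the book.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: storage for a vector indexed v[1]..v[n].
   It allocates n + 1 elements and leaves v[0] unused, so the returned
   pointer is the start of the malloc'd block itself; release with free(v). */
static double *vector1(size_t n)
{
    /* (n + 1) * sizeof(double) must not overflow size_t */
    assert(n < (size_t)-1 / sizeof(double));
    return malloc((n + 1) * sizeof(double));
}
```

The cost is one unused element per vector. In exchange, every pointer the program forms points into (or one past) a real object, and the result can be passed straight to free.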

>4)      will it work (at all?) better with sysV or UCB compilers/libs ?
I am using it on a Sun 3/50 (BSD) with both cc and GNU cc.  'Tworks
fine.

Rob Carriere

ok@quintus.uucp (Richard A. O'Keefe) (09/16/88)

In article <225800069@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>
>In article <225800065@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu 
>(that's me folks) writes:
>>>	Generate code in an array.
>>>	Jump to the beginning of the array. *
>The usually sane Doug Gwyn replies:
>
>>I'm getting a bit tired of talk about "fatal flaws" in the proposed
>>ANSI C standard from people who don't understand the goals and
>>constraints under which such a standard is developed.  ...
>>It is not the job of a C standard to render
>>already nonportable code suddenly, magically portable.
>
>I don't care one whit about what the goals and constraints of X3J11
>(or X3J3 for that matter) ARE. I care about what they OUGHT to do.
>I don't see why being able to create code and execute it could
>cause the hardware of any machine fits.

The most famous example is the B6700, where memory consisted of 52-bit
words (1 parity, 3 tag, 48 data).  Even tags (0 = single precision,
2 = double precision, 4 & 6 hairy) were things user code could manipulate;
odd tags (1 = indirect reference, 5 = array description, 7 = procedure,
3 = boundary/stack control word/code) were not.  At my home university
we installed a hack (for the benefit of a load-and-go Fortran compiler)
which took an array and changed it to code.  But you couldn't use it as
code and data *both* at the same time, and there were a number of other
restrictions.  When MCP 3.0 of the operating system came out, a better
approach would have been to create a code file and attach it as a dynamic
library (that way the code would not have been locked in physical memory).

There are quite a few machines with separate I/D.  The UNIX PERQ was (is?)
one of them.  Some modern RISCs are.  A micro-controller with execute
access only to a ROM would not be able to do this.  And so on.

But all of this misses what I think Doug Gwyn's point is.
If you are generating code into an array, *that* part of the program is
*already* non-portable (because the code is machine-dependent).  The
ANSI C committee cannot be expected to demand that everyone emulate
the 80286 in order to make programs which generate 80286 code into an
array and jump to it portable.  If you move your program to another
machine you are going to have to rewrite much if not most of the code
that generates the instructions.  What is so terrible about changing
the call as well?

gwyn@smoke.ARPA (Doug Gwyn ) (09/16/88)

In article <225800069@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>If it were in the
>C language spec they would have to CHANGE THE OPERATING SYSTEM TO
>MAKE IT WORK or else admit "our operating system is so broken that
>we can't have a C compiler". ... I want the C standard to essentially
>force vendors to fix their machines.

X3J11 has rightly observed that such an attitude would most likely
lead to the ANSI C standard failing to gain the widespread support
necessary for a true standard.  There are practical reasons for
promoting a C standard, but imposition of a particular philosophy
of hardware architecture design on the computing industry is not
one of them.

>... when the very same features are ALREADY in C! Among
>these are bit operations ( | & ^ in C) and external names longer than
>6 (six) characters.  

C extern names are not necessarily unique beyond 6 characters,
monocase.  In some environments they are and in some they aren't.
Acknowledging this constraint was one of the most distressing
decisions that X3J11 had to make.  But the fact is, many C
implementors are not in a position to improve the linker that
will of necessity be used with the object code their compiler
generates.
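To see what this constraint means in practice, here is a sketch that reduces a name to what a 6-character monocase linker would see. It then reports whether two distinct C names collide. linker_view and extern_names_clash are hypothetical names. Real linkers differ in details, for example in how they treat the underscore, so this models only the minimum the dpANS guarantees.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Minimum significance the dpANS guarantees for external names. */
#define SIG_CHARS 6

/* Reduce an external name to what a 6-character monocase linker would
   see: at most SIG_CHARS characters, folded to upper case. */
static void linker_view(const char *name, char out[SIG_CHARS + 1])
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < SIG_CHARS && name[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[i] = '\0';
}

/* Nonzero if two different C names would be the same name to that linker. */
static int extern_names_clash(const char *a, const char *b)
{
    char va[SIG_CHARS + 1], vb[SIG_CHARS + 1];

    if (strcmp(a, b) == 0)
        return 0;          /* identical names are one object, not a clash */
    linker_view(a, va);
    linker_view(b, vb);
    return strcmp(va, vb) == 0;
}
```

For example, accumulate_x and accumulate_y both become ACCUMU, and Count and count both become COUNT.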

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/17/88)

In article <8507@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>C extern names are not necessarily unique beyond 6 characters,...
>But the fact is, many C
>implementors are not in a position to improve the linker that
>will of necessity be used with the object code their compiler
>generates.

(This is not meant to be a flame, just a comment.)

I think Doug Gwyn exaggerates in saying "many" and "of necessity".
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

gwyn@smoke.ARPA (Doug Gwyn ) (09/17/88)

In article <3981@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
->But the fact is, many C
->implementors are not in a position to improve the linker that
->will of necessity be used with the object code their compiler
->generates.
-I think Doug Gwyn exaggerates in saying "many" and "of necessity".

No.  (Sometimes I wonder why I waste my breath, er, fingers.)

henry@utzoo.uucp (Henry Spencer) (09/18/88)

In article <3981@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>>But the fact is, many C
>>implementors are not in a position to improve the linker that
>>will of necessity be used with the object code their compiler
>>generates.
>
>I think Doug Gwyn exaggerates in saying "many" and "of necessity".

No.  The world does not consist primarily of Unix systems with sources, or
of hobbyist-owned micros that can abandon standard software whenever it's
convenient to do so.  Most C compilers have to fit into existing
environments, which the compiler writer cannot change without greatly diminishing
the market for his compiler.  Given a choice of conforming to ANSI C or
conforming to the de facto standards set by the operating system in question,
most compiler writers know which side their bread is buttered on.  Speaking
as an amateur compiler writer with professional compiler-writer friends,
we don't like this any more than you do.  We don't like income tax, either.
We have no illusions about being able to change either problem.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

will.summers@p6.f18.n114.z1.fidonet.org (will summers) (09/18/88)

 In article <3981@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
 > In article <8507@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
 > writes:
 > >C extern names are not necessarily unique beyond 6 characters,...
 > I think Doug Gwyn exaggerates in saying "many" and "of necessity".
 
I hate this restriction (big deal! **everybody hates this**, even the 
committee!)
 
So what to do?
 
Liberally paraphrasing the Rationale:
 dpANS work-around 2:
   Use defines:
       #define real_long_name   a_xyz_real_long_name
       #define real_long_name2  a_rwt_real_long_name2
 
 dpANS work-around 3:
   Use longer names and kiss portability to short-extern environments goodbye.
 
What to do?
 
Well dpANS *permits* the implementor to honor as much significance as he 
wishes. In practice an implementor affected by market forces will honor as 
many characters as his environment permits.
 
So I choose (3), and will add #defines à la (2) if I ever need to port to a
short-extern environment.
 
I think so many programmers in longer-extern environments will do the same 
that those importing to short-extern environments will encounter the problem 
often enough to develop tools to generate the #defines automatically.
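Here is a sketch of the kind of tool predicted above. It is hypothetical: emit_short_name_defines and the X00000 naming scheme are invented for illustration. For each external name longer than six characters, it writes a #define in the style of work-around (2). The replacement name is X plus five digits, which is both short and monocase. It assumes the names that are already short are unique in monocase, and that none of them look like X followed by five digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SHORT_LIMIT 6   /* names this short are assumed safe already */

/* Hypothetical tool: for every name longer than SHORT_LIMIT characters,
   write "#define long_name Xnnnnn" to out.  Xnnnnn is 6 characters,
   upper case plus digits, so it survives a 6-character monocase linker.
   Returns the number of #defines written, or -1 on error. */
static long emit_short_name_defines(FILE *out, const char *const names[],
                                    size_t count)
{
    unsigned long serial = 0;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < count; i++) {
        if (strlen(names[i]) <= SHORT_LIMIT)
            continue;                  /* already fits; leave it alone */
        if (serial > 99999UL)
            return -1;                 /* X00000..X99999 exhausted */
        if (fprintf(out, "#define %s X%05lu\n", names[i], serial) < 0)
            return -1;
        serial++;
    }
    return (long)serial;
}
```

The output is meant to be saved as a header and included in every translation unit before any of the long names are used.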
 
    \/\/ill


--  
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...{gatech,ames,rutgers}!ncar!noao!asuvax!stjhmc!18.6!will.summers

seanf@sco.COM (Sean Fagan) (09/18/88)

In article <225800069@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
[lots of ranting and raving, deleted; see Doug Gwyn (the "normally sane")'s]
[reply for some good answers]
>I can conceive of an architecture where it
>is absolutely impossible to have code and data in the same address 
>space: say a physically different memory.

The PDP-11, using split I&D, is unable to generate code and then jump to it.
The 80386 (and, I think, the 80286) cannot execute data.  You have to change
permissions for the page (or segment, I forget which).  Um,  I don't see a
problem with this.

>somehow the system has to get code into the code memory,
>prehaps the only way being to write it to disk and read it out. 

Yep, that's how XENIX does it.  Reads it all into data segments, and then
changes it (magically) into a text segment.  Again, I see nothing wrong with
this.

>I don't accept the argument that "our operating
>system doesn't allow user programs to do that". If it were in the
>C language spec they would have to CHANGE THE OPERATING SYSTEM TO
>MAKE IT WORK or else admit "our operating system is so broken that
>we can't have a C compiler".

(Please excuse me, I don't normally do this.  Also, I would like to
reiterate the disclaimer below:  I alone share my opinions.)

Your problem is that you grew up on a machine which could execute data (such
as a VAX), and you think that all machines should then be like that.  You
are ranting and raving, calling Doug Gwyn insane (OK, you didn't out-and-out
say that, but you darn well implied it), and also insinuating that the
X3J11 committee, me, gobs of other people in the world, and the Intel
Microprocessor design team are brain-damaged and/or incompetent (well, Intel
is questionable 8-) ).

Perhaps we should also put bit-counting operators into C.  Then, we could
write programs that require said operator, and say that all other machines
are slow and stupid because they don't have such things built into the
hardware (CDC Cybers do, Crays might; it was the only thing I could think of
at 12:40 am 8-)).  Or maybe we should require that all ints be 32 bits.  And
doubles be 64 bits, to hell with any machine which has a superior floating
point scheme.

There is very rarely any need to be able to execute data that you have
created on the fly.  If you really need to, you can create an executable
relatively easily, and then execute that.  I, personally, dislike the idea,
but that's just MHO.  Not all machines are alike, nor are all memory
management schemes, nor are all operating systems.  And, like it or not, at
no point in C's history was it stated (or even implied) that you could jump
to data.  All the function pointers in K&R were assigned to functions
created by the programmer (such as main, exit, printf, etc.), or NULL (which
is, of course, a valid pointer).

Argc.

>Doug McDonald


-- 
Sean Eric Fagan  | "Joy is in the ears that hear, not in the mouth that speaks"
seanf@sco.UUCP   |     -- Saltheart Foamfollower (S. R. Donaldson)
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

sjs@jcricket.ctt.bellcore.com (Stan Switzer) (09/19/88)

In article <1988Sep17.212624.8858@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> In article <3981@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> >>But the fact is, many C
> >>implementors are not in a position to improve the linker that
> >>will of necessity be used with the object code their compiler
> >>generates.
> >
> >I think Doug Gwyn exaggerates in saying "many" and "of necessity".
> 
> No.  The world does not consist primarily of Unix systems with sources, or
> of hobbyist-owned micros that can abandon standard software whenever it's
> convenient to do so.

Two is a couple.  A few is at least three (in my book).  I guess
*many* will have to be at least four.  Let's put this question to the
test.

"How many C implementations are constrained by 6 character monocase
linkers and how badly are they constrained?"

In order to avoid netting too many red herrings, we'll exclude machine
and operating system combinations for which no C compilers exist (if
there is a viable implementation in the works, we'll let it slide).
Also, different designations of the same basic architecture or OS count
only once.

I can think of one, so I'll start:

  1) GECOS / GCOS / GCOS 8
     for the GE 600 / Honeywell 6000 / DPS 8 series

Being essentially quantitative, the first part of this controversy is
easier to resolve than the second, but as of my last experience w/
GCOS (1982), I don't feel I'd have lost very much in abandoning the
standard linker in favor of a "C" linker.

Stan Switzer  sjs@ctt.bellcore.com

ok@quintus.uucp (Richard A. O'Keefe) (09/20/88)

In article <10295@bellcore.bellcore.com> sjs@ctt.bellcore.com (Stan Switzer) writes:
>> >>But the fact is, many C
>> >>implementors are not in a position to improve the linker that
>> >>will of necessity be used with the object code their compiler
>> >>generates.
>Two is a couple.  A few is at least three (in my book).  I guess
>*many* will have to be at least four.  Let's put this question to the
>test.
>I can think of one, so I'll start:
>  1) GECOS / GCOS / GCOS 8
>     for the GE 600 / Honeywell 6000 / DPS 8 series
Here are two very well known ones:
   2) MVS/XA for IBM S/370 series
   3) VM/CMS for IBM S/370 series
There are some similarities between these two operating systems, but
there are major differences too.  There is a Japanese workalike for
MVS, but let's ignore workalikes.  The S/370 range can run System V
(Amdahl's UTS) and SunOS, but they haven't got this linker problem (:-).
There was a C compiler for TOPS-10 on the DEC-10, but I guess we can
regard TOPS-10 as dead and not count it.

One more, and we'll be there!

But the question is not the number of _system types_ but the number of
_implementors_.  I know of four C compilers for VM/CMS, and I'm sure there
must be more in progress.

seanf@sco.COM (Sean Fagan) (09/21/88)

In article <10295@bellcore.bellcore.com> sjs@ctt.bellcore.com (Stan Switzer) writes:
>Two is a couple.  A few is at least three (in my book).  I guess
>*many* will have to be at least four.  Let's put this question to the
>test.
>I can think of one, so I'll start:
>  1) GECOS / GCOS / GCOS 8

   2) CDC Cybers, 170 series.  (I have to hedge a bit here, we can use *7*
character identifiers, but, since it also uses, I believe, an underscore,
that takes up one of the characters.)  It is, however, monocase.
And, surprising though it may be to those who know the machine (and those 
who don't should 8-)), there exist at least *two* C compilers for the machine:
UofTexas (or is it Austin, I forget) ported PCC to NOS (ugh!), and I and a
couple of friends (Hi Mike!) ported Small-C (almost as much ugh!).
The compilers work, but there is not much we can do about the linker (part
of the operating system, you see; generally, you build a ".o" equivalent,
then, when you try to run it, the OS recognizes that it is non-linked and 
then proceeds to link it).

>Stan Switzer  sjs@ctt.bellcore.com


-- 
Sean Eric Fagan  | "Never underestimate the bandwidth of a pickup full of
seanf@sco.UUCP   |     9-track tapes!"  - Eric Green (elg@killer)
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

will.summers@p6.f18.n114.z1.fidonet.org (will summers) (09/21/88)

(Re: dpANS guarantee of only 6 monocase characters of external name
     significance)

In article <10295@bellcore.bellcore.com> sjs@jcricket.ctt.bellcore.com 
(Stan Switzer) writes:
 > Two is a couple.  A few is at least three (in my book).  I guess
 > *many* will have to be at least four.  
 
Ah... the way I heard it was two's company, three's a crowd, four's a
fist fight and five's a riot.  Guess we need six.  :-)


 > "How many C implementations are constrained by 6 character monocase
 > linkers and how badly are they constrained?"

 >   1) GECOS / GCOS / GCOS 8
 >      for the GE 600 / Honeywell 6000 / DPS 8 series
 > 
 > Being essentially quantitative, the first part of this controversy is
 > easier to resolve than the second, but as of my last experience w/
 > GCOS (1982), I don't feel I'd have lost very much in abandoning the
 > standard linker in favor of a "C" linker.

I believe the committee's concern was over those installations where
security prevented all but "secure" programs from generating an 
executable module.  Does GCOS qualify?  I -think- the waterloo C compiler
for GCOS (single segment) recoginzes 100 case-siginificant characters 
in external names.

I am a supporter of dpANS, but have trouble understanding this decision.
Even if the implementor could not generate his own linker, it would 
seem that he could implement a pre-link pass that mapped longer 
identifiers in the .o files (or whatever).  Non-dpANS .LIB  files 
would need an associated mapping file.  Maybe I just don't understand
but it seems a small price for the rest of the world to enjoy 32-bit
externs.                                 

I foresee this limitation as one of the most widely ignored, even by 
many programmers that are otherwise careful about portability 
considerations.

    \/\/ill 

    


--  
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...{gatech,ames,rutgers}!ncar!noao!asuvax!stjhmc!18.6!will.summers

mcdonald@uxe.cso.uiuc.edu (09/22/88)

>I can think of one, so I'll start:
>  1) GECOS / GCOS / GCOS 8

>   2) CDC Cybers, 170 series.  

And a third: PDP-11/RT11.

And all of this is rather unimportant, because it should be possible
to write a linker that links all the C files together and leaves only
operating system calls and calls to other languages for the system linker.

henry@utzoo.uucp (Henry Spencer) (09/22/88)

In article <1305@scolex> seanf@sco.COM (Sean Fagan) writes:
[6-character linkers in C environments]
>>  1) GECOS / GCOS / GCOS 8
>
>   2) CDC Cybers, 170 series.  (I have to hedge a bit here, we can use *7*
>character identifiers, but, since it also uses, I believe, an underscore,
>that takes up one of the characters.)  ...

Unless RT-11 has changed a lot since I last saw it, it's a 6-character
environment.  And yes, there is at least one C compiler for it.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

smryan@garth.UUCP (Steven Ryan) (09/23/88)

>The Compilers work, but there is not much we can do about the linker (part 
>of the operating system, you see; generally, you build a ".o" equivalent, 
>then, when you try to run it, the OS recognizes that it is non-linked and 
>then proceeds to link it).

Which is nice. 170 Loader runs like a bat out of hell because it has to.
ld runs like a turtle out of Antarctica.

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/23/88)

In article <225800072@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
[re linkers with 6-char limit]

>And all of this is rather unimportant, because it should be possible
>to write a linker that links all the C files together and leaves only
>operating system calls and calls to other languages for the system linker.

Actually, it's even easier than that.  The C compiler can generate an
internal object format.  A custom post-processor takes these object
files, scans for all long identifiers, shortens them to unique 6-char
names, and produces as its output system-format object files ready for
the standard linker.  No linking need be done by this post processor.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

ok@quintus.uucp (Richard A. O'Keefe) (09/23/88)

In article <4071@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>In article <225800072@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>[re linkers with 6-char limit]
>Actually, it's even easier than that.  The C compiler can generate an
>internal object format.  A custom post-processor takes these object
>files, scans for all long identifiers, shortens them to unique 6-char
>names, and produces as its output system-format object files ready for
>the standard linker.  No linking need be done by this post processor.

There are several reasons why one wants the names in the source code to
bear a simple predictable relation to the names the system sees, such as
mixed language programming and system-supplied debugging tools like load
maps.  The people in comp.lang.c++ often complain about compiler-
generated names.

There was a program posted to one of the sources news-groups a while back
that did the long name -> unique name mapping on the source code; sorry I
can't remember the name or the date, ask in comp.sources.wanted.

gwyn@smoke.ARPA (Doug Gwyn ) (09/24/88)

In article <703.2339B3CB@stjhmc.fidonet.org> will.summers@p6.f18.n114.z1.fidonet.org (will summers) writes:
>but it seems a small price for the rest of the world to enjoy 32-bit
>externs.                                 

Nothing is stopping the rest of the world from enjoying 32-bit externs.
A little (very little) information theory will show that this cannot
be guaranteed by any amount of trickery in a 6-character extern
environment, if one does not have control over the linker etc.

The proposed ANS for C does NOT repeat NOT prohibit implementations
from supporting more than 6 monocase characters of significance in
external identifiers.

>I foresee this limitation as one of the most widely ignored, even by 
>many programmers that are otherwise careful about portability 
>considerations.

It's already ignored, and already causes problems.

henry@utzoo.uucp (Henry Spencer) (09/25/88)

In article <4071@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>Actually, it's even easier than that.  The C compiler can generate an
>internal object format.  A custom post-processor takes these object
>files, scans for all long identifiers, shortens them to unique 6-char
>names, and produces as its output system-format object files ready for
>the standard linker.  No linking need be done by this post processor.

Right, so we build it into the output phase of the compiler, since it
doesn't have to do any linking.  Now we have a compiler whose output
contains only 6-character names.  How is this an improvement on simply
doing that from the beginning?  Remember that the rule applies only to
external names, so it's how the names appear to the outside world --
to libraries, to modules written in other languages, to linkers -- that
matters.  It's easy to say "shortens them to unique 6-char names", but
making that nice phrase *work* is just a wee bit harder.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/26/88)

I wrote:

     A custom post-processor takes these object files, scans for all
     long identifiers, shortens them to unique 6-char names, and
     produces as its output system-format object files ready for the
     standard linker.

In article <1988Sep24.212346.26591@utzoo.uucp> henry@utzoo.uucp (Henry
Spencer) writes:
>Right, so we build it into the output phase of the compiler, since it
>doesn't have to do any linking.  Now we have a compiler whose output
>contains only 6-character names.  How is this an improvement on simply
>doing that from the beginning?

*If* existence of the post-processor could be assumed on the handful of
systems with old linkers, using the post-processor would be better than
using 6-char externs in the source to begin with, because:

     It would let people on systems with modern linkers use long
     externs in their C programs, knowing that their code would still
     be portable to systems with old 6-char linkers.

Existence of the post-processor could be assumed *if* ANSI were to
mandate long externs in all conforming compilers and recommend such
post-processing to implementors stuck with an old linker.

By the way, you can't really build the post-processor into the output
phase of the compiler.  It has to have access to all user files that
will be linked so it can look for conflicting symbols and disambiguate
them.  The compiler itself might be used in a makefile to compile only
one file at a time, so it won't know about all identifiers that
conflict when truncated to 6 characters.

(The above discussion is largely moot, because the 6-char limit on
portable programs is here to stay for the next few years.  But it's
worth seeing that this limit was not necessary, and that the common
arguments in its favor are not valid.  This is *not* meant to be a
flame.)
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

news@ism780c.isc.com (News system) (09/27/88)

In article <8569@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>The proposed ANS for C does NOT repeat NOT prohibit implementations
>from supporting more than 6 monocase characters of significance in
>external identifiers.
>
Absolutely true.  But it does prevent me from *using* external identifiers
with more than 6 monocase characters if I want to be certain that my programs
will be accepted by *all* conforming C compilation systems.

   Marv Rubinstein

henry@utzoo.uucp (Henry Spencer) (09/28/88)

In article <16711@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
>>The proposed ANS for C does NOT repeat NOT prohibit implementations
>>from supporting more than 6 monocase characters of significance in
>>external identifiers.
>Absolutely true.  But it does prevent me from *using* external identifiers
>with more than 6 monocase characters if I want to be certain that my programs
>will be accepted by *all* conforming C compilation systems.

No, not quite right.  For one thing, the identifiers can be longer than
6 characters, they just can't *rely* on being longer, i.e. they must be
distinct in the first six.  And second, it is not ANSI which is causing
this, it is the deficiencies of existing computer systems.  Anyone who
wants to be certain about portability has had to observe this restriction
all along.  Moreover, it is not within ANSI's powers to cure that, since
the systems that have the 6-character limit are the ones that can't change
easily anyway.

Once again:  standards committees are in the business of recognizing
reality, not trying to change it just because the new version would be nicer.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (09/30/88)

In article <4111@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>     A custom post-processor takes these object files, scans for all
>     long identifiers, shortens them to unique 6-char names, and
>     produces as its output system-format object files ready for the
>     standard linker.
>
>By the way, you can't really build the post-processor into the output
>phase of the compiler.  It has to have access to all user files that
>will be linked so it can look for conflicting symbols and disambiguate
>them...

So we are talking about a partial linking step after all.  And the
postprocessor has to scan all the libraries, to prevent name conflicts
with them.  And the object modules and libraries can't be postprocessed
until linking time.  How, precisely, is this different from defining a
new object-module format and writing a new linker?
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

news@ism780c.isc.com (News system) (10/01/88)

Doug?
>>The proposed ANS for C does NOT repeat NOT prohibit implementations
>>from supporting more than 6 monocase characters of significance in
>>external identifiers.

[Marv]
>>Absolutely true.  But it does prevent me from *using* external identifiers
>>with more than 6 monocase characters if I want to be certain that my programs
>>will be accepted by *all* conforming C compilation systems.

[Henry]
>No, not quite right.  For one thing, the identifiers can be longer than
>6 characters, they just can't *rely* on being longer, i.e. they must be
	       ^^^^
		  who are the 'they' that can't rely? :-)

>distinct in the first six.  And second, it is not ANSI which is causing
>this,

[Marv again]
I was not suggesting that ANSI should do anything about the 6 character
problem.  I was just pointing out that even though some compiler implementers
are kind enough to provide long names, I could not take advantage of their
kindness and write programs with names like 'interval_two' and
'interval_three' if I want to run on an old-fashioned system.

BTW.  It isn't all that hard to supply long names on the old systems.  I
once had to write a compiler supporting long names on a 6 character system.
What I did was write my own library-archive program and my own linker.  My
linker linked objects from the special archive and built a module that the
standard system linker could process so as to finish the job.  The effort
added about six staff weeks to the compiler project.

   Marv Rubinstein

gwyn@smoke.ARPA (Doug Gwyn ) (10/03/88)

In article <16711@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
-In article <8569@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
->The proposed ANS for C does NOT repeat NOT prohibit implementations
->from supporting more than 6 monocase characters of significance in
->external identifiers.
-Absolutely true.  But it does prevent me from *using* external identifiers
-with more than 6 monocase characters if I want to be certain that my programs
-will be accepted by *all* conforming C compilation systems.

Wrong -- it is not the dpANS that prevents you from doing that,
but rather the way that some system environments happen to work.
The dpANS simply acknowledges this externally-imposed constraint.