[comp.os.vms] Size limitations

smith%eri.DECnet@MGHCCC.HARVARD.EDU ("ERI::SMITH") (08/03/87)

>	I am puzzled why VMS, a virtual memory operating system, imposes
>	limits on the number of command lines that DCL remembers and on
>	the length of the command prompt. 
     
>Gee, this is just like asking "Why does Jake's program not read my file
>with 190 byte input lines?" when Jake's program was built to read 80
>character "card images" and has an 80 byte input buffer.  There have
>to be limits someplace; the people designing/implementing DCL thought
>that 20 command lines and 1024 characters per command line were reasonable
>limits.  

A whole area of software engineering, and one that I haven't seen discussed
much, is the art of deciding whether there have to be limits and, if so, 
what they should be. This is an IMPORTANT area because it is a COMMON 
source of problems in porting programs.  It is also a common limitation in
the lifetime of computer systems!  

There do not HAVE to be limits--you can use pointer structures to guarantee
that all of available memory can be used for any construct, if necessary.
For example, ANS Pascal specifies NO limits on identifier length, and ALL
characters are significant. At least, Think Technologies says so, and calls
out their own limitation to 255 characters as an exception.  I dunno what
VAX PASCAL does. How far should an implementor go to try to meet the 
ANS requirement?  Should identifiers of more than 255 characters be 
permitted, at the cost of using a word rather than a byte somewhere?  How
about more than 65K? Why shouldn't I be able to make use of the full VMS 
virtual memory address space?  Why should I be limited to a measly 4
gigabytes or so--why can't my identifier name be as long as I have
disk storage for?

Still, it's hard to believe that everyone will use arbitrary-length
multi-precision integers for everything (to say nothing of dynamic storage
allocation for everything).

OK, so if that's silly (and perhaps it is) how do you make a rational
decision as to where to draw the line?  Would it have been better for the
authors of the PASCAL standard to pick a number like 256 or 8192 and
get some vendors mad at them ("you DELIBERATELY made it 8192 just to 
make it hard on 12-bit PDP-8 implementors")?

Kernighan and Pike ("The Unix Programming Environment", p. 47) note that
"There are implementation limitations with most programs that expect text
as input. We tested a number of programs on a 30,000 byte text file 
containing no newlines, and surprisingly few behaved properly, because
most programs make unadvertised assumptions about the maximum length of a
line of text."  In UNIX, of course, which uses what I call the paper-tape
concept rather than the 80-column card concept, a 30,000-column line is
SUPPOSED to be acceptable.  (I wonder whether any of their programs that
worked on 30,000 bytes would fail on 32,769? or 65,540?).  Under VMS,
of course, lots would break on 81 characters, lots more at 133, and most
of the rest at 257...  
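
A minimal sketch of a line reader with no built-in maximum line length, to
illustrate the point. It is written in Python purely for illustration;
read_line and chunk_size are hypothetical names, and the stream is assumed
to be a seekable binary stream. The chunk size only sets how much is read
per call; it is not a limit on the line.

```python
import io

def read_line(stream, chunk_size=80):
    """Return the next line (including its LF) from a seekable binary stream.

    chunk_size is how many bytes are read per call, NOT a maximum line
    length; the line is accumulated in as many chunks as it takes.
    """
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:                      # end of file
            break
        newline_at = chunk.find(b"\n")
        if newline_at >= 0:
            parts.append(chunk[:newline_at + 1])
            # Step back over the bytes read past the LF so the next
            # call starts at the beginning of the following line.
            stream.seek(newline_at + 1 - len(chunk), io.SEEK_CUR)
            break
        parts.append(chunk)
    return b"".join(parts)
```

The only remaining limit is available memory, which is the one limit the
caller can reasonably be expected to know about.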

It is interesting to note how conservative architects have been, in the
face of memory technology that continues to deliver a factor of two every
two years.  It seems that typically, the ratio between the smallest 
memory configuration in the pioneer machine in a family and the place where 
the architecture hits a wall is usually in the range of 16 or so.  
That means that the family of machines starts to get kludgey and 
self-destructs from the weight of accumulated bandaids in just 8 or 10
years.

Examples: PDP-8, minimum 4K, hits the wall at 32.  PDP-11, minimum 16K,
hits a wall at 64 or 256 depending on what kludges you tolerate.  IBM PC,
minimum 16K (what? you don't remember?), hits the wall at 640K.  Macintosh,
curiously enough, despite an obvious opportunity to score with the 68000:
minimum 128K, hits a wall at 4 meg.  I don't know the right figures for the
VAX, which comes off looking pretty good, but clearly the thing that will
kill off 32-bit machines will be the existence of more than 4 gigabytes
of memory and good reasons for wanting to address them cleanly. (Wanting
to address that entire CD-ROM from FORTRAN as an array).  
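
The arithmetic behind "8 or 10 years" can be sketched as follows. This is
a rough model, assuming (as above) that memory capacity doubles every two
years and that the smallest configuration is the typical one when the
family is introduced. The function name and the table are illustrative;
the sizes (in K) are the ones quoted in the examples.

```python
import math

def years_until_wall(headroom_ratio, doubling_period_years=2.0):
    """Years until memory growth uses up the architecture's headroom.

    headroom_ratio is (architectural limit) / (smallest configuration).
    Each doubling of memory takes doubling_period_years.
    """
    return math.log2(headroom_ratio) * doubling_period_years

# (family, minimum K, wall K) as quoted in the examples above
FAMILIES = [
    ("PDP-8", 4, 32),
    ("IBM PC", 16, 640),
    ("Macintosh", 128, 4096),
]

if __name__ == "__main__":
    for name, minimum_k, wall_k in FAMILIES:
        years = years_until_wall(wall_k / minimum_k)
        print(f"{name}: ratio {wall_k // minimum_k}, about {years:.1f} years")
```

A ratio of 16 is four doublings, hence 8 years at two years per doubling.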

What everyone does, in a situation where they don't know how much of
something will be needed, is to take a hot guess, based on what THEY
need NOW, what's expedient, etc.  Then you don't document it, because
surely 2048 is more than ANYONE is ever going to need.  

I'll betcha that a clean approach to managing this kind of issue would do 
as much for program portability and system longevity as goto-less code,
or object-oriented programming, or whatever today's buzzword is...
--------------------------------------------------------------------
Daniel P. B. Smith         ARPA: smith%eri.decnet@mghccc.harvard.edu
Eye Research Institute     CompuServe: 74706,661
20 Staniford Street        Telephone (voice): 617 742-3140
Boston, MA 02114
--------------------------------------------------------------------
"We are in great haste to construct a magnetic telegraph from Maine to
Texas; but Maine and Texas, it may be, have nothing important to
communicate."--Thoreau
------

RALPH@UHHEPG.BITNET (08/06/87)

Date:  4-AUG-1987 21:22:20.01
From: Ralph Becker-Szendy RALPH AT UHHEPG
To:   B_INFOVAX,RALPH
Subj: Re: Size limitations
Hi everyone

Even at the risk of creating another "metaphysical" discussion:
(BTW, i apologize for even having my own point of view about hackers ...)

Daniel Smith is right IN PRINCIPLE: a well designed system should not impose
artificial limitations just for the sake of ease of implementation. What
the system can do (for you) should be limited only by its architecture.
Most of the restrictions (like: 6 character identifiers in old FORTRAN, upper
case source only for some languages, the infamous 19 continuation lines
for IBM FORTRAN compilers) are caused by mentally retired software designers
sticking to their old prejudices (where >90% of IBM as a whole is MENTALLY
RETIRED, and DEC is on the way down the hill).

On the other hand, systems are implemented by people, who are a scarce
resource. I agree, 20 commands in the recall-stack is a shame. But, on the
other hand, a stack area for 20 commands of 255 bytes each is much easier
to implement than a whole pointer structure with all the complications of
virtual memory. When i write a program, i usually declare an 80-character
string for command input. Yes, in principle i should just declare a
varying-length string for it, but that's such a hassle in FORTRAN, and just
not worth my time. Think about it the following way: maybe the time the
programmer saved by having only 20 commands in stack went into useful features
of the system.
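
A minimal sketch of the cheap, bounded approach, written in Python for
illustration only (this is not how DCL implements command recall;
RecallBuffer and max_commands are hypothetical names). The point is that a
fixed-size recall stack is little work to write, and the limit can at
least be a named, documented parameter rather than a hard-wired constant.

```python
from collections import deque

class RecallBuffer:
    """Remember the most recent commands; the oldest is dropped when full."""

    def __init__(self, max_commands=20):
        # max_commands is the documented, adjustable limit
        # (the default of 20 mirrors the figure discussed above).
        self._commands = deque(maxlen=max_commands)

    def remember(self, command):
        """Store a command, silently discarding the oldest if at the limit."""
        self._commands.append(command)

    def recall(self, n=1):
        """Return the n-th most recent command (1 = the latest)."""
        if not 1 <= n <= len(self._commands):
            raise IndexError(
                f"only {len(self._commands)} commands are remembered"
            )
        return self._commands[-n]
```

Raising the limit is then a one-argument change, for example
RecallBuffer(max_commands=200), at the cost of the memory for the extra
entries.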

Ralph Becker-Szendy
University of Hawaii / High Energy Physics Group

Disclaimer: The views expressed here are probably not endorsed by my
employer. I hardly ever actually speak to my employer. Even our system manager
stops smiling when i come by.

minow@decvax.UUCP (Martin Minow) (08/06/87)

In article <smith%eri.decnet@mghccc.harvard.edu> writes:
>...
>It is interesting to note how conservative architects have been, in the
>face of memory technology that continues to deliver a factor of two every
>two years.  It seems that typically, the ratio between the smallest 
>memory configuration in the pioneer machine in a family and the place where 
>the architecture hits a wall is usually in the range of 16 or so.  
>
>Examples: PDP-11, minimum 16K, hits a wall at 64 or 256 depending on what
>kludges you tolerate.

Ahh, how soon they forget.  Quoting from the PDP-11 Programming Handbook
(2nd edition, 1969):

"The PDP-11 is available in two versions designated as PDP-11/10
 and PDP-11/20.  The PDP-11/10 contains ... 1,024 words of 16-bit
 read-only memory, and 128 16-bit words of read-write memory.  The
 basic PDP-11/20 contains ... 4,096 words of 16-bit read-write
 core memory, a programmer's console, and an ASR-33 Teletype."

Note that this was the original PDP-11/10 (I don't know if any were
actually manufactured), not the built-like-a-tank model from 1973
or so.  [And, yes, the manual is still useful.]

Back in 1969, DEC had 36 offices in the United States, 5 in Canada,
and 17 in Europe, Japan, and Australia.  We've grown a bit since then.

Martin Minow
decvax!minow

sommar@enea.UUCP (Erland Sommarskog) (08/12/87)

In a recent article "ERI::SMITH" <smith%eri.decnet@mghccc.harvard.edu> writes:
>Kernighan and Pike ("The Unix Programming Environment", p. 47) note that
>"There are implementation limitations with most programs that expect text
>as input. We tested a number of programs on a 30,000 byte text file 
>containing no newlines, and surprisingly few behaved properly, because
>most programs make unadvertised assumptions about the maximum length of a
>line of text."  In UNIX, of course, which uses what I call the paper-tape
>concept rather than the 80-column card concept, a 30,000-column line is
>SUPPOSED to be acceptable.  (I wonder whether any of their programs that
>worked on 30,000 bytes would fail on 32,769? or 65,540?).  Under VMS,
>of course, lots would break on 81 characters, lots more at 133, and most
>of the rest at 257...  

But does that mean that Unix doesn't impose limitations on what you can do? 
No. Yes, you can have lines that are a million characters long, no problem. But
what if you want a line-feed character inside a text string, without the
character meaning "new line"? As far as I know, this is not possible.
  The conclusion is that no matter what you do, you may get into problems.
Since you need *some* convention to indicate new lines, you must introduce 
*some* restriction. VMS restricts you in size, but permits you to have LFs
in a text string. Unix does it the other way around. Which you prefer is a matter of 
taste. (And by the way, VMS does know of stream-LF format too. Unix knows 
only of stream-LF.)
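
The trade-off can be sketched in a few lines of Python. This is an
illustration only: the 2-byte big-endian length prefix is an assumption
made for the example, not the actual on-disk record format of VMS, and all
three function names are hypothetical. Length-prefixed records may contain
LF but are capped by the size of the length field; LF-delimited lines have
no size cap but can never contain an LF.

```python
import struct

MAX_RECORD = 0xFFFF  # largest length a 2-byte prefix can express

def pack_records(records):
    """Encode each record as a 2-byte length followed by its bytes."""
    out = bytearray()
    for rec in records:
        if len(rec) > MAX_RECORD:
            raise ValueError("record exceeds the 65535-byte length field")
        out += struct.pack(">H", len(rec)) + rec
    return bytes(out)

def unpack_records(data):
    """Decode records written by pack_records; LF bytes are ordinary data."""
    records, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        records.append(data[pos:pos + length])
        pos += length
    return records

def split_stream_lf(data):
    """Stream-LF convention: every LF byte ends a line, with no exceptions."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()                # drop the empty piece after the final LF
    return lines
```

A record such as b"one\ntwo" survives pack_records and unpack_records
intact, whereas split_stream_lf necessarily breaks it into two lines.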
-- 

Erland Sommarskog       
ENEA Data, Stockholm    
sommar@enea.UUCP