[comp.society.futures] C's sins of commission

jlg@lanl.gov (Jim Giles) (09/14/90)

From article <1990Sep13.185833.17455@cunixf.cc.columbia.edu>, by wp6@cunixa.cc.columbia.edu (Walter Pohl):
> 
> 	What do you mean about C's sins of commission?
>       Do you mean the lack of type checking?  [...]

Actually, what you're asking is a tough question.  There are so many
problems with C that just listing the more obvious ones would take many
pages.  It is difficult to turn to _any_ page of the C draft standard
without stumbling upon something with which I completely disagree. (By
the way, it is difficult to turn to any page of the final C standard
because I haven't seen any copies of it.  Has it even been published
yet?  It was finalized in January/February.)

Yes, type checking is a problem with C.  To my mind, it is one of C's
least egregious faults.  For one thing, most violations _are_ illegal
in C - just that most implementations don't bother checking.  I make
a careful distinction between a language and any particular implementation.
The faults of C that I most object to are those which cannot be corrected
because the language itself requires them.

As I said, _most_ type violations are already illegal.  Not all though.
Unions are not discriminated.  Pointer 'casts' are allowed (essentially
between _any_ two pointer types - officially, casts can only be between
'void' pointers and others but cast first to void then to anything else
is legal).

This leads us to pointers.  Just about everything about C pointers is
bad.  From the fact that pointers are hopelessly confused with arrays
(which are completely separate conceptually) to the syntax of pointer
use, C's pointers are a mess.  In addition, many language design people
now feel that pointers of _any_ kind are a bad idea.  C.A.R. Hoare
condemned them as long ago as the early 70's (about the time C was
'designed').  He pointed out that pointers are the data structuring
element that corresponds to GOTOs in flow control - if the one is
bad, so is the other.

-----------------------------------------------------------------------------

Since this is comp.society.futures, I will discuss pointer replacements.
Essentially, pointers only do three things for you: 1) recursive data
structures (graphs, trees, etc....); 2) dynamic memory; and 3) run-time
'equivalence'.  C pointer arithmetic only does what one dimensional array
indexing already does (scaled address calculations): arrays are better for
this - so it's _not_ counted as one of the features of pointers.

Recursive data structures are best implemented directly (to use a
C/Fortran like declaration syntax with the type names on the left):

	Type Tree is record
	   integer :: value
	   tree :: left, right
	end type Tree

Note that the elements inside a tree-valued data type are not _pointers_
but are actually trees themselves.  No more confusing pointers with
what they point to - the pointers aren't explicitly visible.  No more
forgetting the dereference operator (or, conversely, putting it in
incorrectly) - there isn't a dereferencing operator.  To be sure, the
compiler _may_ internally use pointers to do the implementation of
these recursive structures (but then, it probably uses GOTOs to internally
implement loops), but since they aren't explicitly visible to the user,
his life is much easier.
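
For comparison, here is a minimal sketch of the same tree in C as it stands today, with the links written as explicit pointers.  The names tree_insert, tree_sum and tree_free are made up for illustration and are not from the post; the ordering rule (smaller values to the left) is likewise an assumption.

```c
#include <stdlib.h>

/* The Tree record above, with the links spelled out as C pointers. */
struct tree {
    int value;
    struct tree *left, *right;   /* NULL plays the role of the empty tree */
};

/* Insert v into the search tree rooted at t and return the root.
   If memory cannot be allocated, the tree is left unchanged. */
struct tree *tree_insert(struct tree *t, int v)
{
    if (t == NULL) {
        struct tree *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->value = v;
        n->left = n->right = NULL;
        return n;
    }
    if (v < t->value)
        t->left = tree_insert(t->left, v);
    else
        t->right = tree_insert(t->right, v);
    return t;
}

/* Sum of all values in the tree; every link must be dereferenced by hand. */
int tree_sum(const struct tree *t)
{
    return t == NULL ? 0 : t->value + tree_sum(t->left) + tree_sum(t->right);
}

/* Release every node; deallocation is entirely the programmer's job. */
void tree_free(struct tree *t)
{
    if (t != NULL) {
        tree_free(t->left);
        tree_free(t->right);
        free(t);
    }
}
```

Every `->` and every NULL test here is something the proposed record syntax would leave to the compiler.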

Dynamic memory should also be implemented directly.  Again, here is an
example:

	Dynamic Integer :: a(:,:)       !-- declares two dimensional a
	...   use of a here is illegal - not allocated yet ...
	ALLOCATE a(50,100)              !-- allocates 5000 words memory for a
	...   use of a here is legal ...

Of course, there would have to be an inquiry function to detect whether
the object was allocated or not.  Further, the decision would have to be
made in the language design whether deallocation would be automatic
(garbage collection, reference counting, etc.) or whether the user would have
to explicitly deallocate things.  Either way, this is simpler, safer,
and easier to code, use, and debug than pointer usage.  Further, the
compiler can optimize uses of the dynamic object with the knowledge
that it's not aliased to anything - a fact the compiler cannot deduce
from malloc() calls (which as far as the compiler knows is just a function
which might be returning just any old address it feels like).
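
A rough C counterpart of the ALLOCATE example, as a sketch only.  The helper names alloc_2d and at_2d are hypothetical; the "inquiry function" is reduced to a test against NULL, and the two dimensional indexing has to be done by hand because the compiler sees nothing but a pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zero-filled rows-by-cols block of ints.
   Returns NULL on failure, which doubles as the "not allocated" state. */
int *alloc_2d(size_t rows, size_t cols)
{
    if (cols != 0 && rows > SIZE_MAX / cols)
        return NULL;                     /* rows * cols would overflow */
    return calloc(rows * cols, sizeof(int));
}

/* Address of element (i, j) in a block made by alloc_2d with this cols.
   No bounds are checked; the caller must keep i and j in range. */
int *at_2d(int *a, size_t cols, size_t i, size_t j)
{
    return &a[i * cols + j];
}
```

A caller would write `int *a = alloc_2d(50, 100);`, test `a != NULL` before any use, and call `free(a)` when done.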

Run-time equivalencing is a feature which some people (with a good
deal of justification) claim shouldn't be allowed at all.  I disagree.
But there are still some distinctions to be made.

First, equivalencing might be used just to reuse statically allocated space
(although, using dynamic memory is probably better).

Equivalence might also be used to provide a form of array reshaping or
slicing - here pointers are inadequate: try the ALIAS/IDENTIFY feature
in the first draft Fortran 8X proposal.

Equivalence might also be used for defeating type checking - but here I
prefer to recommend the below:

	Type Float_internal is record
	   bit.1 :: sign
	   bit.8 :: exponent
	   bit.23:: significand
	End type Float_internal

	Float :: x                      !-- x is a simple float variable
	Map x as Float_internal         !-- overlays record onto x
	x = 5.0                         !-- x used as usual
	x.sign = 1                      !-- negate x - use the mapping
	x.exponent=x.exponent+1         !-- multiply x by 2 - use the map
	... etc ...

This makes the defeating of the type checking explicit and also makes
the intended use clearer.
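
For readers who want to try the idea in present-day C, here is a sketch of the same two operations.  It assumes that float is a 32-bit IEEE 754 single (1 sign bit, 8 exponent bits, 23 significand bits), which the language does not guarantee; the function name negate_and_double is invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumes a 32-bit IEEE 754 float.  Only meaningful for positive, normal x
   whose exponent field is not already at its maximum. */
float negate_and_double(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* overlay: copy the float's bit pattern */
    bits |= UINT32_C(1) << 31;        /* x.sign = 1 */
    bits += UINT32_C(1) << 23;        /* x.exponent = x.exponent + 1 */
    memcpy(&x, &bits, sizeof x);      /* copy the pattern back */
    return x;
}
```

Nothing in the C version says which bits are the sign or the exponent; the shift counts carry that knowledge, where the Map declaration would state it by name.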

One of the problems with C pointers is that you can't locally tell if a
pointer is supposed to be an array, a recursive structure, an allocated
object, or some exotic run-time equivalence.  Providing all these possible
features with high-level syntax and separate functionality improves the
clarity of the code.  It usually even makes the code more succinct
(shorter).  So, to make a long story short (too late), I haven't yet
found any application which _needs_ explicit pointers either for speed
or functionality.  The above replacements either conceal or eliminate
pointers and are as (or more) efficient and easier to use.

-----------------------------------------------------------------------------

Now, back to C.

Related to type checking is mixed mode.  I don't object to mixed mode,
in fact: I support it.  But C's rules for applying it are not reasonable.
The _claim_ is that the rules are designed to allow speed.  Actually,
there is no rational reason for minus five divided by a thousand to
_ever_ be positive or to _ever_ be larger than one in magnitude.  The C
rules sometimes require that (-5/1000U == some large machine dependent
constant).  The C type hierarchy needs considerable adjustment.
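
The division can be checked directly.  The sketch below uses hypothetical function names; the signed result of zero assumes C99 or later, where integer division truncates toward zero.

```c
#include <limits.h>

/* The int -5 is converted to unsigned before the division,
   so the dividend becomes UINT_MAX - 4. */
unsigned mixed_quotient(void)
{
    return -5 / 1000U;
}

/* Returns 1 if the mixed quotient equals the wrapped-around value. */
int quotient_matches_wraparound(void)
{
    return mixed_quotient() == (UINT_MAX - 4U) / 1000U;
}

/* With both operands signed the quotient is the expected zero
   (truncation toward zero, as C99 and later require). */
int signed_quotient(void)
{
    return -5 / 1000;
}
```

With a 32-bit unsigned type the mixed quotient works out to 4294967, which is both positive and far larger than one.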

This brings us to mixed type operations (not just mixed mode).  Since
C has no 'logical' type, you are allowed to mix arithmetic with the
results of conditionals with wild abandon.  I have never seen any
advantage to this - I HAVE seen a lot of people make a lot of costly
and time consuming mistakes as a result.  Further, the lack of a
'logical' data type means that they must provide more than one set
of boolean operators (and, or, not, xor) in order to have bitwise
and logical distinguished.
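
A small sketch of the two operator sets and of arithmetic on conditionals (the function names are illustrative only):

```c
/* Bitwise AND compares bit patterns: 1 is 01 and 2 is 10, no bit in common. */
int bitwise_and(int a, int b)
{
    return a & b;
}

/* Logical AND treats any nonzero operand as true and yields 0 or 1. */
int logical_and(int a, int b)
{
    return a && b;
}

/* A comparison yields a plain int, so it can be added like any number. */
int count_positive(int a, int b, int c)
{
    return (a > 0) + (b > 0) + (c > 0);
}
```

So `1 & 2` is 0 while `1 && 2` is 1, and nothing in the types says which one the programmer meant.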

So, the next point is this bit about C's operators.  There are too
many operators and too many precedence levels.  Some (like the logical
vs. bitwise problem) would not be necessary if C had better intrinsic
data types.  Others perform functions which would probably be better
done as function calls (intrinsics which could be inlined of course).
Still others (like pointer dereferencing) should probably not exist at
all.

In spite of all these operators, character string concatenation,
string comparison, and substring operations are _not_ operators.
Even Fortran is better.

Data type declaration "operators" (or whatever you want to call the
syntax elements) are particularly ugly, obscure, peculiar, difficult,
and arcane.  I'm told that this is because they wanted a declaration
of a data type to look like a use of that type.  This leads us to:

The use of complicated data types is particularly ugly, obscure,
peculiar, difficult, and arcane.  At least they met their goal, the
syntax of using the variables is every bit as bad as that for declaring
them.

Assignment operators are necessary in a procedural language.  But these
combinations of assignment with other operators are just useless
syntactic sugar.  Personally, I don't care if the language has them or
not, but they do clutter up the syntax quite a bit.  The main problem
with assignment is not the operators, per se, but the fact that they are
allowed _within_ an expression.  There have been several well conducted
experiments on the effect of such operators on user productivity - the
conclusion has been that assignment should be a statement level operator
and _not_ an expression level one - at least, if you want to maximize
user productivity.

While we're on the subject of productivity experiments, here's a few
other C features that have failed such tests:

	Control structures which used 'compound statements' (ie. sequences
	    bounded by BEGIN/END or {/} as C spells them).  Better is the
	    IF/ELSEIF/ELSE/ENDIF, WHILE/ENDWHILE , etc. style.  Even better
	    is allowing control constructs to be given unique labels and
	    matching them up (ie. Ada and Fortran 90 have this feature).
	End-of-line ignored within comments.  Comments should be
	    terminated by the end-of-line mark.  C++ has the option
	    of doing this.  Unfortunately, it still retains the old
	    wraparound version as well (the danger of developing a
	    backward compatible language is the load of junk that you
	    can't get rid of).
	End-of-line ignored within statements.  The experimenters decided
	    that people just seem to regard the end-of-line as the same
	    as the end-of-statement, they really do.  Even C programmers
	    intuitively know this.  I examined 10,000+ lines of commercial
	    C code and found only 12 lines which used the C ability to
	    wrap statements across lines automatically.  Even so, forgotten
	    semicolons almost _all_ occur at the end-of-line, and it is
	    still a very common syntax error.  I think the end-of-line
	    mark should be a synonym for semicolon and should be escaped
	    in the rare (12 out of 10,000) case that a continuation is
	    needed.
	Pointers - well, we've talked about them.

        GOTOs.  This is an interesting subject because there are
            actually conflicting results here.  Spaghetti code clearly
            (and in the experiments, this was shown) causes massive
            productivity problems.  However, in the test involving
            BEGIN/END control flow brackets, GOTOs were found to be one
            of the things which were better (by about a factor of 2)
            than 'compound statements'.  Other experiments involving
            "disciplined" GOTO usage (with "disciplined" pretty much
            meaning what you'd expect) were compared with "Structured"
            GOTO-less programs and _no_ statistically significant
            difference with productivity was observed at all.  Actually,
            in this one case, I think C has got it exactly right - leave
            unrestricted GOTO in the language _and_ provide all the
            "Structured" control flow constructs.  One of the very few
            things that I think C did right.

There are several other experimental results - this is just a sampling.
The only experiment that I've ever seen in which the losing feature
wasn't in C was the one that showed that semicolon should be a terminator
not a separator.  C got this one right.  C was on the wrong side of
every other experiment I've ever seen.

Some non-experimental features which are widely regarded as bad ideas:

	Case sensitive syntax.  In a case insensitive language, code
	    can be easily shared, teamwork is easier, and upper-case
	    can be used for emphasis or other documentation purposes.
	    In a case sensitive syntax, communication between sites
	    (or even down the hall) is impeded by differing case
	    conventions.  People waste time ironing this out and not
	    doing more useful work.
	Nonintuitive syntax.  This is very common in C.  If a concept
	    has a widely developed and simple notation which is compatible
	    with the keyboard and/or print devices available, the language
	    _should_ make every effort to accommodate this common notation.
	    I will give one specific example: what in the world possessed
	    them to use a leading zero to distinguish octal from decimal???
        Inconsistent syntax.  Also common in C.  An operator, keyword,
            or construct should have the same meaning (as nearly as
            possible) in every context in which it is allowed.  A
            specific example is the keyword 'static', which means that
            the memory for the corresponding variable being declared is
            permanently associated with the variable for the entirety of
            run-time - except in the beginning of a file (outside any
            procedure), where 'static' suddenly means the same thing
            that other languages call 'private'. (All variables declared
            outside of procedures have permanently allocated memory
            anyway - so, 'static' should be regarded as redundant
            there.)
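
The octal and 'static' examples above can be seen in a few lines of C.  This is only a sketch; all of the names are invented for illustration.

```c
/* A leading zero makes the literal octal: 010 is eight, not ten. */
int octal_ten(void)
{
    return 010;
}

/* 'static' outside any function: the name is private to this source file.
   The storage would be permanent with or without the keyword. */
static int call_count = 0;

/* 'static' inside a function: the variable keeps its value between calls. */
int next_id(void)
{
    static int id = 0;

    call_count++;
    return ++id;
}

/* Read access to the file-private counter. */
int calls_so_far(void)
{
    return call_count;
}
```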

Well, as I predicted, even to touch on the small number of obvious
problems is several pages long.  I trust that you can see there are
still others lurking in the language specification (like 'switch', which
doesn't automatically put a 'break' between the cases - whoops  - I can't
stop once I'm on a roll).
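
A sketch of the 'switch' behaviour just mentioned (bodies_run is a made-up name):

```c
/* Counts how many case bodies execute for n when every 'break' is omitted. */
int bodies_run(int n)
{
    int count = 0;

    switch (n) {
    case 1:
        count++;        /* no break: execution continues into case 2 */
    case 2:
        count++;        /* no break: execution continues into default */
    default:
        count++;
    }
    return count;
}
```

Calling it with 1 runs all three bodies, with 2 the last two, and with anything else only the default.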

J. Giles

marick@m.cs.uiuc.edu (09/14/90)

/* Written  9:08 pm  Sep 13, 1990 by jlg@lanl.gov in m.cs.uiuc.edu:comp.society.futures */
> The main problem
> with assignment is not the operators, per se, but the fact that they are
> allowed _within_ an expression.  There have been several well conducted
> experiments on the effect of such operators on user productivity - the
> conclusion has been that assignment should be a statement level operator
> and _not_ an expression level one - at least, if you want to maximize
> user productivity.

By lost productivity, do you mean time spent discovering and
correcting errors where people wrote

	if (a = b)

when they meant

	if (a == b)

Did they study languages where that kind of error is less likely?  For
example, I doubt many Lisp programmers write

	(if (setq a b)

when they meant

	(if (eq a b)

Were the studies of novice programmers?  How were costs calculated?

It's important not to interpret experiments more broadly than the
data allows.  For example, were code reads used in the experiment?
With a code read checklist, such errors are readily caught, at low
additional cost.  If they were not used, the experiment says only that
value-returning-assignment raises costs in the absence of code reads,
not that they raise costs, period.

Because experiments require interpretation, I'd like to see citations.
Thanks.

(BTW:   I recommend this convention:

	if (5 == a)

instead of 

	if (a == 5)

Cuts down on errors considerably.  I saw this on the net somewhere,
but I don't know whose idea it was originally.)
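
A short sketch of why the constant-first convention helps.  The function names are hypothetical, and the doubled parentheses in the first function are there only so that compilers which warn about this very slip stay quiet.

```c
/* The slip: '=' stores 5 in a, and the stored value 5 counts as true,
   so the test succeeds whatever a was on entry. */
int slip_is_always_true(int a)
{
    if ((a = 5))        /* meant: a == 5 */
        return 1;
    return 0;
}

/* Constant first: mistyping this as (5 = a) cannot compile,
   because a constant cannot be assigned to. */
int is_five(int a)
{
    if (5 == a)
        return 1;
    return 0;
}
```

The convention only protects comparisons against constants; `if (a = b)` with two variables still compiles.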

jcburt@ipsun.larc.nasa.gov (John Burton) (09/14/90)

>This leads us to pointers.  Just about everything about C pointers is
>bad.  From the fact that pointers are hopelessly confused with arrays
>(which are completely separate conceptually) to the syntax of pointer
>use, C's pointers are a mess.

Not a mess, just cryptic, as is most of C. C IS NOT a high level 
language, it was originally designed as a mid-level language, somewhere
between Pascal and Assembler...it is NOT designed to be used by novices!

>                              In addition, many language design people
>now feel that pointers of _any_ kind are a bad idea.  C.A.R. Hoare
>condemned them as long ago as the early 70's (about the time C was
>'designed').  He pointed out that pointers are the data structuring
>element that corresponds to GOTOs in flow control - if the one is
>bad, so is the other.

Again, what they are considering is a high-level language. C is a mid-level
language. Assembler DOES NOT have arrays and the PRIMARY method of
flow control IS the GOTO...

BTW: since the early 70's it has also been shown that the total LACK
of GOTO's is also bad. Selective use of GOTO is the key.

>Since this is comp.society.futures, I will discuss pointer replacements.
>Essentially, pointers only do three things for you: 1) recursive data
>structures (graphs, trees, etc....); 2) dynamic memory; and 3) run-time
>'equivalence'.  C pointer arithmetic only does what one dimensional array
>indexing already does (scaled address calculations): arrays are better for
>this - so it's _not_ counted as one of the features of pointers.

Sorry, but it IS more expensive (execution time wise) to sequentially
index through an array, than it is to simply increment a pointer...
Array indexing is better for randomly accessing the array...
just because random access is better for some things, should we
totally do away with sequential access? come on, be fair...

>Recursive data structures are best implemented directly (to use a
>C/Fortran like declaration syntax with the type names on the left):
>
>	Type Tree is record
>	   integer :: value
>	   tree :: left, right
>	end type Tree
>
>Note that the elements inside a tree-valued data type are not _pointers_
>but are actually trees themselves.  No more confusing pointers with
>what they point to - the pointers aren't explicitly visible.  No more
>forgetting the dereference operator (or, conversely, putting it in
>incorrectly) - there isn't a dereferencing operator.  To be sure, the
>compiler _may_ internally use pointers to do the implementation of
>these recursive structures (but then, it probably uses GOTOs to internally
>implement loops), but since they aren't explicitly visible to the user,
>his life is much easier.

when i think of a (data structure) tree, i tend to think in terms of 
nodes being linked together NOT immediately adjacent (draw a data structure
tree...nodes are not directly attached...there is a line connecting the
nodes...the line is a pointer)...

Not using pointers would make life easier for a novice programmer. It would
severely limit the experienced programmer...

>
> [more stuff about the evils of C]
>
>J. Giles

One thing that has apparently slipped the mind here
is that comparing C, Pascal and Fortran, is the same as  comparing 
apples to oranges... Fortran was designed to be a "high-level" 
scientific language, it was designed to "protect" the user from the machine 
(and protect the machine from the user) while still allowing him/her
some freedom to do calculations. Protecting the user from himself is
another story...I still have nightmares about debugging large Fortran
systems that use COMMON & EQUIVALENCE extensively. Few errors (other than
straight syntax) were caught by the compiler...most errors were debugged
(or not) at the runtime stage. Pascal was  designed
from the ground up as a "high-level" teaching language. It was designed
to enforce structured programming, and to detect  as many errors as
possible at the earliest possible stage (i.e. the compiler). Basically
it provided a high level of insulation between the machine and the user,
but at the expense of functionality. (Note: you can still do almost
anything you want in Pascal, it just takes more work). C on the other
hand was designed as a "Mid-Level" systems language. It was to be used
to write device drivers and low level routines. Essentially it filled
the gap between Assembly Language (high functionality/flexibility, *low*
user  protection) and the high level languages (High user protection and
lowered functionality/flexibility). It was designed to have freer access
to the machine, but still provide some level of protection. Basically
what the above posting indicated is that the average programmer should
not use C. Fine, the average programmer should also not do systems programming,
and the average programmer should not use assembler...

Speed of execution is another aspect for comparison...Pascal and Fortran
were designed to accomplish tasks safely (Pascal more so than Fortran).
C was designed to accomplish tasks quickly. The often made comparison
of indexing through a 1-dimensional array (linear representation of
a multidimensional array) instead of incrementing pointers is not strictly
valid. For every array access, there is a corresponding index calculation
(usually a multiplication and an addition) to determine where to look for
the data. Incrementing a pointer is faster (generally a register increment
operation). The difference between the two methods is the same as the
difference between a random access data structure (array) and a 
sequential access data structure (pointers). Neither is inherently better,
they each have particular applications where they are the best choice.

The problem is that C, Pascal, and FORTRAN were designed for (different)
specific purposes, and are currently being used as high level
general purpose languages which is NOT what they were designed to do.
C wasn't even designed as a high level language...Each is probably
the best choice in its area, but not necessarily outside its area...
Ada is an example of a (government decreed) high level general purpose
language...supposedly it can do everything C, Pascal, and FORTRAN can
do, but i'm not sure it would be the language of choice in the specific
areas that C, Pascal, and FORTRAN were designed for...

When asked the question, "what is the *best* programming language?" the
answer should be "it depends on what you want to do..."

perhaps the future of computer languages shouldn't be trying to find a
"best" general purpose language, but instead, develop transparent ways
for modules written in one language to be used in programs written
in other languages...this is already being done to some extent, but
it should be taken further...

John Burton
(jcburt@ipsun.larc.nasa.gov)
(jcburt@cs.wm.edu)

bzs@WORLD.STD.COM (Barry Shein) (09/15/90)

>>This leads us to pointers.  Just about everything about C pointers is
>>bad.  From the fact that pointers are hopelessly confused with arrays
>>(which are completely separate conceptually) to the syntax of pointer
>>use, C's pointers are a mess.
>
>Not a mess, just cryptic, as is most of C. C IS NOT a high level 
>language, it was originally designed as a mid-level language, somewhere
>between Pascal and Assembler...it is NOT designed to be used by novices!

I think we're working from axioms here that may not be self-evident.
For example, HLLs are easier for novices than MLLs. I don't know if
that's true or not.

Novices make a lot of semantic errors. HLL's provide such a high level
of abstraction and so many complicated rules that it's nearly
impossible to explain why something is wrong, or in particular, figure
it out for yourself.

The PL/I manual set (an HLL if there ever was one) was something like
19 linear shelf-feet. The rules for arithmetic alone demanded an
understanding of all sorts of issues.

MLL's like C provide a very simple semantics (can be explained in a
few dozen pages) but a reasonably high-level syntax. That is,
"algebraic", analogous to what is learned in grade-school math rather
than having to do the translation needed for assembler etc.

Pascal is considered an HLL. I've taught several languages over the
years to college students. Pascal had to be one of the worst, most
kids couldn't get past where the semi-colons go. That sort of subtlety
and worshipping of abstraction is typical of HLL's.

Also, most students have problems with pointers because their teachers
didn't understand them. No joke. Somewhere they became these bogey-men
and teachers in intro courses would stand up there and give all these
signals to the class that what s/he is about to teach is impossible to
understand so here we go...

I just told them funny stories about the confusions between the thing
and the thing contained ("The White House said today...") and went on
with it and never had any problems. A location is not really a hard
problem, houses have addresses etc, pigeon-holes with box numbers and
so forth. The hard problem was lousy teachers with worse attitudes.

Self taught programmers almost never had any problem with pointers or
thought they were particularly interesting/challenging/whatever.

I think the future of programming, however, lies in moving the
solution of the problem closer to the problem. Programming languages
are for programmers, people trained in a specific skill. People who do
not have that training should have applications packages and
generators.

Remember, it wasn't that long ago that when we wanted a simple graph
we used to write a program. Today it would be rare to do that.

The point being, that trying to make programmers out of everyone
(typically by designing languages so easy to use even "your secretary"
could program...that was absolutely beyond a doubt the typical sexist
claim, I was there) was a strange, 1970's dream that by and large has
become unnecessary.

Programming is a skill, like driving a semi, most people shouldn't
need that skill.

jcburt@ipsun.larc.nasa.gov (John Burton) (09/15/90)

>Also, most students have problems with pointers because their teachers
>didn't understand them. No joke. Somewhere they became these bogey-men
>and teachers in intro courses would stand up there and give all these
>signals to the class that what s/he is about to teach is impossible to
>understand so here we go...
>
>I just told them funny stories about the confusions between the thing
>and the thing contained ("The White House said today...") and went on
>with it and never had any problems. A location is not really a hard
>problem, houses have addresses etc, pigeon-holes with box numbers and
>so forth. The hard problem was lousy teachers with worse attitudes.
>
I couldn't agree more...I guess the point I was trying to make was
that pointers are NOT difficult to understand AND they provide
much needed flexibility...If a programmer does not understand them,
most languages provide useful alternatives for them to use...but
don't take them away from people who understand and can use them
effectively...

>
>I think the future of programming, however, lies in moving the
>solution of the problem closer to the problem. Programming languages
>are for programmers, people trained in a specific skill. People who do
>not have that training should have applications packages and
>generators.
>
>[...stuff deleted...]
>
>Programming is a skill, like driving a semi, most people shouldn't
>need that skill.

Exactly!!! Programmers should help provide the tools for non-programmers
to use...Programming languages SHOULD NOT be restricted to provide
safety for novice/non-programmers at the expense of those that can
benefit from the flexibility of "dangerous" attributes such as pointers...

John
(jcburt@cs.wm.edu)
(jcburt@ipsun.larc.nasa.gov)

mst@vexpert.dbai.tuwien.ac.at (Markus Stumptner) (09/17/90)

From article <1990Sep14.160429.2732@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
>>This leads us to pointers.  Just about everything about C pointers is
>>bad.  From the fact that pointers are hopelessly confused with arrays
>>(which are completely separate conceptually) to the syntax of pointer
>>use, C's pointers are a mess.
>>Since this is comp.society.futures, I will discuss pointer replacements.
>>Essentially, pointers only do three things for you: 1) recursive data
>>structures (graphs, trees, etc....); 2) dynamic memory; and 3) run-time
>>'equivalence'.  C pointer arithmetic only does what one dimensional array
>>indexing already does (scaled address calculations): arrays are better for
>>this - so it's _not_ counted as one of the features of pointers.
> 
> Sorry, but it IS more expensive (execution time wise) to sequentially
> index through an array, than it is to simply increment a pointer...
> Array indexing is better for randomly accessing the array...
> just because random access is better for some things, should we
> totally do away with sequential access? come on, be fair...

I have a friend who works for CDC.  The current series of graphics
workstations sold by CDC (to my knowledge, very similar to Silicon
Graphics machines) are based on the MIPS RISC chip family and use
heavily optimizing compilers.   One day, while leafing through a
C compiler manual for the system (I don't know what manual it was exactly,
have never seen it), he discovered to his amazement that the programming
guidelines include the following rules:

	- Do not use the increment and decrement operators (++ and --)

	- Do not use pointer incrementing for sequential array access

This was a year ago, and my memory is fuzzy.  As far as I remember,
according to the manual, in most cases using ordinary
assignment/expression syntax and incrementing an array index will be
MUCH FASTER since the compiler has more freedom in keeping values in
registers instead of having to store them back in memory immediately.

I confess I have been very amused by this.  Can anyone support it?
The MIPS architecture is used by DEC and lots of other manufacturers.
Perhaps somebody else has stumbled on this.  What about other RISC
architectures?  Perhaps there is still hope for high-level
languages...


Markus Stumptner                                mst@vexpert.at
Technical University of Vienna                  vexpert!mst@uunet.uu.net
Paniglg. 16, A-1040 Vienna, Austria             ...mcsun!vexpert!mst

jlg@lanl.gov (Jim Giles) (09/18/90)

From article <1990Sep14.160429.2732@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
> [...]
> BTW: since the early 70's it has also been shown that the total LACK
> of GOTO's is also bad. Selective use of GOTO is the key.

Yes, I believe I mentioned that in the article to which you are responding.
Although, I have not been able to find any evidence that GOTO-less is
actually _bad_, there were several experiments that showed that disciplined
use of GOTOs was not worse (or better) than "Structured" coding in any
statistically significant way.  The conclusion of most researchers was
that all the "Structured" alternatives to GOTO should be provided in a
language - and GOTO should also be provided just in case.

> [...]
> Sorry, but it IS more expensive (execution time wise) to sequentially
> index through an array, than it is to simply increment a pointer...

Sorry, but it is not.  The compiler technology to tell that array indexing
is semantically _identical_ to the pointer incrementing scheme to which
you are referring is about 30 years old: if your compiler is _that_ far
behind the state-of-the-art, you got rooked. (The optimization is called
"constant folding".  The address of the array is added to the initial
value of the array index variable at _compile_ time - the interior of the
loop (or wherever) uses this combined value as your C program would use a
pointer - including the simple increment as the loop progresses.)
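
The two styles under dispute, side by side, as a sketch (the function names are invented for illustration).  The claim being made is that an optimizing compiler can turn the first loop into the second; whether a particular compiler does so is something to check on that compiler.

```c
#include <stddef.h>

/* Array style: the element address is base + i, recomputed from the index. */
long sum_indexed(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer style: the address itself is the loop variable. */
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    const int *p, *end = a + n;

    for (p = a; p != end; p++)
        total += *p;
    return total;
}
```

Both functions return the same result for any array; they differ only in how the address calculation is written down.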

> [...]
>>Recursive data structures are best implemented directly [...]
> [...]
> Not using pointers would make life easier for a novice programmer. It would
> severely limit the experienced programmer...

The method I gave is semantically _identical_ to using pointers.  The
only difference is the lack of the need for dereferencing.  There is no
functionality that C can perform that the features I gave cannot also
perform.  There is no reason (at the present state-of-the-art) for the
compiler to generate less efficient code than using explicit pointers
would use.   There is (at the present state-of-the-art) excellent reason
to expect that the features I propose could be _MORE_ efficiently
implemented since the presence (or absence) of aliasing is easier to
detect.

> [...]
> C was designed to accomplish tasks quickly. The often made comparison
> of indexing through a 1-dimensional array (linear representation of
> a multidimensional array) instead of incrementing pointers is not strictly
> valid. For every array access, there is a corresponding index calculation
> (usually a multiplication and an addition) to determine where to look for
> the data. Incrementing a pointer is faster (generally a register increment
> operation).  [...]

As I pointed out, the modern (more recent than the late 50's) compiler
can eliminate the addition you refer to.  The multiplication is only
needed for multidimensional arrays - which C doesn't, strictly speaking,
even have.  The multiply should also be eliminated by a modern (more
recent than the early 60's) compiler - the technique is called "strength
reduction".  If your compiler doesn't have it, you've been rooked again.
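
A rough illustration of strength reduction, done by hand in C so the
transformation is visible.  The names fill_indexed and fill_reduced are
hypothetical, and the row-major layout in a flat array is an assumption
made for the example.

```c
#include <stddef.h>

/* As written: one multiply and one add for every element access. */
void fill_indexed(double *m, size_t rows, size_t cols, double v)
{
    size_t i, j;
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            m[i * cols + j] = v + (double)i;
}

/* After strength reduction: the product i * cols is replaced by a
   running offset that grows by cols once per row. */
void fill_reduced(double *m, size_t rows, size_t cols, double v)
{
    size_t i, j, row_start = 0;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++)
            m[row_start + j] = v + (double)i;
        row_start += cols;
    }
}
```

The two functions store the same values; the second is what an
optimizer with strength reduction would be expected to derive from the
first.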

> [...]
> When asked the question, "what is the *best* programming language?" the
> answer should be "it depends on what you want to do..."

The first correct thing you've said.  However, you have not made a
convincing argument that the answer should _ever_ be C - no matter
what the application is.

J. Giles

jlg@lanl.gov (Jim Giles) (09/18/90)

From article <3643@vexpert.dbai.tuwien.ac.at>, by mst@vexpert.dbai.tuwien.ac.at (Markus Stumptner):
> [...]
> 	- Do not use the increment and decrement operators (++ and --)
> 
> 	- Do not use pointer incrementing for sequential array access
> 
> This was a year ago, and my memory is fuzzy.  As far as I remember,
> according to the manual, in most cases using ordinary
> assignment/expression syntax and incrementing an array index will be
> MUCH FASTER since the compiler has more freedom in keeping values in
> registers instead of having to store them back in memory immediately.

This is quite possibly true.  You see, pointers are unrestricted alias
generators.  If you have a subroutine which (say) copies one array into
another:

    for (i=0;i<max;i++)
      a[i] = b[i];

The compiler probably just does the constant folding and zips through
the assignment.  If you do the following instead:

    for (lim=a+max; a<lim; *a++ = *b++);  /* the usual idiom */

The compiler probably has to complete each store of 'a' before the load
of the next 'b'.  Further, since *a may be an alias for 'a' itself or
for 'b', the values of the pointers are probably stored and reloaded each
trip through the loop as well.

For example, suppose memory is like this:

    address     content         name
    0199        0200            a
     ...
    0200        0300            b _and_ *a
     ...
    0300        0123            *b

In this case, the arrays don't overlap, but 'a' points to the place where
'b' is stored - so the first trip through the loop alters the location
of the *b array.  This is legal in C.  The compiler has no way of knowing
that the user didn't do this, so it must generate code to allow
this to happen - in this case: 'b' must be reloaded after each loop
trip.  Other memory configurations would require other special actions.
The only "safe" thing the compiler can do is store/reload _all_ the
variables on each loop trip.

Of course, this shows that the CDC compiler was wrong.  The two programs
given here should both generate the same code (since compiler technology
is sufficiently advanced for the compiler to see that both do the same
thing - except for setting 'i' and/or 'lim').  However, the compiler
should generate the same _slow_ code for both.
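
A simpler, well-defined case of the same problem can be sketched in C
with overlapping arrays rather than an aliased pointer variable.  The
function names are invented for illustration.

```c
/* Indexed copy, as in the first loop above. */
void copy_indexed(int a[], const int b[], int max)
{
    int i;
    for (i = 0; i < max; i++)
        a[i] = b[i];
}

/* Pointer copy, as in the second loop above. */
void copy_pointer(int *a, const int *b, int max)
{
    int *lim;
    for (lim = a + max; a < lim; )
        *a++ = *b++;
}
```

Given int buf[5] = {1, 2, 3, 4, 5}, calling either function as
copy(buf + 1, buf, 4) leaves buf holding 1 1 1 1 1, because each store
changes the value that the next load reads.  A compiler that loaded all
four source values before storing any would produce 1 1 2 3 4 instead,
so it may not reorder the loads and stores unless it can show that the
two arguments do not overlap.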

J. Giles

jlg@lanl.gov (Jim Giles) (09/18/90)

From article <1990Sep14.212806.8131@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
> [...]
>                         I guess the point I was trying to make was
> that pointers are NOT difficult to understand AND they provide
> much needed flexibility...

Really?  I've been asking this same question for over two years
on the net - no one has yet answered it:  Please give me a specific
example of a _legal_ C data structure which _cannot_ be implemented
with the same efficiency with the data structuring features below.
Note, there is not a _single_ explicit pointer data type in the
following list.

1) 'Atomic' types (floats (various sizes), ints (various sizes), booleans,
   characters (various character sets), etc....)

2) Enumerated data types.  These are simply a way to allow the user to
   invent a new 'atomic' type.

3) Arrays.  Mappings from a tuple of indices to a typed value.  Note: an
   array of arrays is legal and is _NOT_ the same as a 2-d array (although
   a little syntactic sugar could allow this later - no one has ever asked
   for it).

4) Sequences.  An ordered collection of zero or more objects to be
   accessed in a specific order.  Obvious syntactic sugar (like direct
   referencing of the last element or an arbitrary element) is permitted.
   The usual implementation of character strings in C is an example of an
   inefficient implementation of a sequence.  You can have a sequence of
   any data type (including a sequence of sequences - which is what a
   dictionary is).

5) Records.  Like C struct.  No difference at all really.

6) Unions.  These are _always_ discriminated.  The compiler is
   responsible for maintaining and checking the type tags.  Note that
   this only _seems_ inefficient: _legal_ C programs should always
   explicitly maintain a tag anyway.

7) Recursive types.  These may be given the attribute 'aliased' in order
   to allow circular and overlapping references.  Other than that, we have
   discussed these before.

In addition, all variables can be declared with a 'dynamic' attribute,
which means that they must be allocated before use (dynamic arrays give
their size at allocation time).  It might be desirable for sequences
and recursive data types to be given the dynamic attribute automatically.

I can demonstrate sample syntax for these if anyone thinks it is required.
Anyone who proposes a C data object that he claims is not representable
here is invited to do so (I'm not joking - I'm designing a language with
these features - this challenge is an attempt to find out whether I'm
leaving something out).
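
Item 6 can be pictured in present-day C as a union wrapped with an
explicit tag.  This is only a sketch of the bookkeeping a programmer
maintains by hand in C; the type and function names are hypothetical,
and the assert() stands in for the check that the proposed compiler
would insert itself.

```c
#include <assert.h>

enum value_tag { VALUE_INT, VALUE_FLOAT };

struct value {
    enum value_tag tag;     /* records which member is currently valid */
    union {
        int   i;
        float f;
    } u;
};

struct value make_int(int i)
{
    struct value v;
    v.tag = VALUE_INT;
    v.u.i = i;
    return v;
}

struct value make_float(float f)
{
    struct value v;
    v.tag = VALUE_FLOAT;
    v.u.f = f;
    return v;
}

/* Checked accessor: refuses to read the int member unless the tag
   says an int was stored. */
int get_int(const struct value *v)
{
    assert(v->tag == VALUE_INT);
    return v->u.i;
}
```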

> [...]
>>Programming is a skill, like driving a semi, most people shouldn't
>>need that skill.
> 
> Exactly!!! Programmers should help provide the tools for non-programmers
> to use...Programming languages SHOULD NOT be restricted to provide
> safety for novice/non-programmers at the expense of those that can
> benefit from the flexibility of "dangerous" attributes such as pointers...
> [...]

I agree completely ... except with the last line.  Pointers are a bad
example of the philosophy you are discussing.  The presence of pointers is
an example of something useful having been _left_out_ of programming
languages.  Making me use pointers when what I really _want_ is dynamic
memory, or arrays, or sequences, or recursive data structures, etc.; is
like forcing the semi driver to use a crank to start his truck.  We give
semi trucks electric starters because it is a simple, reliable, and easy
to use replacement for the crank (in the role of truck starter anyway).
In fact, the electric starter is _so_ good, they no longer bother
providing cranks for trucks (or, any way to use a crank even if you had
one).  In fact, truck engines are now so large and heavy, you couldn't
turn it over by hand anyway - the presence of the starter has made it
possible to design larger and more powerful trucks than would otherwise
be possible.

A programming language should be designed with simple, reliable, and
easy to use replacements for hardware level concepts as well - or
would you rather have only conditional jumps for flow control and bit
twiddling for data?  The presence of more powerful language features
should allow programmers to concentrate on _what_ they want the code
to do, not _how_ the machine does it internally.  This should make
possible larger and more powerful programs than are presently feasible
with sufficient reliability.

J. Giles

pirinen@cc.helsinki.fi (09/18/90)

From article <1990Sep14.160429.2732@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
> Sorry, but it IS more expensive (execution time wise) to sequentially
> index through an array, than it is to simply increment a pointer...
> Array indexing is better for randomly accessing the array...
> just because random access is better for some things, should we
> totally do away with sequential access? come on, be fair...
Pointer dereferencing, as such, is not sequential, it is in fact more
random than array indexing (can you say "aliasing"?).  A loop over an array
that uses indexing can be compiled using pointer incrementing -- it's a
standard compiler technique.

In article <3643@vexpert.dbai.tuwien.ac.at>, mst@vexpert.dbai.tuwien.ac.at (Markus Stumptner) writes:
> [C compiler manual for MIPS RISC chip-based computer says:]
> 	- Do not use pointer incrementing for sequential array access
This doesn't surprise me a bit: modern chips have been designed to
execute high-level languages effectively, array indexing being a case in
point.  Intel 80286, 386, and i486 data sheets say that all
indirect memory addressing modes take an equal number of clocks,
including scaled indexed addressing.  This eliminates the supposed
advantage of pointer incrementing in most cases, if the compiler didn't
already.


Pekka P. Pirinen    University of Helsinki
pirinen@cc.helsinki.fi  pirinen@finuh.bitnet  ..!mcvax!cc.helsinki.fi!pirinen
Read my Lisp: no new syntax! -nil

pirinen@cc.helsinki.fi (09/18/90)

In article <1990Sep14.212806.8131@abcfd20.larc.nasa.gov>, jcburt@ipsun.larc.nasa.gov (John Burton) writes:
> I guess the point I was trying to make was
> that pointers are NOT difficult to understand AND they provide
> much needed flexibility...If a programmer does not understand them,
> most languages provide useful alternatives for them to use...
Except C, Pascal, etc.
> but don't take them away from people who understand and can use them
> effectively...
I agree pointers can be used effectively. It would be interesting to
program in a language that had pointers AND a useful alternative, to see
how often one would choose each.  Are there any such languages?

> Programming languages SHOULD NOT be restricted to provide
> safety for novice/non-programmers at the expense of those that can
> benefit from the flexibility of "dangerous" attributes such as pointers...
Where does this idea of C-hackers come from, that only novices need
safety?  I'm no novice (10 years of programming), and I want all the
safety I can get.  I'm sick and tired of debugging for hours to find
simple errors that could have been caught at the expense of a few
seconds of the compiler's time.  Programmers are not machines, even good
programmers make simple mistakes.


Pekka P. Pirinen   University of Helsinki
pirinen@cc.helsinki.fi  pirinen@finuh.bitnet  ..!mcvax!cc.helsinki.fi!pirinen
Read my Lisp: no new syntax! -nil

brendan@batserver.cs.uq.oz.au (Brendan Mahony) (09/18/90)

jlg@lanl.gov (Jim Giles) writes:

 - 1) 'Atomic' types (floats (various sizes), ints (various sizes), booleans,
 -    characters (various character sets), etc....)

 - 2) Enumerated data types.  These are simply a way to allow the user to
 -    invent a new 'atomic' type.

 - 3) Arrays.  Mappings from a tuple of indices to a typed value.  Note: an
 -    array of arrays is legal and is _NOT_ the same as a 2-d array (although
 -    a little syntactic sugar could allow this later - no one has ever asked
 -    for it).

 - 4) Sequences.  An ordered collection of zero or more objects to be
 -    accessed in a specific order.  Obvious syntactic sugar (like direct
 -    referencing of the last element or an arbitrary element) is permitted.
 -    The usual implementation of character strings in C is an example of an
 -    inefficient implementation of a sequence.  You can have a sequence of
 -    any data type (including a sequence of sequences - which is what a
 -    dictionary is).

 - 5) Records.  Like C struct.  No difference at all really.

 - 6) Unions.  These are _always_ discriminated.  The compiler is
 -    responsible for maintaining and checking the type tags.  Note that
 -    this only _seems_ inefficient: _legal_ C programs should always
 -    explicitly maintain a tag anyway.

 - 7) Recursive types.  These may be given the attribute 'aliased' in order
 -    to allow circular and overlapping references.  Other than that, we have
 -    discussed these before.

 - In addition, all variables can be declared with a 'dynamic' attribute,
 - which means that they must be allocated before use (dynamic arrays give
 - their size at allocation time).  It might be desirable for sequences
 - and recursive data types to be given the dynamic attribute automatically.

 - I can demonstrate sample syntax for these if anyone thinks it is required.
 - Anyone who proposes a C data object that he claims is not representable
 - here is invited to do so (I'm not joking - I'm designing a language with
 - these features - this challenge is an attempt to find out whether I'm
 - leaving something out).

Yes you are. You are leaving out memory mapped I/O and operating system
vectors and other disgusting kludges that make the computing world go
round. Other than that you are spot on!

--
Brendan Mahony                   | brendan@batserver.cs.uq.oz       
Department of Computer Science   | heretic: someone who disagrees with you
University of Queensland         | about something neither of you knows
Australia                        | anything about.

KPURCELL@liverpool.ac.uk (Kevin Purcell) (09/18/90)

On 17 Sep 90 08:21:00 GMT eru!hagbard!sunic!mcsun!tuvie!vexpert.dbai.tuwien.ac.
(eru!hagbard!sunic!mcsun!tuvie!vexpert.dbai.tuwien.ac.%!mst@edu.mit.bl) said:

>From article <1990Sep14.160429.2732@abcfd20.larc.nasa.gov>, by
> jcburt@ipsun.larc.nasa.gov (John Burton):
[stuff about pointers and merits versus multiply and add indexing of arrays]
>
> ... he discovered to his amazement that the programming
>guidelines include the following rules:
>
>	- Do not use the increment and decrement operators (++ and --)
>
>	- Do not use pointer incrementing for sequential array access
>
>This was a year ago, and my memory is fuzzy.  As far as I remember,
>according to the manual, in most cases using ordinary
>assignment/expression syntax and incrementing an array index will be
>MUCH FASTER since the compiler has more freedom in keeping values in
>registers instead of having to store them back in memory immediately.

If the machine has access to a vectorising processor or can run stuff
through a pipeline processor very quickly, it is sometimes better to avoid
making it explicitly fast (compared to, say, a PDP-11) with pointers and just
say what you really want. The compiler is probably better at picking up
this form to vectorise or unroll.

For example,

on some machines,

for(i=0; i<5; i++)
   a[i] = 0.0;

might execute faster as an unrolled loop:

a[0] = 0.0;
a[1] = 0.0;
a[2] = 0.0;
a[3] = 0.0;
a[4] = 0.0;

or it may get dispatched to, say, 5 CPUs in a vectorised processor.

Writing it as:

i = 5;
ap = a;
while(i--)
  *ap++ = 0.0;

may not be pulled out by the optimiser.

In this case I think most optimisers would find it, but I can imagine
some slightly more complex constructs that might fool them.

There is a simple review of what RISC compilers do in UnixWorld Aug 1990
that expands on these ideas.

>
>Markus Stumptner                                mst@vexpert.at
>Technical University of Vienna                  vexpert!mst@uunet.uu.net
>Paniglg. 16, A-1040 Vienna, Austria             ...mcsun!vexpert!mst

I fear we are in danger of drifting into alt.religion.comp or even
comp.lang.c territory and away from futures.

Kevin Purcell          | kpurcell@liverpool.ac.uk
Surface Science,       |
Liverpool University   | Programming the Macintosh is easy if you understand
Liverpool L69 3BX      | how the Mac works and hard if you don't. -- Dan Allen

jeremy@ultima.socs.uts.edu.au (Jeremy Fitzhardinge) (09/19/90)

In comp.society.futures you write:

|From article <1990Sep14.212806.8131@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
|> [...]
|>                         I guess the point I was trying to make was
|> that pointers are NOT difficult to understand AND they provide
|> much needed flexibility...
|
|Really?  I've been asking this same question for over two years
|on the net - no one has yet answered it:  Please give me a specific
|example of a _legal_ C data structure which _cannot_ be implemented
|with the same efficiency with the data structuring features below.
|Note, there is not a _single_ explicit pointer data type in the
|following list.
|
|1) 'Atomic' types (floats (various sizes), ints (various sizes), booleans,
|   characters (various character sets), etc....)
|
|2) Enumerated data types.  These are simply a way to allow the user to
|   invent a new 'atomic' type.
|
|3) Arrays.  Mappings from a tuple of indices to a typed value.  Note: an
|   array of arrays is legal and is _NOT_ the same as a 2-d array (although
|   a little syntactic sugar could allow this later - no one has ever asked
|   for it).
|
|4) Sequences.  An ordered collection of zero or more objects to be
|   accessed in a specific order.  Obvious syntactic sugar (like direct
|   referencing of the last element or an arbitrary element) is permitted.
|   The usual implementation of character strings in C is an example of an
|   inefficient implementation of a sequence.  You can have a sequence of
|   any data type (including a sequence of sequences - which is what a
|   dictionary is).

How does this differ from an array (or vice versa)?  Is it that a sequence
is a completely dynamic object, where arrays are created with a specific
size (whether it be at compile or run time)?  You seem to imply this later.
Is an object in a sequence of any type?  Do all objects have to be the same
type?  I guess a sequence of unions would achieve that effect while still
only allowing a sequence of one type.

|5) Records.  Like C struct.  No difference at all really.
|
|6) Unions.  These are _always_ discriminated.  The compiler is
|   responsible for maintaining and checking the type tags.  Note that
|   this only _seems_ inefficient: _legal_ C programs should always
|   explicitly maintain a tag anyway.

Careful selection of the tagging mechanism would be needed, I suppose.
At a guess you would have some sort of tag field in the union that has
a number representing the current type of the union.  Since the union
can be of any type (simple and derived) the actual tags used would have
to be decided at compile time.  This would make linking individually
compiled modules and libraries using shared union types difficult, since
they would all have to use the same tagging convention.  I think it
should become a task of the linker to organize this kind of thing.  Have
something like a "tag table" along side the "symbol table", and treat
them similarly.  Scoping of [union] types would have to be handled like
scoping of variables, with similar name-space conflicts.
Perhaps I'm taking a too C oriented approach,  but this seems to accord
with current practice with practical languages I know/use.

|7) Recursive types.  These may be given the attribute 'aliased' in order
|   to allow circular and overlapping references.  Other than that, we have
|   discussed these before.
|
|In addition, all variables can be declared with a 'dynamic' attribute,
|which means that they must be allocated before use (dynamic arrays give
|their size at allocation time).  It might be desirable for sequences
|and recursive data types to be given the dynamic attribute automatically.
|
|I can demonstrate sample syntax for these if anyone thinks it is required.
|Anyone who proposes a C data object that he claims is not representable
|here is invited to do so (I'm not joking - I'm designing a language with
|these features - this challenge is an attempt to find out whether I'm
|leaving something out).

How would you handle what is currently handled by pointers to functions
in C?  I'm primarily a C programmer, but I certainly don't want to let
myself be caught by "How can I do it in C" as opposed to "How can it be
done".  For the things I do (OS hacks, graphics, realtime interactive)
I've found C to be the most useful, since it is a simple language
that can be found on a wide range of machines, or tends to be the
best supported/implemented on those machines.  Recently I've been
teaching myself C++ and found it to fill a lot of gaps and problems
in C (although I hadn't noticed them until I used C++).  No doubt there
are other languages I will come across that have features I want in C++.
I don't think they will be FORTRAN or COBOL.



-- 
Jeremy Fitzhardinge: jeremy@ultima.socs.uts.edu.au jeremy@utscsd.csd.uts.edu.au 
                            DEATH TO ALL FANATICS!

jlg@lanl.gov (Jim Giles) (09/20/90)

From article <4905@uqcspe.cs.uq.oz.au>, by brendan@batserver.cs.uq.oz.au (Brendan Mahony):
> [...]
> Yes you are. You are leaving out memory mapped I/O and operating system
> vectors and other disgusting kludges that make the computing world go
> round. Other than that you are spot on!

Perhaps you can be kind enough to point out the reason I need pointers
(or anything else that's not on my list) to provide the functionality
you mention.  The first memory mapped I/O I ever used was done in Fortran.
And, not an extended Fortran either - passing arrays with call-by-reference
is quite adequate to tell the system where my I/O buffer is to be.

J. Giles

jlg@lanl.gov (Jim Giles) (09/20/90)

From article <18377@ultima.socs.uts.edu.au>, by jeremy@ultima.socs.uts.edu.au (Jeremy Fitzhardinge):
> In comp.society.futures you write:
> [...]
> |4) Sequences.  An ordered collection of zero or more objects to be
> |   accessed in a specific order.  [...]
>
> How does this differ from an array (or vice versa)?  Is it that a sequence
> is a completely dynamic object, where arrays are created with a specific
> size (whether it be at compile or run time)?  You seem to imply this later.
> Is an object in a sequence of any type?  Do all objects have to be the same
> type?  I guess a sequence of unions would achieve that effect while still
> only allowing a sequence of one type.

The answers to your questions are: 1) they're different (see below);
2) exactly, arrays are fixed size/shape, sequences are always one-d
and variable length (initialized empty unless the declaration does
an initialization); 3) and 4)  "sequence" is a declaration attribute which
can be applied to any type - all elements of a sequence have the same
type; 5) exactly, all the elements of a sequence can be in the same
union type - the union can be collections of any types.

Examples:

	Integer sequence :: x   !empty sequence of integers
	Integer sequence(0:256:16) :: y
				!empty sequence, no space initially
				!allocated (the zero), max space is
				!256 elements, allocate in 16 element
				!chunks.
	ASCII sequence :: s     !empty character string - ASCII
	char sequence :: ss = "abc"
				!character string in native character
				!set (which may or may not be ASCII),
				!initial value is three long "abc".
				!quotes is syntactic sugar for character
				!sequence types.
	type u_test is union(integer, ASCII sequence)
				!declares a union type where members
				!are integers or ASCII strings
	u_test sequence :: directory
				!directory is a sequence of ints or
				!ASCII sequences (each element may differ).
	...

	s = ss                  !native character set is automatically
				!converted to ASCII
	ss = ss | "def"         !concatenate is '|', ss is "abcdef"
	ss(2:4) = "xyz"         !substring usage, ss is "axyzef"
	x = (1,3,5)             !parentheses are sequence constructor
				!for non-character types
	...

Anyway, I think you get the picture.
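
For comparison, a declaration like "Integer sequence(0:256:16)" could be
approximated in C along the following lines.  This is a sketch under
stated assumptions, not the proposed language's implementation: the
struct layout and the names int_seq, seq_init, seq_append and seq_free
are all invented here, and the three numbers are read as initial size,
maximum size and growth chunk, as the comments above describe.

```c
#include <stdlib.h>

struct int_seq {
    int    *data;   /* NULL while no space is allocated */
    size_t  len;    /* elements in use */
    size_t  cap;    /* elements allocated */
    size_t  max;    /* upper bound on len */
    size_t  chunk;  /* growth step, in elements */
};

/* Start empty, with no space allocated. */
void seq_init(struct int_seq *s, size_t max, size_t chunk)
{
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
    s->max = max;
    s->chunk = chunk ? chunk : 1;   /* a zero chunk could never grow */
}

/* Append one element.  Returns 0 on success, -1 if the sequence is
   already at its maximum length or memory cannot be obtained. */
int seq_append(struct int_seq *s, int value)
{
    if (s->len == s->max)
        return -1;
    if (s->len == s->cap) {
        size_t new_cap = s->cap + s->chunk;
        int *p;
        if (new_cap > s->max)
            new_cap = s->max;
        p = realloc(s->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    s->data[s->len++] = value;
    return 0;
}

/* Release the storage and return to the empty state. */
void seq_free(struct int_seq *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```

The pointer lives inside the struct and is touched only by these three
routines, which is roughly the discipline the proposal asks the compiler
to enforce.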

> [... unions ...]
> Careful selection of the tagging mechanism would be needed, I suppose.
> At a guess you would have some sort of tag field in the union that has
> a number representing the current type of the union.  Since the union
> can be of any type (simple and derived) the actual tags used would have
> to be decided at compile time.  This would make linking individually
> compiled modules and libraries using shared union types difficult, since
> they would all have to use the same tagging convention.  I think it
> should become a task of the linker to organize this kind of thing.  Have
> something like a "tag table" along side the "symbol table", and treat
> them similarly.  Scoping of [union] types would have to be handled like
> scoping of variables, with similar name-space conflicts.
> [...]

The type tags would indeed be in the representation of the object itself.
As a practical matter, the union object would probably consist of a type
tag and a reference (pointer) to the data that represents the object.
This would permit arrays and sequences (etc.) of unions to be allocated
without having to worry about variable space depending on the type of
object actually stored.  Note: this use of pointers would be hidden
from the user's view and subject to stronger compiler control so that
it shouldn't raise any concerns about aliasing and other abuse - this
is no different than the fact that IF/THEN/ELSE uses GOTOs internally.
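
The representation described above (a type tag plus a hidden reference)
might look like this if written out by hand in C.  It is a sketch only:
the names u_test, u_set_int, u_set_string and u_clear are hypothetical,
and in the proposed language none of this would be visible to the user.

```c
#include <stdlib.h>
#include <string.h>

enum u_tag { U_INT, U_STRING };

/* Every slot has the same size whatever it currently holds, so arrays
   of these can be laid out without knowing the member types. */
struct u_test {
    enum u_tag tag;
    void *ref;      /* reference to separately allocated data */
};

/* Store an integer in an empty slot.  Returns 0, or -1 on failure. */
int u_set_int(struct u_test *u, int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = value;
    u->tag = U_INT;
    u->ref = p;
    return 0;
}

/* Store a copy of a string in an empty slot.  Returns 0 or -1. */
int u_set_string(struct u_test *u, const char *text)
{
    size_t n = strlen(text) + 1;
    char *p = malloc(n);
    if (p == NULL)
        return -1;
    memcpy(p, text, n);
    u->tag = U_STRING;
    u->ref = p;
    return 0;
}

/* Release whatever the slot holds and mark it empty. */
void u_clear(struct u_test *u)
{
    free(u->ref);
    u->ref = NULL;
}
```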

The scoping of user defined types of all flavors (not just unions) is a
problem that the linker has to worry about (or <shudder> the run-time
environment).  Actually, the proper use of interface specifications and
the requirement that the type declarations in the caller match the ones in
the callee would simplify the loader's work in this regard.  This would
also be a safer approach from the user's point of view, since it would
make the type assumptions the user is making very explicit.  As a
practical matter, the type definitions and function prototypes could be
contained in an include file so that the user would not be forever
retyping them.

> [...]
> How would you handle what is currently handled by pointers to functions
> in C? [...]

Functions are an 'atomic' data type.  Their attributes include the number
and type of their arguments as well as the type of their result (if any).
A variable of type 'function' can be declared and assigned to.  Function
definitions declare a named function with a 'constant' attribute.

A possible syntax is:

	 !comment sin() and cos() are the usual trig functions
	 !comment the exclamation point begins a comment
	 !end-of-line ends a comment
	 float function (float) :: x   ! x is a variable whose type
				       ! is the same as sin() and cos()
	 if (some_condition) then
	    x=sin       !function name without arg list causes no evaluation
	 else
	    x=cos
	 endif
	 ...
	 ans = x(0.123) !does either sin(0.123) or cos(0.123)

Syntactic/semantic sugar (such as adding function constructors, etc.)
would allow adding complete functional language support.  No pointers
are implied here - not necessarily even in the low-level implementation
- the assignment _could_ actually copy the code for the function.  As a
practical matter, function assignment should only copy the local
variable space of the function and use a pointer to the code.  This
would permit, for example, multiple copies of a random number generator
with a different seed in each.
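
For readers who know C better than the syntax above, the nearest C
equivalent uses function pointers.  The sketch below is an illustration
only: twice() and square() stand in for sin() and cos() so that no math
library is needed, and the generator struct is one assumed way of
getting "a pointer to the code plus a private copy of the state".

```c
/* A variable of this type can hold any function from double to double. */
typedef double (*unary_fn)(double);

double twice(double v)  { return 2.0 * v; }
double square(double v) { return v * v; }

/* Naming a function without an argument list does not call it. */
unary_fn choose(int some_condition)
{
    return some_condition ? twice : square;
}

/* A generator value: the code is shared through a pointer, while the
   state is copied whenever the struct is assigned. */
struct generator {
    unsigned long state;
    unsigned long (*step)(unsigned long *state);
};

/* A small linear congruential step, chosen only as a placeholder. */
unsigned long lcg_step(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return *state;
}

unsigned long gen_next(struct generator *g)
{
    return g->step(&g->state);
}
```

Assigning one struct generator to another copies the seed but not the
code, so each copy then advances independently.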

J. Giles

bson@AI.MIT.EDU (Jan Brittenson) (09/20/90)

   So how would you propose to accomplish the following, for example,
without pointers or pointer arithmetic?


	1. Pointer range check (see if a buffer crosses page
	   boundaries, for instance).

	2. Calculate physical addresses for DMA controllers.

	3. Sort a linked list on addresses of some data pointed to
	   from within the node. Or to keep it sorted as new (addresses
	   of) data is added.

	4. Implement malloc()/free().

	5. Read and write addresses from/to pipes.


   I wonder whether any compiler can be designed to successfully
determine when to duplicate data and when to use a reference. As a rule,
it's bad practise to duplicate data other than in the rare occasions when
an explicit duplicate is needed. Usually any processing of the data
constitutes duplication in itself, with the exact type/size of the
resultant duplicate depending on the interpretation of the original.

   I'm curious as to why so many programmers engage themselves in hot
debates over how to best implement strings. String processing is
proportionally insignificant - the first thing done after a read is
usually a tokenization, either through hand-written code or the output of
a lexical front-end generator. The string is nothing but a character
buffer, irrelevant to the data or command itself - it's only the
written/read representation of it, and can be discarded once tokenized.
Strings held on to rarely need any further processing, they mostly get
passed around by reference, or located through hash tables.

   Just like some algorithms cannot be reasonably coded without gotos -
the alternative would likely be even worse - some operations on data or
certain functionality cannot reasonably be performed without pointers.
Certainly, all data types can be expressed without pointers - but they can
also be expressed in everyday English.  Declarations and talk are to be
distinguished from working programs. (No macho intent.)

   Finally, not all people agree with the view that "pointers are the
data type equivalent of gotos." I for one think it's been a little
stretched lately. A program like a device driver could probably be coded
perfectly well without a single goto, but not without pointer arithmetic.

Just my two cents worth.

							-- Jan Brittenson
							   bson@ai.mit.edu

isr@rodan.acs.syr.edu (Michael S. Schechter - ISR group account) (09/20/90)

In article <63475@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <4905@uqcspe.cs.uq.oz.au>, by brendan@batserver.cs.uq.oz.au (Brendan Mahony):
>> Yes you are. You are leaving out memory mapped I/O and operating system
>> vectors and other disgusting kludges that make the computing world go
>> round. Other than that you are spot on!
>And, not an extended Fortran either - passing arrays with call-by-reference
>is quite adequate to tell the system where my I/O buffer is to be.

Now in C it's easy to go MyPtr=(Pointer)0x100.  In Fortran, it's a
pain. At least in the ones I've used I'd have to use IPEEK and IPOKE
functions to access memory in this way.
I suppose it could be done via
   CALL MYSUB(ivalue,100)

SUBROUTINE MYSUB(ivalue,MYADD())
   MYADD(0)=ivalue

But let's face it, what's the difference between them?
None.
They both allow you to get into trouble.
So why complain about being allowed to do EASILY what you
can do in any other language with a little effort?
This further illustrates what started the entire thread-
FORTRAN is UGLY and a PAIN for things that use a lot of pointers.
--
Mike Schechter, Computer Engineer,Institute Sensory Research, Syracuse Univ.
InterNet: Mike_Schechter@isr.syr.edu isr@rodan.syr.edu Bitnet: SENSORY@SUNRISE

jlg@lanl.gov (Jim Giles) (09/21/90)

From article <9009201017.AA06087@rice-chex>, by bson@AI.MIT.EDU (Jan Brittenson):
> 
>    So how would you propose to accomplish the following, for example,
> without pointers or pointer arithmetic?
> 
> 	1. Pointer range check (see if a buffer crosses page
> 	   boundaries, for instance).

Well, without pointers, why do you need a pointer range check?  Computing
the range of something that doesn't exist seems a little silly.

However, your parenthetical remark is of value - unfortunately, it's not
possible in _legal_ ANSI C.  Pointer arithmetic cannot be carried out past
the bounds of an individual object.  Pointers to different objects cannot
be subtracted or compared _legally_.  This is so that pointer arithmetic
operations can ignore the segment part of addresses on segmented machines.
So, you can't tell with C pointers whether your buffer crosses page
boundaries or anything because you can only compare the pointer _within_
the buffer itself - and you don't know the relative position of the
beginning of the buffer to page boundaries.

I think you had in mind casting the pointer to an int and looking at
the raw address - the ANSI standard leaves this process undefined.

Now, if you're talking about non-standard extensions to C which would
allow you to do this stuff - then any other language can contain the
same non-standard extensions.
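
As an illustration of the cast-to-integer approach mentioned above, here is a minimal sketch. It relies on uintptr_t, which only arrived in later revisions of C, and on the assumption that the converted value is a flat address; the page size and the function name are hypothetical. The conversion is implementation-defined, so this is the kind of code that works on a given machine rather than code the standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; a real system would query it. */
#define ASSUMED_PAGE_SIZE 4096u

/* Returns 1 if the len bytes starting at buf span more than one page.
   The pointer-to-integer conversion is implementation-defined, so the
   answer only means something where uintptr_t holds a flat address. */
static int crosses_page(const void *buf, size_t len)
{
    uintptr_t first, last;

    assert(buf != NULL);
    if (len == 0)
        return 0;
    first = (uintptr_t)buf;
    last = first + (len - 1);
    return (first / ASSUMED_PAGE_SIZE) != (last / ASSUMED_PAGE_SIZE);
}
```

Note that nothing here compares or subtracts two pointers; all the arithmetic is done on integers after the conversion.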

> [...]
> 	2. Calculate physical addresses for DMA controllers.

Why should I care?  The system/environment should be able to give me the
address if I need it.  But, how do I use a raw address anyway? _Standard_
C pointers don't give me any such access.  Access to such things as
hardware controllers should be privileged to the system - and _it_
can contain machine dependent code - like assembly.

> [...]
> 	3. Sort a linked list on addresses of some data pointed to
> 	   from within the node. Or to keep it sorted as new (addresses
> 	   of) data is added.

I guess you'll have to tell me how this differs from sorting on the
index of the data within an array or sequence.  Since the sequence is
dynamic, you can add all the elements you wish - and still sort on
index.  And, once again, the integer value of different pointers is
_not_ defined by the ANSI standard - nor is their relative order.

> [...]
> 	4. Implement malloc()/free().

When I found out that the ANSI C standard prohibited comparing/subtracting
pointers to different objects, I pointed out on comp.lang.c that malloc()
and free() could not be written in _standard_ C.  They agreed with me.
I pointed out that the ability to use pointers as raw addresses was the
only thing of value that C pointers had (in my opinion).  They said I was
wrong for wanting it, I couldn't do it, that's that.  In fact, I'm on the
side of the rest of you who _want_ pointers to do raw address calculations
- C pointers don't.
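
One way to stay inside the rule described above is to carve storage out of a single array and hand back offsets, so that every comparison and subtraction happens within one object. The sketch below is only that: the names (pool, pool_alloc, pool_at) are hypothetical, alignment is ignored, and there is no way to release an individual block. It is not how any real malloc() is written.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 1024u
#define NO_BLOCK ((size_t)-1)   /* returned when the pool is exhausted */

static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Hands out n bytes from the pool; returns the offset of the block,
   or NO_BLOCK if n is zero or the pool cannot satisfy the request. */
static size_t pool_alloc(size_t n)
{
    size_t start = pool_used;

    if (n == 0 || n > POOL_SIZE - pool_used)
        return NO_BLOCK;
    pool_used += n;
    return start;
}

/* Releases everything at once; individual blocks cannot be freed. */
static void pool_reset(void)
{
    pool_used = 0;
}

/* Converts an offset back into an address inside the one pool object. */
static unsigned char *pool_at(size_t offset)
{
    assert(offset < POOL_SIZE);
    return &pool[offset];
}
```

Because blocks are identified by integer offsets, ordering two blocks is an integer comparison and never a comparison of pointers into different objects.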

> [...]
> 	5. Read and write addresses from/to pipes.

Again, standard C can't do this.  However, this is also something that
the system should provide a clearer, higher-level way to do.

> [...]
>    I wonder whether any compiler can be designed to successfully
> determine when to duplicate data and when to use a reference. As a rule,
> it's bad practise to duplicate data other than in the rare occasions when
> an explicit duplicate is needed.  [...]

It is even worse programming practice to alias data (by copying references)
other than those rare occasions when aliasing is a required part of the
algorithm.  Inadvertent aliasing leads to many man-hours of unnecessary
debugging time.

Besides, you aren't paying attention.  The list of data structures I
gave included an alias attribute.  Data types with the alias attribute
are assigned by copying the reference instead of the data.  Thus, the
programmer has explicit control over whether aliasing is allowed or not.
And, when it's not, the compiler can detect and signal an error when
the programmer inadvertently tries to do it.

> [...]
>    I'm curious as to why so many programmers engage themselves in hot
> debates over how to best implement strings. String processing is
> proportionally insignificant - the first thing done after a read is
> usually a tokenization, either through hand-written code or the output of
> a lexical front-end generator.  [...]

Tokens are also strings (which must be frequently compared and efficiently
stored).  Symbol tables also contain strings (among other stuff).  Text
processors usually don't have much data that isn't part of one string
or another.  This is a vital question to _some_ applications.  If it
isn't for you, so be it.  But don't question the need that others feel
for strings - they may know something about their application that you
don't.

> [...]
>    Just like some algorithms cannot be reasonably coded without gotos -
> the alternative would likely be even worse - some operations on data or
> certain functionality cannot reasonably be performed without pointers.
> [...]

I still wish you'd tell me what those applications are.  Absolute raw addresses
aren't even in C (though, I think that for the systems programmer, they
are absolutely necessary - not for anyone else though).  I'm still looking
for a _legal_ C application that can't be done with the 7 data structuring
tools I gave.

J. Giles

jlg@lanl.gov (Jim Giles) (09/21/90)

From article <1990Sep20.161852.22977@rodan.acs.syr.edu>, by isr@rodan.acs.syr.edu (Michael S. Schechter - ISR group account):
> [...]
> Now in C it's easy to go MyPtr=(Pointer)0x100  [...]

Well, yes, you can do that in some extended versions of C no doubt.
You could extend Fortran to do such things too.  But, _standard_
C leaves the result of that cast undefined.  For example, on a
segmented machine, the above statement _may_ put the number 0x100
into the offset part of MyPtr - leaving the segment part of the
pointer unchanged - and _still_ be standard conforming C.  This is
fine if it is what you want, but you may have intended that the
statement set the segment address as well as the offset.

> [...]                                            In fortran, it's a
> pain.  [...]

In _standard_ C it's a pain.  Some extended Fortrans I've seen will
let me declare an array and base it at any hardware address I want.
Some PC users do this to allow them to address display memory as a
simple 2-d array.  Presumably, _extended_ versions of C can do the same.
But, we were talking about _legal_ uses of the language as defined
by the appropriate language definition.

> [...]
> This further illustrates what started the entire thread-
> FORTRAN is UGLY and a PAIN for things that use a lot of pointers.

Perhaps it's the use of pointers that's "UGLY and a PAIN".  It's maybe
less ugly in C, but it's usually not necessary elsewhere.  In fact,
it's usually better not to use pointers at all if they can be avoided.

J. Giles

bzs@WORLD.STD.COM (Barry Shein) (09/21/90)

LISP is a good example of a language which has rarely included any
explicit pointers yet has been made to do most anything. I've seen
LISPs with explicit pointer types (address foo) but I've never seen
that do anything which a dynamically allocated, type-coercible array
can't do (e.g. load object code into an array and then set its type to
compiled-function.) A nice example is the (PD) Franz Lisp arrays
package, implemented entirely in LISP except for one or two little
malloc-like primitives.

Of course, LISP has its own problems (which might make for interesting
"futures" discussions), notably its run-time interpreter overhead
making it difficult to deliver layered products w/o first making the
customer buy an entire LISP system.

For example, I have friends with LISP products they'd like to sell
for, say, $995/workstation. But the customer first has to buy a $2500
(or more) LISP to load the package into (and the customer may have no
interest in the LISP.) This has really stifled the market for low-cost
software implemented in LISP (not to mention that you're also dependent
on the OS and LISP system continuing to work together thru upgrades
which is often not the case, and the LISP vendor wants $$$ for new
releases.)

        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

mwm@DECWRL.DEC.COM (Mike Meyer, My Watch Has Windows) (09/21/90)

>> I think you had in mind casting the pointer to an int and looking at
>> the raw address - the ANSI standard leaves this process undefined.

You keep acting like "undefined" means "unusable". That's not the
case. It means that the compiler can do whatever it wants, which may
or may not be useful in your environment. If there's an environment
where the C compiler does something useful that some "replacement"
can't do, then that replacement is lacking.

One of the reasons C became popular was because operations were
undefined originally, and compiler writers were free to do whatever
made sense on their machines. The ANSI standard holds true to that
spirit, but marks such areas clearly, so that you have a fighting
chance of writing code that will port to multiple environments.

Used in that light, as a high level, semi-portable assembler, (aka a
"systems" language) C is acceptable. If you're working on a
replacement for it in that sense, then you have to be able to allow
users to do things that they do in C in their environment, even if
those things are "undefined" by the standard. I'd be interested in any
such attempt.

As a general-purpose programming language, C is old and has a large
number of problems (many of which are shared by lots of languages from
that era). However, there are a large number of replacements around,
and so far, I've seen nothing in your proposals that is new.  If you
haven't yet, you might take a look at Euclid and people's comments on
it to see how some of your proposed features work in practice.

You've as yet to answer my questions about garbage collection, and
what constructs would be provided for working with sequences. As an
addendum to that list, I'd be interested to know how you would recode
the following C sequence with undefined behavior using your proposed
constructs so that it would do what its author intended:

	long 	word ;
	char	*pointer ;

	word = '1234' ;
	pointer = (char *) word ;
	printf("%c %c %c %c\n", pointer++, pointer++, pointer++, pointer++) ;

The intent was to determine the byte ordering of the machine the code was
running on.

	<mike

pmorris@BBN.COM (Phil Morris) (09/21/90)

>Date: Thu, 20 Sep 90 14:04:29 PDT
>From: Mike Meyer <mwm@decwrl.dec.com>
>Subject: Re: C's sins of commission (was: (pssst...fortran?))
>To: jlg@lanl.gov
>Cc: info-futures@encore.com

[...]

>You've as yet to answer my questions about garbage collection, and
>what constructs would be provided for working with sequences. As an
>addendum to that list, I'd be interested to know how you would recode
>the following C sequence with undefined behavior using your proposed
>constructs so that it would do what its author intended:
>
>	long 	word ;
>	char	*pointer ;
>
>	word = '1234' ;
>	pointer = (char *) word ;
>	printf("%c %c %c %c\n", pointer++, pointer++, pointer++, pointer++) ;
>
>The intent was to determine the byte ordering of the machine the code was
>running on.
>
>	<mike

How about something as simple as:

#define SZLONG sizeof(long)

	union xxx {
		long word;
		char str[SZLONG];
	} un;

	un.word = '1234';
	printf("%c %c %c %c\n", un.str[0], un.str[1], un.str[2], un.str[3]);


It works on my machine, and the hypothetical language under discussion can handle these
constructs (except he didn't mention how to do sizeof(xxx)).
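
A sketch of another way to inspect byte order, offered as an illustration rather than as anyone's posted solution: copy the bytes of a known integer value into a character array and look at them. This avoids the multi-character constant '1234', whose value is implementation-defined. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the i-th byte of word as it is laid out in memory. */
static unsigned char byte_at(unsigned long word, size_t i)
{
    unsigned char bytes[sizeof word];

    assert(i < sizeof word);
    memcpy(bytes, &word, sizeof word);
    return bytes[i];
}

/* Returns 1 when the least significant byte is stored first. */
static int is_little_endian(void)
{
    return byte_at(1UL, 0) == 1;
}
```

The result depends on the machine, which is the point of the exercise; the code itself does not depend on the order in which anything is evaluated.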

-Phil


--------
Phil Morris (pmorris@dgi0.bbn.com)
Disclaimer: ME? I'm only a non-smoking cat; can't believe a word I meow.

pmorris@BBN.COM (Phil Morris) (09/21/90)

> printf("%c %c %c %c\n", un.str[0], un.str[1], un.str[2], un.str[3]);

Whoops -- stupid assumption -- make that:

int i;

...

for (i = 0; i < SZLONG; i++)
   printf("%c ", un.str[i]);
printf("\n");

Sorry,

-Phil

--------
Phil Morris (pmorris@dgi0.bbn.com)
Disclaimer: ME? I'm only a non-smoking cat; can't believe a word I meow.

mwm@DECWRL.DEC.COM (Mike Meyer, My Watch Has Windows) (09/21/90)

>> >The intent was to determine the byte ordering of the machine the code was
>> >running on.
>> >
>> >	<mike
>> 
>> How about something as simple as:

I goofed. Should have stated the problem, rather than trying to
demonstrate it.

The problem is with order of evaluation. You generally have to choose
between three options:

1) Don't allow side effects, so that normally it isn't critical.

2) Specify it exactly, so you can predict side effects.

3) Leave it "undefined".

This problem arises in all expressions, not just function arguments.
There are problems with all three solutions. I was curious as to which
was going to be taken in this case.
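
For what it's worth, within C as it stands the usual way around option 3 is to give each side effect its own statement, so that the order is fixed by the statement sequence rather than left to the compiler. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Copies the first four characters at p into out, in memory order.
   Each increment sits in its own statement, so the order in which
   the side effects happen is fully determined. */
static void first_four(const char *p, char out[4])
{
    assert(p != NULL);
    out[0] = *p++;
    out[1] = *p++;
    out[2] = *p++;
    out[3] = *p++;
}
```

The caller can then pass out[0] through out[3] to printf() in a single call, since reading the array elements has no side effects.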

	<mike

jlg@lanl.gov (Jim Giles) (09/21/90)

From article <9009202104.AA21146@raven.pa.dec.com>, by mwm@DECWRL.DEC.COM (Mike  Meyer, My Watch Has Windows):
> [...]
|> You've as yet to answer my questions about garbage collection, and
|> what constructs would be provided for working with sequences. As an
|> addendum to that list, I'd be interested to know how you would recode
|> the following C sequence with undefined behavior using your proposed
|> constructs so that it would do what its author intended:
|>
|>       long    word ;
|>       char    *pointer ;
|>
|>       word = '1234' ;
|>       pointer = (char *) word ;
|>       printf("%c %c %c %c\n", pointer++, pointer++, pointer++, pointer++) ;
|>
|> The intent was to determine the byte ordering of the machine the code was
|> running on.

Answered via email.  I can post the answer here if there's any interest.

J. Giles

jlg@lanl.gov (Jim Giles) (09/21/90)

From article <9009202211.AA15218@encore.encore.com>, by pmorris@BBN.COM (Phil Morris):
> [...]
|> How about something as simple as:
|>
|> #define SZLONG sizeof(long)
|>
|>       union xxx {
|>               long word;
|>               char str[SZLONG];
|>       } un;
|>
|>       un.word = '1234';
|>       printf("%c %c %c %c\n", un.str[0], un.str[1], un.str[2], un.str[3]);
|>
|>
|> It works on my machine, and the hypothetical language under discussion can handle these
|> constructs (except he didn't mention how to do sizeof(xxx)).


Actually, I prefer sizeof() to be measured in bits, not bytes.  And
I prefer 'union' to be non-storage order dependent.  I actually prefer
the use of 'mapping' declarations (such as I described in the first
article I posted with this title).  Other than that, your solution
is much like the one I sent via email to the person who posted the
problem.

J. Giles

jlg@lanl.gov (Jim Giles) (09/21/90)

From article <9009202236.AA21344@raven.pa.dec.com>, by mwm@DECWRL.DEC.COM (Mike  Meyer, My Watch Has Windows):
> [...]
|> I goofed. Should have stated the problem, rather than trying to
|> demonstrate it.
|>
|> The problem is with order of evaluation. You generally have to choose
|> between three options:
|>
|> 1) Don't allow side effects, so that normally it isn't critical.
|>
|> 2) Specify it exactly, so you can predict side effects.
|>
|> 3) Leave it "undefined".
|>
|> This problem arises in all expressions, not just function arguments.
|> There are problems with all three solutions. I was curious as to which
|> was going to be taken in this case.


Oh, well.  If it's a question of side-effects, I oppose them outright.
As a practical matter, you can convince most programmers that operators
should not have side-effects.  But, when it comes to functions, they
always demand to be allowed side-effects.

So, functions should be locally declared (in the interface or prototype
or whatever you choose to call it) if they have side-effects.  Those
functions that don't, can be evaluated in any order in the expression.
Functions that _do_ have side-effects should be evaluated in a specific
order that can be determined by the user from looking at the source.

Unfortunately, nested function calls and even funny operator precedence
rules can require certain constraints on the evaluation order.  The
compiler has no trouble discovering these, but the user might.

This can be hard for the user if several of the operators are "left-
associative" while several others are "right-associative".  Clearly,
operator precedence should be consistent.  And there should be as
few different precedence levels as possible.

In any case, the order of side-effects _should_not_ be left undefined.

J. Giles

michels@cs.UAlberta.CA (Michael Michels) (09/21/90)

To put this discussion about C back in the "futures" group I
would like to present here my two words or 64 bits on my machine
as Jim Giles would prefer it :-).

I like C because it allows me to do things I want and am paid to do
in a nice and easy way.  If I want my code analyzed to death then I
can use 'lint' or "syntax" options on the compiler.  If I want my code
to stay as it is my C compiler lets me do it.  Why should I be forced
to write my programs in a cryptic form just because someone else
has a different opinion?

As already mentioned before there are other
languages that are better for other tasks but for writing system
routines and compilers I cannot imagine a better one.
I would like to see Jim Giles write that sort of code in any
of the solutions that he proposed :-).

Actually that all may change now that ANSI has started to play with it.
I think that one "ADA" is enough :-). Besides, C like any other
language evolves and changes as the times change.
To standardize it and stop it from evolving is the same as killing it
and I would not like to see it happen.

Another aspect that was touched on was the "wonderful" role of optimizers.
Sure, they are getting "better" all the time but when I hear that
the compiler ignores the "register" modifier I get upset.  It is like
someone telling you that automatic optimization can do a better job than you can.
My view on this subject is that if someone wants to drive TOYOTA let them
but if I want to build a FERRARI I should be allowed as well.

In any futaristic languages I would like to see the same things that
I like about C.  I want to be able to write my programs that do the job
and are short and easy to understand.  I guess that is the same thing that
George Orwell said about writing. Why should writing programs be any different?

Michael Michels

bzs@WORLD.STD.COM (Barry Shein) (09/21/90)

Jim Giles,

Have you looked closely at PL/I? I could probably dig up some old,
good, textbook recommendations. It has almost everything exactly as
you're describing (including the declaration of dynamic, recursive
objects.)

PL/I was definitely a bloated language with far too many rules
violating "the law of least astonishment", but it did have a lot of
good ideas. I think PL/I's main problem was that it was perceived as a
"pig" of a language in a time when resources were much more dear. The
compiler probably needed almost 1MB to run! (in a day when large
mainframes had 2MB of real memory and tried to run 100 logged in
users, this made it anathema at university computing centers and the
rest was history.)

The bit-oriented sizeof() is also in PL/I (which is what touched off
this remark.) In fact, it worked both ways:

	declare array fixed bin(31);

declared a 32-bit integer array (you always omitted the sign bit if
there was one, strange.) In fact you could declare any odd-sized
object pretty much:

	declare I fixed bin(11);

and it would do its arithmetic on such objects constrained to that
many bits. How? By turning all ops into function calls and linking
into a variable bit library (like I said, bloated...it took an expert
to get mostly native code out of PL/I as some small indiscretion might
make it turn all your code into library function calls.) But it did
work in all sorts of crazy situations (e.g. if you recompiled code
with bit-level declarations that didn't match the current hardware, it
would still work, albeit slowly.)
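
For comparison only, C has a rough analogue in bit-fields: an unsigned bit-field of width 11 keeps its values reduced modulo 2^11. This is a sketch of that C behaviour, not of PL/I semantics (FIXED BIN(11) is a signed precision, and the rules differ); the function name is hypothetical.

```c
/* Adds a and b in an 11-bit unsigned bit-field.  Values stored into
   an unsigned bit-field are reduced modulo 2^11 (2048). */
static unsigned add11(unsigned a, unsigned b)
{
    struct {
        unsigned v : 11;
    } narrow;

    narrow.v = a;    /* reduced modulo 2048 on assignment */
    narrow.v += b;   /* sum is reduced modulo 2048 again  */
    return narrow.v;
}
```

So add11(2047, 1) wraps around to 0, much as an 11-bit hardware register would.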

Anyhow, doomed to repeat it...as they say.

        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

bzs@WORLD.STD.COM (Barry Shein) (09/22/90)

Probably an unnecessary correction but...

>	declare array fixed bin(31);
>
>declared a 32-bit integer array (you always omitted the sign bit if
>there was one, strange.)

Should have been something like:

	declare array(MAXARRAY) fixed bin(31);

or thereabouts. My PL/I is rusty, but it ain't that rusty...

	DCL FOO(MAXFOO) FIXED BIN(31) CONTIGUOUS BASED (THING);

heh heh.

        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

jlg@lanl.gov (Jim Giles) (09/22/90)

From article <michels.653892230@menaik>, by michels@cs.UAlberta.CA (Michael Michels):
> [...]
> I like C because it allows me to do things I want and am paid to do
> in a nice and easy way.  [...]

That's exactly why I don't like C.  It doesn't let me do anything I
find useful without first dropping myself to my knees and scraping
around on the implementation level of a computer model (which doesn't
even match the _real_ machine that I'm on).

> [...]                                       Why should I be forced
> to write my programs in a cryptic form just because someone else
> has a different opinion.

Hear, Hear!!  I like a programming language to allow me to say what I
mean - not to have to convert my algorithm into something cryptic.
However, C forces me to encrypt my programs - I can't use arrays, I
have to encrypt them as pointers; I can't use dynamic memory, I have
to encrypt them as pointers; I can't use mapping (run-time equivalence),
I have to encrypt them as pointers; etc....  And that's just the problem
with _pointers_ - C promotes other difficulties as well.  And, with as
many _different_ things all being encrypted as pointers, how can I hope
to easily decipher someone else's code to determine which of these
concepts he intends his variables to represent?

> [...]
> I would like to see Jim Giles write that sort of code in any
> of the solutions that he proposed :-).

I can't see that it could be anything but easier.  Being able to say
what you _mean_ - and not have to squash down into the confines of
an inadequate language model can only be an improvement.

> [...]
> My view on this subject is that if someone wants to drive TOYOTA let them
> but if I want to build a FERRARI I should be allowed as well.

Of course, the proper automobile analogy for C is a '72 Jeep CJ (with the
wrong transmission).  It's clunky and uncomfortable.  It does poorly on
the road (that is, as a machine independent portable language).  It has
pretenses of being an all-terrain vehicle, but it only does well on its
home turf - byte addressed, 32-bit word, CISC architectures with VAX style
structure.

> [...]
> In any futaristic languages I would like to see the same things that
> I like about C. [...]

And, I don't want to see anything I don't like about C.
(By the way, it's "futuristic".  I spell badly too, but if I
don't complain someone else will. :-)

> [...]             I want to be able to write my programs that do the job
> and are short and easy to understand.  [...]

Well, at least we agree about something.

J. Giles

jlg@lanl.gov (Jim Giles) (09/22/90)

From article <9009211543.AA25709@world.std.com>, by bzs@WORLD.STD.COM (Barry Shein):
> [...]
> PL/I was definitely a bloated language with far too many rules
> violating "the law of least astonishment", [...]

I answered the bulk of this message over email.  However, it is my
opinion that the worst violator of "the law of least astonishment" among
currently popular languages is C - by a _very_ long margin.

J. Giles

isr@rodan.acs.syr.edu (Michael S. Schechter - ISR group account) (09/22/90)

In article <63613@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <1990Sep20.161852.22977@rodan.acs.syr.edu>, by isr@rodan.acs.syr.edu (Michael S. Schechter - ISR group account):
>> Now in C it's easy to go MyPtr=(Pointer)0x100  [...]
>Well, yes, you can do that in some extended versions of C no doubt.
>You could extend Fortran to do such things too.  But, _standard_
I'm sorry, ANSI can dream all they want, 
I think you'll find that most people will
agree that K&R is 'standard' C, NOT ANSI.
That's why the compiler mfr's advertise that
they support ANSI, because it's NOT standard enough
to be assumed.
AND AS YOU SAY:  "You ****could**** extend Fortran"
Yeah, but I do real work, not write preprocessors, that's better
left as exercises for students. And since virtually every
****EXISTING**** C will allow it in some way, why bother doing
Satan's work and extending Fortran?

>C leaves the result of that cast undefined.  For example, on a
>segmented machine, the above statement _may_ put the number 0x100
Out of context, your point is valid,  however I was talking about hardware
addresses, presumably, the system programmer or real-time programmer
(ones who _must_ access hardware, not just use system calls)
knows what must be done to get valid pointers.

Enough.
I quit.


--
Mike Schechter, Computer Engineer,Institute Sensory Research, Syracuse Univ.
InterNet: Mike_Schechter@isr.syr.edu isr@rodan.syr.edu Bitnet: SENSORY@SUNRISE

jcburt@ipsun.larc.nasa.gov (John Burton) (09/22/90)

In article <63722@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <michels.653892230@menaik>, by michels@cs.UAlberta.CA (Michael Michels):
>> [...]                                       Why should I be forced
>> to write my programs in a cryptic form just because someone else
>> has a different opinion.
>
>Hear, Hear!!  I like a programming language to allow me to say what I
>mean - not to have to convert my algorithm into something cryptic.
>However, C forces me to encrypt my programs - I can't use arrays, I
>have to encrypt them as pointers; I can't use dynamic memory, I have
>to encrypt them as pointers; I can't use mapping (run-time equivalence),
> [...]
>
Excuse me? are you REALLY saying you CAN'T use arrays in C without
resorting to pointers??? I'm confused. Does this mean that I can't 
use the statement
  a[i][j] = 123;
in a C program? If that's so, you'd better tell my C compilers that
(SunOS 4.1 C compiler, Turbo C, Turbo C++...all using ANSI standard
 mode)
Can't use dynamic memory without using pointers? Again, I assume that's
not *really* what you mean...I can send you code to create 2,3,...
whatever Dimensioned array you want from the heap (using malloc &
calloc) that can be used in any situation where you use a statically
declared one (I have yet to find a situation where it doesn't work)
using exactly the same syntax. It works on all the compilers mentioned
above using the ANSI standard mode. I use this routine regularly in
the image processing work I do...the only problem I've run into is
running out of memory for 2-D arrays larger than 1024x1024 of type
float.
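
The routine itself was not posted, so the following is only a guess at the general technique: allocate one block for the data and a table of row pointers into it, so that a[i][j] can be written just as for a declared array. The names alloc_2d and free_2d are hypothetical, and the sketch does no overflow checking on rows * cols.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocates a rows x cols array of float, zero-filled, reachable as
   a[i][j].  Returns NULL on failure or if either dimension is zero. */
static float **alloc_2d(size_t rows, size_t cols)
{
    float **a;
    float *data;
    size_t i;

    if (rows == 0 || cols == 0)
        return NULL;
    a = malloc(rows * sizeof *a);
    if (a == NULL)
        return NULL;
    data = calloc(rows * cols, sizeof *data);
    if (data == NULL) {
        free(a);
        return NULL;
    }
    for (i = 0; i < rows; i++)
        a[i] = data + i * cols;   /* row i starts i*cols elements in */
    return a;
}

/* Releases an array obtained from alloc_2d(); NULL is accepted. */
static void free_2d(float **a)
{
    if (a != NULL) {
        free(a[0]);   /* the single data block */
        free(a);      /* the row-pointer table */
    }
}
```

The element syntax matches a declared array, but the declaration is float **a, which is a pointer to pointers and not an array type.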

Obviously I have misinterpreted what you are saying, perhaps you could
clarify?


John Burton

jlg@lanl.gov (Jim Giles) (09/22/90)

From article <1990Sep21.193403.20381@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
> In article <63722@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
>>Hear, Hear!!  I like a programming language to allow me to say what I
>>mean - not to have to convert my algorithm into something cryptic.
>>However, C forces me to encrypt my programs - I can't use arrays, I
>>have to encrypt them as pointers; I can't use dynamic memory, I have
>>to encrypt them as pointers; I can't use mapping (run-time equivalence),
>> [...]
>>
> Excuse me? are you REALLY saying you CAN'T use arrays in C without
> resorting to pointers??? I'm confused. Does this mean that I can't 
> use the statement
>   a[i][j] = 123;

Syntactic sugar.  Try sending the array to a subroutine as a parameter.
Then you'll find out what the array _really_ is.  Try referencing the
array in the subroutine with the above statement - lots of luck.

What I want is an array that _stays_ an array when I pass it around.
Arrays that have to be locally declared or global are practically
useless for programs which do any serious array manipulation.  For
them, arrays in C are not anything but another name for pointer.
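
A small sketch of what happens at the call boundary, with hypothetical names. A two-dimensional array parameter keeps its a[i][j] syntax only if the column count is a compile-time constant; with a run-time column count, the routine receives a plain pointer and the subscript arithmetic has to be written out by hand.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 4   /* column count must be a compile-time constant here */

/* The parameter is really "pointer to array of COLS ints"; only the
   fixed COLS lets the compiler work out where a[i][j] lives. */
static void set_cell(int a[][COLS], size_t i, size_t j, int v)
{
    assert(j < COLS);
    a[i][j] = v;
}

/* With a run-time column count the caller passes a flat block and the
   index calculation is spelled out explicitly. */
static void set_cell_flat(int *a, size_t cols, size_t i, size_t j, int v)
{
    assert(j < cols);
    a[i * cols + j] = v;
}
```

In neither case does the routine know how many rows it was given; that information does not travel with the argument.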

> Can't use dynamic memory without using pointers? Again, I assume that's
> not *really* what you mean...I can send you code to create 2,3,...
> whatever Dimensioned array you want from the heap (using malloc &
> calloc) that can be used in any situation where you use a statically
> declared one (I have yet to find a situation where it doesn't work)
> using exactly the same syntax. It works on all the compilers mentioned
> above using the ANSI standard mode. I use this routine regularly in
> the image processing work I do...the only problem I've run into is
> running out of memory for 2-D arrays larger than 1024x1024 of type
> float.

Ok.  Now give me code in which those declarations resemble static
array declarations in any significant way.  The declaration of a
dynamic object should be _identical_ to the declaration of a static
object of the same type (with the possible exception of place-holders
for the information to be filled in at allocation time).

Once you've failed to do that, then you can tell me how the compiler
knows that the pointers (hidden under your clever declarations) are to
dynamic objects and are not aliased to _any_ other pointers.  In order to
get any kind of efficiency, the compiler must be able to detect aliasing
so that it can optimize non-aliased references.  However, as far as the
compiler knows, the result of a malloc() or calloc() call is just any old
pointer, could be aliased to anything.
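
A later revision of C (the 1999 standard, so after this thread) added the restrict qualifier, which lets the programmer assert that pointers do not alias. The sketch below uses it under that assumption; the function name is hypothetical. Note that it is a promise made by the programmer and not something the compiler verifies, so it is not the checked alias attribute described earlier.

```c
#include <stddef.h>

/* Adds a and b element by element into out.  The restrict qualifiers
   promise that the three arrays do not overlap, which frees the
   compiler to reorder or vectorize the loads and stores.  Calling
   this with overlapping arrays breaks the promise and is undefined. */
static void vec_add(size_t n, double *restrict out,
                    const double *restrict a, const double *restrict b)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```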

No, the techniques you are advocating (which I've seen before and
dismissed for the same reasons) merely hide the facts.  This is
doubly cryptic - you are pretending to use arrays when you're really
using pointers, and you are using pointers because the language doesn't
really have arrays.  I still prefer to tell the compiler straight out
what it is I want to do.

J. Giles

jcburt@ipsun.larc.nasa.gov (John Burton) (09/22/90)

In article <63751@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <1990Sep21.193403.20381@abcfd20.larc.nasa.gov>, by jcburt@ipsun.larc.nasa.gov (John Burton):
>> In article <63722@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> [...]
>>>Hear, Hear!!  I like a programming language to allow me to say what I
>>>mean - not to have to convert my algorithm into something cryptic.
>>>However, C forces me to encrypt my programs - I can't use arrays, I
>>>have to encrypt them as pointers; I can't use dynamic memory, I have
>>>to encrypt them as pointers; I can't use mapping (run-time equivalence),
>>> [...]
>>>
>> Excuse me? are you REALLY saying you CAN'T use arrays in C without
>> resorting to pointers??? I'm confused. Does this mean that I can't 
>> use the statement
>>   a[i][j] = 123;
>
>Syntactic sugar.  Try sending the array to a subroutine as a parameter.
>Then you'll find out what the array _really_ is.  Try referencing the
>array in the subroutine with the above statement - lots of luck.
>
>What I want is an array that _stays_ an array when I pass it around.
>Arrays that have to be locally declared or global are practically
>useless for programs which do any serious array manipulation.  For
>them, arrays in C are not anything but another name for pointer.

By the same token, in reality ALL variables in C, FORTRAN, Pascal, etc
are simply pointers. They do not contain a value themselves, they point
to a memory location that contains the value. Also in reality an array
is just a sequence of memory locations. How you access those locations
is a matter of preference. Any way you do it you ultimately reference
a memory location (or register) and obtain the value there. Explicitly
defined pointers simply are identifiers pointing to a memory location
which contains the address of another memory location.

One point that intrigues me is _how_ you plan to pass your array around
so that it _stays_ an array without using pointers. The primary choices
for parameter passing seem to be either to make a copy of the data within
the subroutine (pass by value or copy-in) or tell the routine _where_
the data is stored (pass by reference), i.e. you either pass the value
or a pointer to a value. My interpretation of what you say is that you 
want to eliminate passing the pointer (pass by reference). This does
have the advantage of not allowing anyone else access to the copy of 
the array as the subroutine works on it, but it  has the disadvantage
of requiring copies to be made of the array. I'm not convinced that 
making a copy of a 4 meg array can be done faster than passing a pointer.
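
The trade-off described above can be sketched in C itself. A bare array
parameter is never copied, so this sketch assumes a hypothetical wrapper
struct, `struct grid`, as one way to get copy-in behaviour; the function
names are illustrative only.

```c
#include <stddef.h>

#define N 4

/* Hypothetical wrapper: wrapping the array in a struct lets C pass it by value. */
struct grid { int cell[N][N]; };

/* Receives a private copy; the caller's grid is untouched. */
int sum_after_clear_by_value(struct grid g)
{
    int total = 0;
    g.cell[0][0] = 0;                 /* modifies the copy only */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += g.cell[i][j];
    return total;
}

/* Receives the address; no copy is made, and the write reaches the caller. */
void clear_corner_by_reference(struct grid *g)
{
    g->cell[0][0] = 0;
}
```

The by-value call costs a copy of every element on each call, while the
by-reference call costs one address regardless of the array's size.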

>
>> Can't use dynamic memory without using pointers? Again, I assume that's
>> not *really* what you mean...I can send you code to create 2,3,...
>> whatever Dimensioned array you want from the heap (using malloc &
>> calloc) that can be used in any situation where you use a statically
>> declared one (I have yet to find a situation where it doesn't work)
>> using exactly the same syntax. It works on all the compilers mentioned
>> above using the ANSI standard mode. I use this routine regularly in
>> the image processing work I do...the only problem I've run into is
>> running out of memory for 2-D arrays larger than 1024x1024 of type
>> float.
>
>Ok.  Now give me code in which those declarations resemble static
>array declarations in any significant way.  The declaration of a
>dynamic object should be _identical_ to the declaration of a static
>object of the same type (with the possible exception of place-holders

Why should it be identical? What purpose would that serve save hiding
the fact that the machine is doing two different operations (allocating
space at compile time vs. allocating space at run time)?
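
A routine of the kind described in the quoted text might look roughly like
the following. This is a sketch, not the poster's actual code: `alloc2d`
and `free2d` are hypothetical names, and the layout (one block of row
pointers plus one block of data) is an assumption.

```c
#include <stdlib.h>

/* Hypothetical helper: builds a rows x cols array of float on the heap so
   that a[i][j] works. One block holds the row pointers, another the data. */
float **alloc2d(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;
    float **a = malloc(rows * sizeof *a);
    if (a == NULL)
        return NULL;
    a[0] = calloc(rows * cols, sizeof **a);   /* zero-filled storage */
    if (a[0] == NULL) {
        free(a);
        return NULL;
    }
    for (size_t i = 1; i < rows; i++)
        a[i] = a[0] + i * cols;               /* row i starts i*cols elements in */
    return a;
}

/* Releases both blocks; accepts NULL. */
void free2d(float **a)
{
    if (a != NULL) {
        free(a[0]);
        free(a);
    }
}
```

After `float **a = alloc2d(512, 512);` elements are written as `a[i][j]`,
the same expression used with a static `float a[512][512]`. The two
declarations still differ, which is the objection raised in the quote.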

>
>Once you've failed to do that.  Then you can tell me how the compiler
>knows that the pointers (hidden under your clever declarations) are to
>dynamic objects and are not aliased to _any_ other pointers.  In order to
>get any kind of efficiency, the compiler must be able to detect aliasing
>so that it can optimize non-aliased references.  However, as far as the
>compiler knows, the result of a malloc() or calloc() call is just any old
>pointer, could be aliased to anything.
>
>No, the techniques you are advocating (which I've seen before and
>dismissed for the same reasons) merely hide the facts.  This is
>doubly cryptic - you are pretending to be arrays when you're really
>using pointers, and you are using pointers because the language doesn't
>really have arrays.  I still prefer to tell the compiler straight out
>what it is I want to do.
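
The aliasing concern in the quoted text can be shown with a small sketch;
`add_twice` is a hypothetical function, not something from the thread.

```c
/* Adds *src to *dst twice. If the compiler could assume dst and src never
   alias, it could load *src once; because they might, it must reload. */
void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;   /* *src may have changed if src and dst are the same object */
}
```

With two distinct objects both holding 1, the destination ends up as 3.
Called as `add_twice(&x, &x)` with `x` equal to 1, the second addition sees
the updated value and `x` ends up as 4. A compiler that cannot rule out the
second case has to reload `*src`.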

As far as a machine goes, there is no such thing as an array. So what's the
difference between *me* "hiding the facts" and the compiler "hiding the
facts"? Personally I prefer to know what's going on as opposed to
handing the job to the compiler and hoping it does what I think it does.

If I want to use pointers as opposed to arrays, or vice versa, that should
be my choice, NOT a restriction of the language. This whole discussion
boils down to a difference of opinion. I hold that a programmer should
be allowed the freedom to create programs in whatever way he/she chooses
and be provided the tools to do the job. What you're proposing significantly
limits this freedom of choice. What advantages does this limiting provide?
By your own words, None...both methods (supposedly) can be used to create 
the same end product. 

John Burton

"Save me from those who seek to save me from myself"

bson@AI.MIT.EDU (Jan Brittenson) (09/22/90)


In article <63751@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

 >> Excuse me? are you REALLY saying you CAN'T use arrays in C without
 >> resorting to pointers??? I'm confused. Does this mean that I can't 
 >> use the statement
 >> a[i][j] = 123;

 > Syntactic sugar.  Try sending the array to a subroutine as a parameter.
 > Then you'll find out what the array _really_ is.  Try referencing the
 > array in the subroutine with the above statement - lots of luck.


bar(some_array)
  int some_array[3][4];
{
  some_array[0][1] += some_array[1][0];
}

snorf()
{
  int foo[3][4];

  bar(foo);
}


"Look Mom, no pointers!!!"

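Whether the example above is free of pointers depends on how the parameter
declaration is read. In C a parameter declared as an array is adjusted to a
pointer to its element type, so the following sketch, which declares `bar`
both ways, is expected to compile as one consistent declaration.

```c
/* These two declarations are compatible: a parameter written as
   int [3][4] is adjusted to "pointer to array of 4 int". */
void bar(int some_array[3][4]);
void bar(int (*some_array)[4]);

/* Same body as the posted example, in prototype form. */
void bar(int some_array[3][4])
{
    some_array[0][1] += some_array[1][0];
}
```

Because the first dimension is not part of the adjusted type, a call such
as `bar(big)` with `int big[5][4]` is accepted as well, and the function
modifies the caller's array rather than a copy.
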
bson@AI.MIT.EDU (Jan Brittenson) (09/22/90)

   This message is quite long. I apologize if you think I'm filling up
your mailbox with junk flamage.

Jim Giles:

 >> 	1. Pointer range check (to see if a buffer crosses page
 >> 	   boundaries, for instance).

 > Well, without pointers, why do you need a pointer range check?  Computing
 > the range of something that doesn't exist seems a little silly.

   Pointers _are_ addresses, and nothing else. Regardless of whether
they include segment information, or other information relevant only
to non-state-of-the-art architectures. The "address" idiom covers all
information relevant to locating the addressee. Pointers may be
interpreted differently, depending on the datum, though. On a pdp-10,
not only is a word address necessary, but also a character index
within the word if it's a character pointer.

 > I think you had in mind casting the pointer to an int and looking at
 > the raw address - the ANSI standard leaves this process undefined.

   You're right, that was my intent with the buffer example. But
unless _somehow_ a means of retrieving the address of the buffer - a
pointer to it - is provided, the page boundary check can not be done
_at all_, defined or undefined, portable or not. To me the simple
C-style casting is preferable to some obscure union declared miles
away, since pointer-to-int casting at least tells me what is going on.
Besides, in almost any implementation casting a pointer to an int of
sufficient size and then later back, will yield the original pointer.
I most certainly would refuse to use a compiler for which this
assumption wasn't correct. If the machine hardware is such that it's
not a reasonable assumption to make - say on a Lisp Machine, for
instance - then, well, forget about portable C code.
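
The page-boundary check under discussion might be sketched as below. It
assumes `uintptr_t` from `<stdint.h>`, which arrived with C99 and so
postdates this thread; the function names are hypothetical. Only the
conversion back to the original pointer is promised by the standard;
reading the integer as a flat address is an assumption about the platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the len bytes starting at address start span more than one
   page. page_size must be nonzero; a zero-length range never crosses. */
int range_crosses_page(uintptr_t start, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;
    return start / page_size != (start + (len - 1)) / page_size;
}

/* The cast is the implementation-defined step: it treats the integer
   value of the pointer as a flat address. */
int buffer_crosses_page(const void *buf, size_t len, size_t page_size)
{
    return range_crosses_page((uintptr_t)buf, len, page_size);
}
```

For example, a 10-byte buffer at address 4090 crosses a 4096-byte page
boundary, while a 4096-byte buffer at address 0 does not.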

 > Now, if you're talking about non-standard extensions to C which would
 > allow you to do this stuff - then any other language can contain the
 > same non-standard extensions.

   Extensions, or non-uptight about pointer typing, call it whatever
you like.

 >> [...]
 >> 	2. Calculate physical addresses for DMA controllers.

 > Why should I care?  The system/environment should be able to give me the
 > address if I need it.  But, how do I use a raw address anyway? _Standard_
 > C pointers don't give me any such access.  Access to such things as
 > hardware controllers should be privileged to the system - and _it_
 > can contain machine dependent code - like assembly.

...or like C, which most certainly is more defined than assembler!

   I'm not sure what kind of programming you're talking about. There
are languages which are defined similar to what you have described
here, but few outside academia use them - Euclid for instance.
According to my experience, programmers can be put into either of two
major groups: application programmers and system programmers. While
the former use various 4G and other kinds of application-oriented
tools - such as XYZ-SQL, COBOL, or Prolog, to write applications, the
latter do the system-dependent stuff, such as database, server, and
support tool implementations - mostly things that are system-dependent
to start with. Neither of these groups would have particular use for
your proposed language - the application people would ask you what
syntax applies to selecting records in a database, while the system
people would ask you how to set up 2D bitblt operation in a graphics
device, or how to create a channel program in a mainframe environment.

   For sure, some of the work done by system folks falls somewhere
in-between. But I seriously doubt programming efficiency or
maintenance would be improved to any degree worth mentioning by
forcing everyone to learn Yet Another Language and an entirely new set
of idioms when the previous ones are considered quite sufficient.

   Can you give me one example of a project you or a first-hand
reference has been involved in that falls between the two major
categories I've outlined above, and which by itself constitutes a
project large enough to warrant not simply making do with what you've
got and are used to, and possibly for an employer to require
experience with your language as desirable?

 >> [...]
 >> 	3. Sort a linked list on addresses of some data pointed to
 >> 	   from within the node. Or to keep it sorted as new (addresses
 >> 	   of) data is added.

 > I guess you'll have to tell me how this differs from sorting on the
 > index of the data within an array or sequence.  Since the sequence is
 > dynamic, ...

   So how do I know where a certain index resides? I guess this would
be an undefined topic - although in this example it would be well
defined in C, since the buffers would be of the same type (i.e.
arbitrarily dimensioned character vectors).

 > ... you can add all the elements you wish - and still sort on index.

   How do I know that the addresses of the previous indexes do not
change as new elements are added? This would have to be undefined, as
well.
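
The question of what stays stable as a sequence grows can be made concrete
with a sketch. `struct seq` and `seq_push` are hypothetical; the growth
policy (doubling through `realloc`) is an assumption chosen for brevity.

```c
#include <stdlib.h>

/* Hypothetical growable sequence of int. */
struct seq {
    int   *data;
    size_t len;
    size_t cap;
};

/* Appends value and returns its index, or (size_t)-1 on allocation failure.
   realloc may move the storage, so addresses of earlier elements can
   change; their indexes do not. */
size_t seq_push(struct seq *s, int value)
{
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        int *p = realloc(s->data, new_cap * sizeof *p);
        if (p == NULL)
            return (size_t)-1;
        s->data = p;
        s->cap = new_cap;
    }
    s->data[s->len] = value;
    return s->len++;
}
```

Under this scheme an index saved before further appends still names the
same element afterwards, whereas a saved `&s.data[i]` may not.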

 >> [...]
 >> 	4. Implement malloc()/free().

 > When I found out that the ANSI C standard prohibited comparing/subtracting
 > pointers to different objects, I pointed out on comp.lang.c that malloc()
 > and free() could not be written in _standard_ C.  They agreed with me.

   No doubt you're correct. Implementation is fairly trivial in
"nonstandard" C, and I fail to see how it could be made easier or more
"defined" without any pointers (i.e. explicit object addresses) at all?

 >> [...]
 >>    I'm curious as to why so many programmers engage themselves in hot
 >> debates over how to best implement strings. String processing is
 >> proportionally insignificant - the first thing done after a read is
 >> usually a tokenization, either through hand-written code or the output of
 >> a lexical front-end generator.  [...]

 > Tokens are also strings ....  Symbol tables also contain strings
 > (among other stuff).

   First, tokens are best handled as small integers or enumerated
types, while symbol tables are commonly hashed. Other than converting
strings-to-int-tokens and symbols-to-hash-values, very little string
processing is done. Second, take a look at an assembler or compiler,
and you'll be amazed at the total lack of string operations. (Apart
from the lexical front-ends, of course.)
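
A rough sketch of the two conversions mentioned above follows. The token
names, the keyword set, and the hash (a plain multiply-by-31 scheme) are
all assumptions for illustration, not taken from any particular compiler.

```c
#include <string.h>

enum token { TOK_IDENT, TOK_IF, TOK_WHILE, TOK_RETURN };

/* Maps a lexeme to a small integer once; later phases compare integers. */
enum token classify(const char *lexeme)
{
    if (strcmp(lexeme, "if") == 0)     return TOK_IF;
    if (strcmp(lexeme, "while") == 0)  return TOK_WHILE;
    if (strcmp(lexeme, "return") == 0) return TOK_RETURN;
    return TOK_IDENT;
}

/* Simple multiplicative string hash, reduced to a bucket number.
   n_buckets must be nonzero. */
unsigned symbol_bucket(const char *name, unsigned n_buckets)
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h % n_buckets;
}
```

Once a word has been through `classify` or `symbol_bucket`, the rest of the
program can work with the integer result and need not touch the characters
again.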

 > Text processors usually don't have much data that isn't part of one
 > string or another.

   Granted, but then for most text processors, a simple string or any
other sequence isn't enough to store the text and all relevant
information. A couple of years ago I wrote a type-setting system - it
should qualify as a "text processor" as good as any. The first thing
done with the incoming text was chopping it up in segments containing
font-pitch-kerning-etc-info unique to the segment. The actual
characters of the segment weren't used again until it was time to
print them. _All_ work was performed on the remaining segment
information, the lists of segments, and lists of lists of segments. Of
all hairy things done, _none_ involved character data. (And rarely any
duplication either, for that matter.)

   Let's distinguish between "defined," and "portable." Even if a
program adheres to a formal definition, there is no guarantee that
it's going to run on every other system that adheres to the same
definition. In the end, common sense and portability constraints will
have to lead all development.

							-- Jan Brittenson
							   bson@ai.mit.edu

jlg@lanl.gov (Jim Giles) (09/25/90)

From article <9009220030.AA03386@rice-chex>, by bson@AI.MIT.EDU (Jan Brittenson):
> [...]
| bar(some_array)
|   int some_array[3][4];
| {
|   some_array[0][1] += some_array[1][0];
| }
|
| snorf()
| {
|   int foo[3][4];
|
|   bar(foo);
| }
> 
> 
> "Look Mom, no pointers!!!"

A procedure which claims to be able to do array manipulation and
yet only works on a _fixed_ array size is useless.  When I pass
the _other_ array (the one you left out: int bilvet [4][3]), the
procedure "bar" will mangle it.

Try again.
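
C99, which postdates this thread, added array parameters whose dimensions
are themselves parameters. The sketch below assumes a compiler with
variable-length array support (optional since C11); `bar_any` is a
hypothetical name.

```c
#include <stddef.h>

/* The dimensions travel with the call, so the same body indexes a 3x4 and
   a 4x3 array correctly. The parameter is still adjusted to a pointer. */
void bar_any(size_t rows, size_t cols, int a[rows][cols])
{
    if (rows > 1 && cols > 1)
        a[0][1] += a[1][0];
}
```

A call looks like `bar_any(3, 4, foo)` or `bar_any(4, 3, bilvet)`. The
caller must still pass dimensions that match the actual array; nothing in
the language checks them.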

J. Giles

jlg@lanl.gov (Jim Giles) (09/25/90)

From article <9009220848.AA00539@wheat-chex>, by bson@AI.MIT.EDU (Jan Brittenson):
> [...]
> Jim Giles:
> 
>  >> 	1. Pointer range check (to see if a buffer crosses page
>  >> 	   boundaries, for instance).
> 
>  > Well, without pointers, why do you need a pointer range check?  Computing
>  > the range of something that doesn't exist seems a little silly.
> 
>    Pointers _are_ addresses, and nothing else.  [...]

Yes, but you're missing the point.  Surely what's wanted above is
a reliable method to allocate buffers that don't contain page
boundaries or other unpleasant hardware dependent things.  This
is clearly the job of the memory manager - to give the programmer
adequate support for machine dependent problems of this kind.
The programmer should merely have to allocate memory (with the
right mode flags on his request), and the manager should return
an allocated object with the right memory boundary properties
(whether this means "doesn't cross a boundary", or "starts on
a boundary", etc.).  And, as I've pointed out before, dynamic
memory allocation probably should not involve programmer visible
pointers.

> [...]
>  > Now, if you're talking about non-standard extensions to C which would
>  > allow you to do this stuff - then any other language can contain the
>  > same non-standard extensions.
> 
>    Extensions, or non-uptight about pointer typing, call it whatever
> you like.

Ah, but I'm still not convinced that the language of the future should
even _contain_ pointers.  No one has yet provided an example of a user
level application that _requires_ them.  System level applications also
_mostly_ don't need them.  And, like GOTOs in flow control, pointers
in data structuring tend to result in spaghetti.  If this is deliberate,
it's on the programmer's head.  If someone just gets some wires crossed though,
I'm willing to apportion at least _some_ of the blame on the language
feature itself.

> 
>  >> [...]
>  >> 	2. Calculate physical addresses for DMA controllers.
> 
>  > Why should I care?  The system/environment should be able to give me the
>  > address if I need it.  But, how do I use a raw address anyway? [...]
> [...]
> ...or like C, which most certainly is more defined than assembler!

Once again, you've missed the point.  Not even the systems programmer
that has to write the access routines for the DMA controller _cares_
what its address is.  What he wants is to be able to say "DMA_port=command"
whenever he needs to.  Why not have the compiler (or the loader) contain
a list of all the hardware-specific addresses with some mnemonic names
that the programmer can just declare and use?  Why does the programmer
have to mess with addresses at all?

> [...]
>                Neither of these groups would have particular use for
> your proposed language - the application people would ask you what
> syntax applies to selecting records in a database, [...]

I don't understand the objection.  I give the programmer _more_ clear,
explicit, direct, data structuring tools than C has, remove pointers
(which still haven't been proven useful), and you claim it will be harder
to use.  The database person would probably use whatever syntax he
_presently_ uses except without the need for dereferencing on the linked
list types of stuff.  Give me a _specific_ example of what you think
would be hard.

> [...]                                             while the system
> people would ask you how to set up 2D bitblt operation in a graphics
> device, [...]

Again, the same way they do now - except the graphics device itself
would now be a named object and the programmers would no longer have
to pretend the absolute address of it was somehow part of their task.

> [...] or how to create a channel program in a mainframe environment.

Oh?  You can do that in C?  On the Cray for example, channel programs
aren't even written in the same _machine_ language.  The C compiler
doesn't even generate channel code.  Not at all.  (Maybe there's a
different C compiler that does, but I've not seen it.)  What does
this have to do with the discussion about whether pointers (or any
other feature) should be incorporated in a programming language?

> [...]
>    For sure, some of the work done by system folks falls somewhere
> in-between. But I seriously doubt programming efficiency or
> maintenance would be improved to any degree worth mentioning by
> forcing everyone to learn Yet Another Language and an entirely new set
> of idioms when the previous ones are considered quite sufficient.

Yes, I can see your point.  Once you've learned a language it _is_
kind of like a trap.  You begin to see only how _that_ language works
and not how to solve your _actual_ problems at all.  This kind of
thing has happened _many_ times before in history.  The great modern
bridges were not built by the same people who built the great ancient
ones - or even their intellectual descendents.  Stone masons couldn't
_ever_ span distances over 200 feet - yet they considered their work
to be "quite sufficient" for all practical bridge building work.

Yes, perhaps the current generation of C programmers will have to retire
before a fresh group - without the biases - can address the problem from
new and more effective points of view.

> [...]
>    Can you give me one example of a project you or a first-hand
> reference has been involved in that falls between the two major
> categories I've outlined above, and which by itself constitutes a
> project large enough to warrant not simply making do with what you've
> got and are used to, and possibly for an employer to require
> experience with your language as desirable?

Yes.  Our organization is currently switching _TO_ C/UNIX from something
else.  All the trouble you predict is indeed upon us - the retraining, the
expense, the incidental blunders along the way.  Unfortunately, after all
this, it is becoming clear that C/UNIX are worse than what we had.  Fewer
features, harder to use, _SLOW_.  Those of us that _knew_ C/UNIX before
the conversion warned that this would be the case.  But, both users and
management succumbed to the hype. (Note, there are still people here that
think we did the right thing.  In a couple of years all this trauma will
be behind us, they say.  Then, we will have the advantages of an industry
standard system, they say.  Unfortunately, we had to add so much non-UNIX
stuff to the system - just to make it marginally acceptable - that
switching to any other UNIX system later would be just as traumatic as
what we are doing now.  Oh, well.)

>  >> [...]
>  >> 	3. Sort a linked list on addresses of some data pointed to
>  >> 	   from within the node. Or to keep it sorted as new (addresses
>  >> 	   of) data is added.
> 
>  > I guess you'll have to tell me how this differs from sorting on the
>  > index of the data within an array or sequence.  Since the sequence is
>  > dynamic, ...

I don't understand your answer on this.  It seems to me that the
objections you raise are _MORE_ applicable to pointers, not less.

>  >> [...]
>  >> 	4. Implement malloc()/free().
> 
>  [... I said standard C can't do it ...]
>
>    No doubt you're correct. Implementation is fairly trivial in
> "nonstandard" C, and I fail to see how it could be made easier or more
> "defined" without any pointers (i.e. explicit object addresses) at all?

Actually, the implementation is quite trivial. period.  I don't see
why we have to mung-up the language design to do something which should
be done in assembly for efficiency anyway.  Even so, the system kernel
starts up a tool called the memory manager which handles all the rest
of memory as a single large array, using indices relative to wherever
the kernel ends.  Where's the user-visible pointers in that?  The run-time
memory manager for the I/O library and application programs works the
same way - it has all the memory that the system allocated for the heap
in one large array.  The only piece that needs to have _raw_ pointer access
is the part that performs the alias - that is, the copy of the reference
when the allocation process finishes (or, when the deallocation process
starts).  This little fragment amounts to less than a half-dozen instructions
on most machines - surely you can't object to _that_ much assembly in an
operation that is _blatantly_ machine dependent.
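
The arrangement described above, with memory treated as one large array and
blocks named by index, might be sketched as follows. This is a deliberately
minimal version under stated assumptions: all names are hypothetical, blocks
are handed out in order, and nothing is ever released.

```c
#include <stddef.h>

#define HEAP_WORDS 1024
#define NO_BLOCK   ((size_t)-1)

static long   heap[HEAP_WORDS];   /* all managed memory, as one array */
static size_t next_free = 0;      /* index of the first unused word */

/* Hands out n consecutive words and returns the index of the first one,
   or NO_BLOCK if the request is empty or does not fit. */
size_t heap_alloc(size_t n)
{
    if (n == 0 || n > HEAP_WORDS - next_free)
        return NO_BLOCK;
    size_t start = next_free;
    next_free += n;
    return start;
}

/* Access is by index; index must be below HEAP_WORDS. */
void heap_store(size_t index, long value) { heap[index] = value; }
long heap_load(size_t index)              { return heap[index]; }
```

A caller would write `size_t b = heap_alloc(5); heap_store(b, 7);` and never
see an address. A real manager would also need deallocation and the
aliasing step that the post says belongs in assembly.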

> [... discussion about text string efficiency.  I refuse to argue     ...]
> [... any further.  It is clear that _some_ people regard it as an    ...]
> [... important issue.  You don't.  I regard it as a subset of the    ...]
> [... sequence type construct - Which has other applications: most    ...]
> [... linked lists in C that I encounter would really have been _much_...]
> [... more efficient as sequences - for example.                      ...]

J. Giles

bson@AI.MIT.EDU (Jan Brittenson) (09/25/90)

Jim Giles:

>From article <9009220848.AA00539@wheat-chex>, 
    by bson@AI.MIT.EDU (Jan Brittenson):

 > Surely what's wanted above is a reliable method to allocate buffers
 > that don't contain page boundaries or other unpleasant hardware
 > dependent things.  This is clearly the job of the memory manager - to
 > give the programmer adequate support

   Who is the `programmer' - the application programmer or the system
programmer? You don't seem to have a very clear concept of who does
what. I for one wouldn't touch 4G programs with a 5-ft pole; the same
can be heard from the 4G people. Which isn't a big surprise, different
programmers have different concerns. And at least in my experience
there have easily been 10-15 4G/COBOL/you-name-it application programmers
to one or two system programmers.

 > Ah, but I'm still not convinced that the language of the future should
 > even _contain_ pointers.

   I'm not even convinced computers of the future will execute
instructions sequentially, just because this happens to be the case
today.

 > And, like GOTOs in flow control, pointers in data structuring tend to
 > result in spaghetti.  If this is deliberate, it's on the programmer's
 > head.  If someone just gets some wires crossed though, I'm willing to
 > apportion at least _some_ of the blame on the language feature itself.

   Yeah, I tend to agree here. No one has said that system programming
is easy to learn or get accustomed to, and regardless, constitutes a
mere fraction of all programming effort, as well. You'll find a lot of
system hackers on the networks writing free software of course - but I
think they're a considerable minority at their respective workplaces,
or else they're students.

 > What [the programmer] wants is to be able to say "DMA_port=command"
 > whenever he needs to.  Why not have the compiler (or the loader)
 > contain a list of all the hardware-specific addresses with some
 > mnemonic names that the programmer can just declare and use?  Why does
 > the programmer have to mess with addresses at all?

   Now this is a catch-22 as good as any! Who is to implement the
semantics of "DMA_port=command"? I mean, I can type it at my terminal
as many times as I like, even check the syntax, without anything
happening to the DMA_port.
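
In C, the usual answer to "who implements it" is one definition that ties
the name to an address. The sketch below substitutes an ordinary variable
for the device register so that it runs anywhere; `DMA_PORT`,
`DMA_CMD_START` and the register itself are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the device register so the sketch runs anywhere. On real
   hardware this would be a fixed address supplied by the platform. */
static volatile uint32_t simulated_dma_register;

/* The name the rest of the program uses; the address appears only here. */
#define DMA_PORT (*(volatile uint32_t *)&simulated_dma_register)

#define DMA_CMD_START 0x1u   /* hypothetical command code */

/* Issues the command by plain assignment to the named port. */
void dma_start(void)
{
    DMA_PORT = DMA_CMD_START;
}
```

The rest of the program writes `DMA_PORT = command` and never mentions an
address, but the definition itself is still a cast and a dereference.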

 > The database person would probably use whatever syntax he _presently_
 > uses except without the need for dereferencing on the linked list
 > types of stuff.  Give me a _specific_ example of what you think would
 > be hard.

SELECT NAME=some_name AND SHOESIZE=10.5 FROM Register_1


You tell an applications programmer that he or she is to use something
like:

    execute_sql(CMD_SELECT_ROM, 3, KEY("NAME"), find_var("some_name"), 
		KEY_LOGICAL("AND"), KEY("SHOESIZE"), 10.5, "Register_1");

and the person is going to laugh in your face!

   You *don't* want to build *everything* conceivably useful into the
compiler, or add syntax for it. There are excellent (well...) 4G
compilers that generate C code, and they don't mind generating code
that uses pointers one itsy bit.

 >> [...] or how to create a channel program in a mainframe environment.

 > Oh?  You can do that in C?

   Certainly, with data structures and pointers. Although on IBM
machines it's more commonly done in assembler, which compared to C is
totally unintelligible and structureless. I would much rather maintain a
program written in C than ASSEMBLER XF.

 > Once you've learned a language it _is_ kind of like a trap.  You begin
 > to see only how _that_ language works and not how to solve your
 > _actual_ problems at all.

   I am very wary of falling into that trap. But I'm not obsessed with
syntax or a clear correspondence between syntax and semantics -
semantics is most important to me. To a student or application
programmer a clear correspondence between syntax and semantics is
probably of more importance than it is to me. If C doesn't allow me
to do what I need to do, then I'd ditch C and go dig up an assembler.
Which is a rare occurrence indeed, but I doubt it would be less
frequent without pointers or some address data type.


 > Our organization is currently switching _TO_ C/UNIX from something
 > else.  All the trouble you predict is indeed upon us - the retraining,
 > the expense, the incidental blunders along the way.

   Perhaps C/UNIX was a bad choice then. Why C? What kind of software
did you port?

 >>  >> [...]
 >>  >> 	4. Implement malloc()/free().
 >> 
 >>  [... I said standard C can't do it ...]
 >>
 >>    No doubt you're correct. Implementation is fairly trivial in
 >> "nonstandard" C, and I fail to see how it could be made easier or more
 >> "defined" without any pointers (i.e. explicit object addresses) at all?

 > Actually, the implementation is quite trivial. period. I don't see why
 > we have to mung-up the language design to do something which should be
 > done in assembly for efficiency anyway.

   I don't think the efficiency issue is all that important.
Portability and maintainability are of greater economic concern -
it's penny-wise to code RTLs and kernels in assembler to gain 15%
speed, while it'll take at least twice as long, be twice as hard to
maintain, and not portable for shit.

   As a customer, you're stuck with a single vendor who can literally
rip you off with upgrades and patches, i.e. make you even more
dependent on half-working assembler code.

---

   Regarding the example I posted, against the importance of strings.
I'd just like to add that it was not an argument for pointers, only
against the often stated importance of strings as a data type. In
fact, the program would probably have been much easier to write
*without* explicit pointers, in Common Lisp for instance. Unfortunately,
that was not an option.

brendan@batserver.cs.uq.oz.au (Brendan Mahony) (09/26/90)

jlg@lanl.gov (Jim Giles) writes:

>Oh, well.  If it's a question of side-effects.  I oppose them outright.
>As a practical matter, you can convince most programmers that operators
>should not have side-effects.  But, when it comes to functions, they
>always demand to be allowed side-effects.

Do they really? Why in the world do they want to make their life so
difficult? I don't see what side-effects in functions can do that tuple
valued expressions can't do, except make the code unreadable and
impossible to reason about. For example if

	f : int -> int;

	a := f(b);

also updates c and d what is wrong with using a tuple valued function,

	f : int x int x int -> int x int x int;

	(a,c,d) := f(b,c,d);

If that takes too much typing for you why not

	#define (A):=F(B) (A,c,d) := f(B,c,d)

Even allowing functions to look at global state variables leads to confusion,
letting them change them means you don't have a function and terms are
not terms. What is the point of such confusion?
--
Brendan Mahony                   | brendan@batserver.cs.uq.oz       
Department of Computer Science   | heretic: someone who disagrees with you
University of Queensland         | about something neither of you knows
Australia                        | anything about.

mickey@ncst.ernet.in (R Chandrasekar) (09/27/90)

In article <3114.26f57247@cc.helsinki.fi> pirinen@cc.helsinki.fi writes:

>Where does this idea of C-hackers come from, that only novices need
>safety?  I'm no novice (10 years of programming), and I want all the
>safety I can get.  I'm sick and tired of debugging for hours to find
>simple errors that could have been caught at the expense of a few
>seconds of the compiler's time.  Programmers are not machines, even good
>programmers make simple mistakes.

I agree completely. In fact, it is the more experienced programmers
who need to be 'protected' -- they are more likely to be writing
bigger applications, and many of them might be over-confident
with their prowess with a programming language.

My complaint is not necessarily with C - it is with any language
which provides 'flexible' ways to goof.

C-philes say that a variety of syntactic problems could be trapped
with tools such as lint. But hardly anyone uses lint or lint-like
programs (usual comments:"lint gives too many vague messages" etc
etc).

The smart programmer is one who uses safe programming practices,
perhaps a layer of code over the basic language, to achieve what is
required.

>Pekka P. Pirinen   University of Helsinki
>pirinen@cc.helsinki.fi  pirinen@finuh.bitnet  ..!mcvax!cc.helsinki.fi!pirinen

 -- Chandrasekar
______________________________________________________________________
R Chandrasekar, National Centre for Software Technology, 
Gulmohar Cross Rd No. 9, Juhu, Bombay 400 049,INDIA
E-mail : mickey@ncst.ernet.in  OR  mickey@ncst.in
______________________________________________________________________

peter@ficc.ferranti.com (Peter da Silva) (09/29/90)

In article <5006@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:
> Even allowing functions to look at global state variables leads to confusion,
> letting them change them means you don't have a function and terms are
> not terms. What is the point of such confusion?

OK, how do you implement the function "rnd", which returns a random
number, without letting it have side effects?
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

asylvain@felix.UUCP (Alvin E. Sylvain) (09/29/90)

Say, I'd like to suggest that these articles on the sins of C be
posted to comp.lang.c and/or comp.std.c.  Then, let's get back on
track to this newsgroup's actual charter.

I don't have the actual charter in front of me, but it seems to me
that 'society' and 'futures' kinda sez it all ... and doesn't include
discussion of whether pointers in C are good, bad or indifferent.

What is the *future* of computing, and how will it impact *society*.

Thanks mucho.
------------------------------------------------------------------------
"I got protection for my    |               Alvin "the Chipmunk" Sylvain
affections, so swing your   |   Natch, nobody'd be *fool* enough to have
bootie in my direction!"    |   *my* opinions, 'ceptin' *me*, of course!
-=--=--=--"BANDWIDTH??  WE DON'T NEED NO STINKING BANDWIDTH!!"--=--=--=-

brendan@batserver.cs.uq.oz.au (Brendan Mahony) (10/01/90)

My line:
-> Even allowing functions to look at global state variables leads to confusion,
-> letting them change them means you don't have a function and terms are
-> not terms. What is the point of such confusion?

peter@ficc.ferranti.com (Peter da Silva) writes:
>OK, how do you implement the function "rnd", which returns a random
>number, without letting it have side effects?

Fine, use a tuple-valued function

function rnd (oldseed : integer) -> (newseed, ran : integer)

begin
newseed := ...
ran := ...
end

The function would be used in code 

(seed, ran) := rnd(seed);
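
A minimal C sketch of the same idea, for a language without tuple
returns: the pair comes back as a struct. The struct name, field names
and the generator constants are assumptions made for illustration; they
are not taken from the post.

```c
#include <stdint.h>

/* Result of one generator step: the successor seed and the random value. */
struct rnd_result {
    uint32_t newseed;
    uint32_t ran;
};

/* Pure function: the result depends only on oldseed and no global state
   is read or written.  The multiplier and increment are illustrative
   linear-congruential constants (an assumption). */
static struct rnd_result rnd(uint32_t oldseed)
{
    struct rnd_result r;
    r.newseed = oldseed * 1664525u + 1013904223u;  /* wraps modulo 2^32 */
    r.ran = r.newseed >> 16;                       /* keep the high 16 bits */
    return r;
}
```

The caller threads the seed by hand: call rnd(seed), then store
r.newseed back into seed before the next call.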

But seriously folks what you really want is a procedure. Note that this
function (which makes explicit the action of the operation)
cannot (easily) be used as an integer term, and must be accompanied with
an update to seed for the whole thing to work properly. rnd is not an
integer term, its purpose is not solely to define the value of an
integer. Why then do you want to include rnd in the grammar of integer
terms? Is it just to save a few keystrokes in the initial coding? Pretty
silly given that this is such a small part of the software cycle. The
idea is not conciseness but clarity! Side effects in rnd may not worry
you, but that is only because everyone knows that it must have side
effects. If it was not as well known it would be easy to overlook the
fact that a function called rnd, appearing deep in some complicated
expression, actually goes and plays with global variables.

I think it is very worthwhile to have a clear separate notion of 

	term: expression defining a value

and not to have to worry about any side effects when reading the code,
and also when using the function. The second point is important, for my
function an expression like

	4*second(rnd(seed)) + 3-second(rnd(seed))

has a well defined meaning. For a side effect rnd it may not be as clear
what the result is. At the very least order of evaluation becomes
important. I know we don't care with rnd but we often will. Note also that
the compiler also knows for sure that it will only have to evaluate
rnd(seed) once for this expression, if side-effects are possible this
optimisation becomes very difficult to determine. For instance it is not
even clear that two references to the same variable will yield the same
result. Consider

	seed*rnd + seed

There are several possible evaluation strategies for this expression,
all yield different results. Are the few keystrokes saved worth the
extra complication?
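
A small C sketch of the point, under assumptions: rnd() here is a toy
generator that updates a global seed, and each evaluation strategy is
written out as its own function so the outcome does not depend on the
compiler. In real C the operand order in seed*rnd() + seed is
unspecified, so a compiler may pick any of these.

```c
/* Global generator state, modified by rnd() as a side effect. */
static unsigned seed = 1u;

/* Side-effecting generator: advances the global seed and returns it.
   The update rule is an arbitrary illustrative choice. */
static unsigned rnd(void)
{
    seed = seed * 5u + 3u;
    return seed;
}

/* Strategy 1: read both uses of seed before calling rnd(). */
static unsigned eval_reads_first(void)
{
    unsigned left = seed;
    unsigned right = seed;
    return left * rnd() + right;
}

/* Strategy 2: call rnd() first, then read both uses of seed. */
static unsigned eval_call_first(void)
{
    unsigned r = rnd();
    return seed * r + seed;
}

/* Strategy 3: read the left seed, call rnd(), then read the right seed. */
static unsigned eval_mixed(void)
{
    unsigned left = seed;
    unsigned r = rnd();
    return left * r + seed;
}
```

Starting from seed = 1 each time, the three strategies give 9, 72 and
16 respectively.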

Can't think of any more problems at the moment, but I am sure they are
there.

--
Brendan Mahony                   | brendan@batserver.cs.uq.oz       
Department of Computer Science   | heretic: someone who disagrees with you
University of Queensland         | about something neither of you knows
Australia                        | anything about.

aahz@netcom.UUCP (Dan Bernstein) (10/01/90)

Okay, I hate sounding like an ignoramus, but just WHERE do you get the 
ability to return tuples?

brendan@batserver.cs.uq.oz.au (Brendan Mahony) (10/01/90)

aahz@netcom.UUCP (Dan Bernstein) writes:

>Okay, I hate sounding like an ignoramus, but just WHERE do you get the 
>ability to return tuples? 

Not sure what you mean. Are you questioning the theoretical possibility
or are you simply telling us that this facility does not exist in C? It
does exist in some (functional) languages and is a simple extension to a
procedural language's run-time stack conventions. If your problem is the
second then I think you have lost this thread as we are discussing the
inadequacies of C, and other "industrial" programming languages.

--
Brendan Mahony                   | brendan@batserver.cs.uq.oz       
Department of Computer Science   | heretic: someone who disagrees with you
University of Queensland         | about something neither of you knows
Australia                        | anything about.

reg@lti2.UUCP (Rick Genter x18) (10/01/90)

> OK, how do you implement the function "rnd", which returns a random
> number, without letting it have side effects?

This is trivial.  Pass in the initial seed; rnd() must return the new
seed as well as the random number (some algorithms may allow the new seed
to be the random number).

Let us get the discussion back to *futures*.
					- reg
---
Rick Genter					reg%lti.uucp@bu.edu
Language Technology, Inc.

peter@ficc.ferranti.com (Peter da Silva) (10/01/90)

In article <5049@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:
> But seriously folks what you really want is a procedure. Note that this
> function (which makes explicit the action of the operation)
> cannot (easily) be used as an integer term, and must be accompanied with
> an update to seed for the whole thing to work properly.

Precisely.

> rnd is not an
> integer term, its purpose is not solely to define the value of an
> integer. Why then do you want to include rnd in the grammar of integer
> terms?

Why do you assume that the algebra you find useful for the basis of a
programming language is the same algebra that I find useful for the
basis of a programming language?

> Is it just to save a few keystrokes in the initial coding? Pretty
> silly given that this is such a small part of the software cycle. The
> idea is not conciseness but clarity!

Suppose one is implementing an algorithm taken from the literature. Would
it not be clearer to use the same syntax as that in the source document?

(this is beginning to sound like the famous GOTO debates. If so, GOTO
 alt.flame).

Suppose one is working with a database library, then. How about the following
code:

	struc3 = join(struc1, struc2, key_info);

This is the clearest way of expressing this, yet join() modifies all sorts
of global state. Even if struc1 and struc2 are local, the new struc3 has
to be allocated... now you're modifying the heap. If the structs are in
external files you've done lots of global operations to read and write
parts of the files. Yet it is desirable to implement the database library
with this sort of interface... it's clearer.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

jlg@lanl.gov (Jim Giles) (10/03/90)

From article <5006@uqcspe.cs.uq.oz.au>, by brendan@batserver.cs.uq.oz.au (Brendan Mahony):
> jlg@lanl.gov (Jim Giles) writes:
> 
>>Oh, well.  If it's a question of side-effects.  I oppose them outright.
>>As a practical matter, you can convince most programmers that operators
>>should not have side-effects.  But, when it comes to functions, they
>>always demand to be allowed side-effects.
> 
> Do they really? Why in the world do they want to make their life so
> difficult? I don't see what side-effects in functions can do that tuple
> valued expressions can't do, except make the code unreadable and
> impossible to reason about. [...]

I quite agree with what you said.  I don't think that functions
need to be 'tuple valued' - they just need to be able to return
data of _any_ type, including user defined types.  This way, if a
tuple result is needed, the function would be defined as returning
values of some 'record' type (said type describes the tuple).

However, tuples don't solve the function side-effect problem.  There are
three distinct ways a function might have a side-effect: 1) the function
may modify its arguments (which is what the tuple idea would eliminate the
need for); 2) the function may perform I/O or modify global data; 3) the
function may contain internal context which causes its return value to
be dependent on the number of times it's called or the order it receives
arguments.
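
The three kinds can be sketched in C as follows. The function names are
made up for illustration; I/O, which belongs with kind (2), is left out
to keep the sketch short.

```c
static int call_total = 0;   /* global data */

/* 1) Modifies its argument: the caller's variable changes through the pointer. */
static int bump(int *x)
{
    *x += 1;
    return *x;
}

/* 2) Modifies global data. */
static int tally(int x)
{
    call_total += x;
    return call_total;
}

/* 3) Keeps internal context: the result depends on how many calls came before. */
static int ticket(void)
{
    static int next = 0;   /* survives between calls */
    return ++next;
}
```

Two calls to tally(5) return 5 and then 10, and two calls to ticket()
return 1 and then 2, even though the arguments are identical.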

Now, some people may disagree with the last two points above and refuse
to use the term 'side-effect' for these features.  I don't want to argue
about it.  From a practical standpoint, all three types of side-effects
inhibit optimization in exactly the same ways.  Such calls can't be
reordered, they can't be eliminated (if they happen to have the same
argument values), and they can't run in parallel.

In any case, the user community seems to regard the ability to have
functions with side-effects as indispensable (and, at least with regard
to random number generators, I tend to agree).  This means that the
language designer must think about this issue very carefully.  I still
support allowing functions to have side-effects - but only if the nature
of their side-effects is clearly described in the 'function prototype'
or 'interface' or whatever you decide to call it.

J. Giles

jlg@lanl.gov (Jim Giles) (10/03/90)

From article <14197@netcom.UUCP>, by aahz@netcom.UUCP (Dan Bernstein):
> Okay, I hate sounding like an ignoramus, but just WHERE do you get the 
> ability to return tuples? 

This _is_ comp.society.futures you know.  You will, I hope, get
the ability in future languages.  Nemesis has them.

By the way Dan, I still have a lot of back mail from you I need
to answer.  Don't assume silence is agreement!

J. Giles

jlg@lanl.gov (Jim Giles) (10/03/90)

From article <151675@felix.UUCP>, by asylvain@felix.UUCP (Alvin E. Sylvain):
> [...]
> I don't have the actual charter in front of me, but it seems to me
> that 'society' and 'futures' kinda sez it all ... and doesn't include
> discussion of whether pointers in C are good, bad or indifferent.
> [...]

On the contrary, in a discussion about future language design, this is
a quite appropriate topic.  It is my contention that future languages
shouldn't have pointers at all.  Not just no C-like pointers, none at
all.  I just picked on C as the most unpleasant example of what I'm
against.

J. Giles

brendan@batserver.cs.uq.oz.au (Brendan Mahony) (10/03/90)

jlg@lanl.gov (Jim Giles) writes:

	[a good exposition of the nature of side effects]

>In any case, the user community seems to regard the ability to have
>functions with side-effects as indispensable

It seems to me that this attitude could reasonably be likened to the
antiquated European belief that regular bathing was dangerous to the
health. Indeed we now believe that bathing is good for the health
provided the water supply is clean and hygienic. The perceived need for
side-effects in terms is merely a by-product of the poor state of language
design, and would not be missed at all in better languages.

The purpose of terms is to define values. In serving this purpose they
need above all to be UNAMBIGUOUS. Terms with side effects are ambiguous.

>This means that the
>language designer must think about this issue very carefully.  I still
>support allowing functions to have side-effects - but only if the nature
>of their side-effects is clearly described in the 'function prototype'
>or 'interface' or whatever you decide to call it.

Actually to do this you will need to specify the local/global state
being side effected by the function. Since this is required why not
"officially" include it in the interface?

>(and, at least with regard to random number generators, I tend to agree).

I have discussed random number generators elsewhere, but I do not see
how using a procedure instead of a function is such a burden that we
must admit ambiguous terms.

--
Brendan Mahony                   | brendan@batserver.cs.uq.oz       
Department of Computer Science   | heretic: someone who disagrees with you
University of Queensland         | about something neither of you knows
Australia                        | anything about.

bzs@WORLD.STD.COM (Barry Shein) (10/03/90)

>Okay, I hate sounding like an ignoramus, but just WHERE do you get the 
>ability to return tuples? 

You mean what language supports this feature? Common Lisp, it's in the
Guy Steele book. You need all sorts of additional operators to support
it like a parallel assignment statement.

The (more than) rumor I heard was that Symbolics successfully lobbied
to have multiple-value-return put into the common lisp standard
because there was something about their hardware that made this very
desirable (I think it was that the top of their return stack, the
first 256 bytes, was made of very fast stuff mapped into memory.)

So the whole thing may have been a hoax (as far as any abstract
motivations were concerned.)

It didn't really add anything useful to the language that people
hadn't been doing for decades by returning a list (it is a LISt
Processor..., but that's the same basic crew that added hunks and
hasharrays and all sorts of other non-lists, some useful, some sorta
dumb.)

        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

nevin@igloo.scum.com (Nevin Liber) (10/04/90)

[I added comp.lang.misc to the newsgroup list; please follow-up to the
appropriate newsgroup only.]

In article <64618@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

>It is my contention that future languages
>shouldn't have pointers at all.  Not just no C-like pointers, none at
>all.  I just picked on C as the most unpleasant example of what I'm
>against.

I really hate to agree with you Jim :-), but I'm beginning to think
that you are right.  The only real argument I can see _for_ having
pointers is efficiency; more specifically, to help in
hand-optimisation.  Extensions to C such as C++ are showing that
pointers aren't needed nearly as much as they used to be; heck, code
seems to be more readable w/o them.  In languages such as Icon and
LISP I find that I don't even miss them.
-- 
	NEVIN ":-)" LIBER
	nevin@igloo.Scum.com  or  ..!gargoyle!igloo!nevin
	(708) 831-FLYS
California, here I come!	Public Service Announcement: Say NO to Rugs!

nevin@igloo.scum.com (Nevin Liber) (10/04/90)

[I added comp.lang.misc to the list of newsgroups; please follow-up to
the appropriate newsgroup ONLY.]

In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:

>The perceived need for
>side-effects in terms is merely a by-product of the poor state of language
>design, and would not be missed at all in better languages.

I disagree.  This would throw out all functions which maintain their
own state (eg: i/o).  Heck, you might ask why we even have variables?
Even the LISP community gave into this as being a helpful programming
technique.
-- 
	NEVIN ":-)" LIBER
	nevin@igloo.Scum.com  or  ..!gargoyle!igloo!nevin
	(708) 831-FLYS
California, here I come!	Public Service Announcement: Say NO to Rugs!

jlg@lanl.gov (Jim Giles) (10/05/90)

From article <5088@uqcspe.cs.uq.oz.au>, by brendan@batserver.cs.uq.oz.au (Brendan Mahony):
> [...]
> It seems to me that this attitude could reasonably be likened to the
> antiquated European belief that regular bathing was dangerous to the
> health. Indeed we now believe that bathing is good for the health
> provided the water supply is clean and hygienic. [...]

A very good analogy, and one which is also directly applicable to the
issue of pointers in languages.

> [...]                                           The perceived need for
> side-effects in terms is merely a by-product of the poor state of language
> design, and would not be missed at all in better languages.

This, however, is not clear.  Mathematical notation for random variates
existed before computing languages.  The variates were always of the same
form as ordinary variables.  In fact, that is what I think the use of
random generators should look like.  What you are suggesting is that
the programmer should alter his notation to suit the language, not
the other way around.  I think that it is the language that should
cater to the desires of the programmer.

> [...]
> The purpose of terms is to define values. In serving this purpose they
> need above all to be UNAMBIGUOUS. Terms with side effects are ambiguous.

No, they aren't ambiguous (necessarily).  As long as the order of
evaluation of such terms is predictable and the nature of the side-
effect of each is known, their use is completely unambiguous.  Now,
in languages like C (where the order of evaluation is not specified
and the nature of the side-effects needn't be declared), the feature
is indeed quite ambiguous.

If all side-effects that a function might cause are clearly defined
in the function interface, the compiler can then generate code so that
the side-effects are evaluated in a fixed order with respect to other
functions (or local assignments) which have side-effects on the same
objects.  This is a completely unambiguous solution - and functions
that _don't_ have side-effects can still be optimized fully.

> [...]
> Actually to do this you will need to specify the local/global state
> being side effected by the function. Since this is required why not
> "officially" include it in the interface?

Indeed.  The interface should contain a complete list of global variables
that it modifies, the specific arguments that it modifies should be
identified, and the function declaration should specify if it has any
internal, time-dependent, state.
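
For comparison only: C as standardized has nothing like this, but the
GCC and Clang compilers offer a much weaker annotation in the same
spirit. The sketch below uses their `const` function attribute, which
promises that a function depends only on its arguments. It says nothing
about which globals a function modifies, and as far as I know the
compiler trusts the promise rather than checking it. The macro and
function names are assumptions for illustration.

```c
/* Expands to the GCC/Clang attribute where available, to nothing elsewhere. */
#if defined(__GNUC__)
#define NO_SIDE_EFFECTS __attribute__((const))
#else
#define NO_SIDE_EFFECTS
#endif

/* Interface: declared free of side effects, depends only on its argument. */
static int square(int x) NO_SIDE_EFFECTS;

/* Interface: no such promise, because it keeps internal call-dependent state. */
static int next_id(void);

static int square(int x)
{
    return x * x;
}

static int next_id(void)
{
    static int id = 0;
    return ++id;
}
```

With the annotation, a compiler is free to evaluate square(3) once in
square(3) + square(3); it may not do the same for next_id().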

> [...]
>>(and, at least with regard to random number generators, I tend to agree).
> 
> I have discussed random number generators elsewhere, but I do not see
> how using a procedure instead of a function is such a burden that we
> must admit ambiguous terms.

One of the problems with language design is that most designers are not
in daily touch with their potential user base.  How much of a burden
the prohibition of side-effects in functions would pose is not for
the language designer to say.  I know mathematical/scientific users
who would deem it a considerable burden indeed.

Note: the requirement of explicit declarations of all side-effects,
especially if backed up by the loader which can check that such
assertions are true, would actually decrease the use of side-effects
on its own.  After all, who would go to such trouble unless it was
outweighed by the trouble of avoiding it?  This indeed, would be
a good empirical test of your claim that side-effect free functions
aren't a burden - how much trouble are the users willing to go to
in order to still have them?

J. Giles

jbickers@templar.actrix.co.nz (John Bickers) (10/05/90)

Quoted from - jlg@lanl.gov (Jim Giles):
> On the contrary, in a discussion about future language design, this is
> a quite appropriate topic.  It is my contention that future languages
> shouldn't have pointers at all.  Not just no C-like pointers, none at

    Perhaps future languages should control pointers in new and more
    fascinating ways, rather than do away with them altogether. This seems
    similar to the argument about goto, except with less basis in reality,
    and goto is still with us.

    Look for ways to improve on what seems to be deficient, rather than
    ban it altogether. Since the usefulness of pointers seems to be a
    matter of judgement (I intensely dislike the ideas of Pascal or BASIC
    string constructs), it'd probably be more useful to look at improving
    "lint"s, having separate languages for separate applications, and so
    on.

> J. Giles
--
*** John Bickers, TAP, NZAmigaUG.         jbickers@templar.actrix.co.nz ***
***          "All I can do now is wait for the noise." - Numan          ***

brendan@batserver.cs.uq.oz.au (Brendan Mahony) (10/05/90)

Me:

-> [...]                                           The perceived need for
-> side-effects in terms is merely a by-product of the poor state of language
-> design, and would not be missed at all in better languages.

jlg@lanl.gov (Jim Giles) writes:

>This, however, is not clear.  Mathematical notation for random variates
>existed before computing languages.  The variates were always of the same
>form as ordinary variables.

	I can't remember my prob theory too well. Is it true that no
	distinction is made between integer variables and integer random
	variables? I have a vague feeling that random variables
	represented sequences of values, you could talk about the
	average of a random variable and that sort of thing?
	
	Perhaps what is required is a more sophisticated data structure?
	How about a non-deterministic choose operator? Use a special
	notation in the special case, don't force people to have to
	worry about non-determinism all the time!
	Possible suggestion, declare a random variable

		r : seq of (1..10) | "normally distributed"
	
	now when you want a random number use "choose(r)".

>In fact, that is what I think the use of
>random generators should look like.  What you are suggesting is that
>the programmer should alter his notation to suit the language, not
>the other way around.  I think that it is the language that should
>cater to the desires of the programmer.

	Look the programmer is not the only person who has to cope with
	the code that is written. The programmer may well be the person
	who spends the least amount of time trying to understand the
	stuff. The idea should be to produce code that is easily
	comprehensible, rather than easily written. Included in that
	criteria should be the ability to easily reason about the
	behaviour of the code. I would argue against global variables in
	both procedures and functions on the grounds of
	comprehensibility. Procedures have the mediating factor that
	their syntactic intent is to change program state. Side effects
	in terms deny the syntactic intent of terms, which is to define
	a value.

	The rest of your article gives a reasonable way of formalising
	the action of side effects. The general gist of it seems to be
	that to understand the "meaning" of a term with side effects you
	must break its evaluation down to a set of state changes, and
	determine the sequence of these actions. If this activity is
	required to make the code readable it should be reflected in the
	code.

--
Brendan Mahony                   | brendan@batserver.cs.uq.oz       
Department of Computer Science   | heretic: someone who disagrees with you
University of Queensland         | about something neither of you knows
Australia                        | anything about.

jlg@lanl.gov (Jim Giles) (10/06/90)

From article <5116@uqcspe.cs.uq.oz.au>, by brendan@batserver.cs.uq.oz.au (Brendan Mahony):
> [...]
> 	Look the programmer is not the only person who has to cope with
> 	the code that is written. The programmer may well be the person
> 	who spends the least amount of time trying to understand the
> 	stuff. The idea should be to produce code that is easily
>       comprehensible, rather than easily written. [...]

All the more reason to use the conventional terminology and notation
rather than force the user to conform to some purist's idea of what
should be allowed by a programming language.  My experience talking
to large-scale users of such features is that they would be quite
willing to spend considerable effort in the declaration of a random
generator in order that the _use_ of the thing retain conventional
properties.  For example, say I want a triangular probability
distribution.  The following two codes are examples of your style
and mine:

Yours:
      qran(z,seed)
      qran(x,seed)
      tri_dist = z-x

Mine:

      tri_dist = ranf - ranf
or:
      tri_dist = ranf() - ranf()

Note, my experience is that the second of my forms (with the explicit
denotation that ranf is a function call) is quite acceptable to users
while your form is usually not.
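
A C sketch of the two styles side by side, under assumptions: qran and
ranf are given toy bodies (a small linear-congruential generator) that
are not from the post, and the hidden state of ranf is a static local.

```c
/* Explicit-seed style: a procedure that writes its result through a
   pointer and updates the caller's seed.  The generator is an assumed
   toy; results lie in [0, 1). */
static void qran(double *out, unsigned long *seed)
{
    *seed = (*seed * 1103515245UL + 12345UL) % 2147483648UL;
    *out = (double)*seed / 2147483648.0;
}

/* Conventional style: a function with hidden internal state. */
static double ranf(void)
{
    static unsigned long seed = 1UL;
    double r;
    qran(&r, &seed);
    return r;
}

/* Triangular variate, explicit-seed style. */
static double tri_dist_explicit(unsigned long *seed)
{
    double z, x;
    qran(&z, seed);
    qran(&x, seed);
    return z - x;
}

/* Triangular variate, conventional notation. */
static double tri_dist(void)
{
    return ranf() - ranf();
}
```

Both forms return a value strictly between -1 and 1. In C the order of
the two ranf() calls is unspecified, which flips the sign of a given
result but leaves the distribution unchanged.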

> [...]                                             Included in that
> 	criteria should be the ability to easily reason about the
>       behaviour of the code.  [...]

Given the rules of the language and a clear declaration of the fact
that ranf() has side-effects, the forms I gave are susceptible to
reasoning _identically_ well compared to your proposed form.  The
ability to reason about programs is impossible without knowledge of
the language's rules - but it should be equally possible in any
two well defined languages.

> [...]                    I would argue against global variables in
> 	both procedures and functions on the grounds of
>       comprehensibility.  [...]

And I would argue in favor of global variables for the same reason.
I find procedures with large and complicated calling sequences to
be quite incomprehensible.  Further, having to pass a data structure
around through the calling sequence because it represents information
which is shared by low-level routines I find appalling.

For example, a simulator of a helicopter might have three routines (all
deep in the call chain) which need the data structure describing the
tail rotor: the power routine needs to know the state of the rotor to
compute the power required to drive it, the structure routine needs to
know the stresses the tail rotor is producing, and the aerodynamics
routine needs to know how much torque the rotor is imparting into the
air.  Clearly, these functionalities are completely separate in the
simulation - so you don't want to combine the three routines into
one multi-purpose routine.  But, you also don't want to have to force
the rest of the program to carry around the tail rotor data which you
don't want anything except the three low-level routines to be able
to change or examine.  The problem is, your data is 'helicopter shaped'
but your program's procedure call chain is roughly tree shaped.  By
depriving the programmer the use of global data, you deprive him of
the ability to partition his data into manageable small pieces which
are imported only by those routines which actually use them.

Now, of course, global data can be misused.  I have seen some programs
which deliberately import _all_ global variables into every routine.
This means that you have no means of determining where a given data
item might be used or changed.  However, carefully used, global data
can improve the comprehensibility of programs by isolating the data
to those routines which actually need it and guaranteeing that all
other routines will keep their electronic hands off.
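
C can only approximate this kind of selective import. A rough sketch,
assuming the three routines are placed in one source file: a file-scope
`static` object is invisible to every other file in the program. The
struct fields, routine names and formulas are placeholders, not a real
rotor model.

```c
/* tail_rotor.c (hypothetical file): the rotor state is visible only here. */
struct tail_rotor {
    double rpm;
    double blade_pitch;
};

/* File scope with internal linkage: no other source file can name it. */
static struct tail_rotor rotor = { 1200.0, 0.125 };

/* Only these three routines read the rotor data; nothing is passed
   down through the call chain. */
double rotor_power_required(void)
{
    return rotor.rpm * rotor.blade_pitch;
}

double rotor_structural_load(void)
{
    return rotor.rpm * rotor.rpm / 1000.0;
}

double rotor_torque_on_air(void)
{
    return rotor.blade_pitch * 64.0;
}
```

The cost of the approximation is that all three routines must live in
the same file, whereas the paragraph above wants them kept apart.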

> [...]                     Procedures have the mediating factor that
> 	their syntactic intent is to change program state. Side effects
> 	in terms deny the syntactic intent of terms, which is to define
> 	a value.

I agree that this constraint makes the analysis of expressions much
simpler.  This is why I advocate explicit declaration of all side-effects
that a function may produce - so that side-effect free expressions, (the
majority) can be analysed in this simple way.  Procedures which _have_
side-effects may make the program easier to analyse in other ways
and at other levels than the expression level.  I think that the user
should be the one to decide which is most important to him.

> [...]      to understand the "meaning" of a term with side effects you
> 	must break its evaluation down to a set of state changes, and
> 	determine the sequence of these actions. If this activity is
> 	required to make the code readable it should be reflected in the
> 	code.

Exactly.  But this usually need not be such a burden as you seem to
think.  For random number generators for example, all that's needed is
an attribute on the interface specification to the effect that it has
side-effects. (The language I'm designing presently has the rather
fanciful term 'fickle' for this property: a random number generator is a
fickle function.  Before we actually release the language to outside
users we will probably switch to some more dignified or techie type of
word.  I don't know though - look through your thesaurus some time to
see if you can find a better word - we couldn't.)  In any case, a
'fickle' function must be regarded as having some internal state that
causes its value to be different from call to call - even if the same
arguments are sent (or no arguments at all in the case of random number
functions).

By the way, with your attitude toward side-effects, you must dislike
C even more than I do.  I thought I was the most anti-C person on the
net.  Maybe not.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (10/08/90)

In article <2883@igloo.scum.com> nevin@igloo.UUCP (Nevin Liber) writes:
> pointers aren't needed nearly as much as they used to be; heck, code
> seems to be more readable w/o them.  In languages such as Icon and
> LISP I find that I don't even miss them.

Last time I checked the primary data objects in Lisp were the symbol and
the pointer. (oh sure, a DOTPR is a constrained pointer (well, pointer
pair)... but when it can in principle point to any data or code object
it's just as dangerous as pointers in C. What makes it safe is the limited
types of the objects it can point to: other pointers or symbols. It can't
point to the second part of a DOTPR, or into a primitive, or the middle
of a symbol).
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

dmiller@YODA.EECS.WSU.EDU (10/11/90)

Just a minor clarification, you should probably be saying "no programmer-
visible pointers" rather than "no pointers."  An unqualified "no pointers"
can be read to imply that a particular implementation would not use pointers.
This would, however, be dependent upon the underlying architecture of the
target machine and other implementation specific considerations.

--DLM

**********************************************************************
David L. Miller                     Internet: millerd@prime.tricity.wsu.edu (?)
Systems Analyst                           or  dmiller@yoda.eecs.wsu.edu
WSU Tri-Cities                        Bitnet: MILLERD@WSUVM1
100 Sprout Rd.                         Voice: (509) 375-9245 or 375-3176
Richland, WA 99352                  >>>>>>>>>>> Support the FSF <<<<<<<<<<<

jlg@lanl.gov (Jim Giles) (10/11/90)

From article <9010102226.AA16028@yoda.eecs.wsu.edu>, by dmiller@YODA.EECS.WSU.EDU:
> Just a minor clarification, you should probably be saying "no programmer-
> visible pointers" rather than "no pointers."  An unqualified "no pointers"
> can be read to imply that a particular implementation would not use pointers.

I have been saying that all along.  The word "pointer" or the phrase
"explicit pointer" means "a variable whose _value_ is an address".
It is my contention that no high-level language should have or
should need such a data type.

What the implementation does internally is the compiler writer's decision.
I assume that the people who champion GOTO free languages don't object
to the compiler generating jump instructions internally to support IFs
and SELECT/CASE, etc.

J. Giles

asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) (10/13/90)

In article <2884@igloo.scum.com> nevin@igloo.UUCP (Nevin Liber) writes:
::[I added comp.lang.misc to the list of newsgroups; please follow-up to
::the appropriate newsgroup ONLY.]

Which is the appropriate newsgroup?  If comp.lang.misc is not appropriate,
why'd you add it?  Why do I get the feeling that no one in comp.society.futures
was or is or will be interested?  Anywho ...

::In article <5088@uqcspe.cs.uq.oz.au> brendan@batserver.cs.uq.oz.au writes:
::
::>The perceived need for
::>side-effects in terms is merely a by-product of the poor state of language
::>design, and would not be missed at all in better languages.
::
::I disagree.  This would throw out all functions which maintain their
::own state (eg: i/o).
[...]

Nope.  C (at least) allows for variables to be 'static'.  No need
for side-effects to maintain the function's internal state.
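
A minimal sketch of what is meant, in C.  The function name next_ticket
is hypothetical and only for illustration; the point is that 'count' is
private to the function yet keeps its value from one call to the next.

```c
/* Hypothetical example: hands out 1, 2, 3, ... on successive calls.
 * 'count' is initialised once, before the first call, and is not
 * visible to any code outside next_ticket(). */
int next_ticket(void)
{
    static int count = 0;   /* survives between invocations */

    count = count + 1;
    return count;
}
```

No global declaration is needed, so no other routine can clobber the
counter by accident.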

Unfortunately, Pascal doesn't suffer from this convenience.  This causes a
programmer to load up more and more junk into the 'program'-level 'var'
declaration, until it's almost as hard to debug as FORTRAN COMMON statements.
Heaven help you if two or more subroutines are both using a global 'IDX'
in different contexts.  There, your argument about side-effects probably holds.

Any post-modern (after Pascal) language allows the declaration of a
variable which is local to the routine, but doesn't change between
invocations.  Side-effects are not necessary for state-maintenance.

Period.
--
=============Opinions are Mine, typos belong to /bin/ucb/vi=============
"We're sorry, but the reality you have dialed is no   |            Alvin
longer in service.  Please check the value of pi,     |   "the Chipmunk"
or pray to your local diety for assistance."          |          Sylvain
= = = = = =I haven't the smoggiest notion what my address is!= = = = = =

peter@ficc.ferranti.com (Peter da Silva) (10/20/90)

In article <152323@felix.UUCP> asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) writes:
> Nope.  C (at least) allows for variables to be 'static'.  No need
> for side-effects to maintain the function's internal state.

Those *are* side-effects, since they mean the same function may return
different values on successive calls with the same calling sequence. This
has the same effects on predictability and optimisation as more obvious side
effects.
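
A small C sketch of the optimisation problem, with made-up function
names (square, bump, sum_two_calls, double_one_call).  For a function
whose result depends only on its argument, a compiler may replace
f(x) + f(x) with 2 * f(x).  For a function with a 'static' variable
that rewrite changes the answer.

```c
/* Result depends only on the argument: calls can be merged safely. */
int square(int x)
{
    return x * x;
}

/* Hidden state: each call returns one more than the previous call. */
int bump(void)
{
    static int n = 0;

    n = n + 1;
    return n;
}

/* Two separate calls: on first use this adds 1 and 2. */
int sum_two_calls(void)
{
    return bump() + bump();
}

/* One call, doubled: NOT equivalent to sum_two_calls(), because
 * bump() is only advanced once. */
int double_one_call(void)
{
    return 2 * bump();
}
```

Starting from a fresh program, sum_two_calls() gives 3, while a
"simplified" 2 * bump() at the same point would have given 2.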
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

cik@l.cc.purdue.edu (Herman Rubin) (10/23/90)

In article <5EJ64J3@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <152323@felix.UUCP> asylvain@felix.UUCP (Alvin "the Chipmunk" Sylvain) writes:
> > Nope.  C (at least) allows for variables to be 'static'.  No need
> > for side-effects to maintain the function's internal state.
> 
> Those *are* side-effects, since they mean the same function may return
> different values on successive calls with the same calling sequence. This
> has the same effects on predictability and optimisation as more obvious side
> effects.

I have yet to see a random number, or pseudo-random number, procedure which
did not exploit this.  The same is true for uses of buffers, reading external
media, etc.  It is also the case when one makes calls by reference, and
uses code to change the values of the arguments.  This even applies to
a subroutine to multiply two matrices.
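
The random number case can be sketched in C roughly as follows.  This is
an illustration only: the names lcg_hidden and lcg_next are made up, and
the constants are the ones used by the sample rand() in the C standard,
not a recommendation for a good generator.  The first version hides its
state in a 'static'; the second takes the state by reference and changes
the caller's variable.  Either way, the call has an effect beyond the
value it returns.

```c
/* State hidden inside the function: successive calls with the same
 * (empty) argument list return different values. */
unsigned int lcg_hidden(void)
{
    static unsigned long state = 1;

    state = state * 1103515245UL + 12345UL;
    return (unsigned int)(state / 65536UL) % 32768U;
}

/* State passed by reference: the function modifies *state, so the
 * caller's variable is different after the call. */
unsigned int lcg_next(unsigned long *state)
{
    *state = *state * 1103515245UL + 12345UL;
    return (unsigned int)(*state / 65536UL) % 32768U;
}
```

With the second form the caller can at least see where the state lives,
and two generators seeded alike will produce the same sequence.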

This means it is the programmer who must decide, and pass on the information
to the compiler, about the side-effects.  Sometimes, but not always, the
compiler can tell by looking at the global code.
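
As one hedged illustration of "passing on the information": some later
compilers (GCC and Clang, for instance) accept a function attribute by
which the programmer promises that a function has no side-effects and
depends only on its arguments.  This is a compiler extension, not part
of the C language discussed here; the macro name NO_SIDE_EFFECTS and the
function cube are invented for the sketch, and the macro expands to
nothing on compilers that lack the extension.

```c
/* Assumption: __attribute__((const)) is a GCC/Clang extension.
 * On other compilers the macro is empty and the promise is simply
 * not communicated. */
#if defined(__GNUC__)
#define NO_SIDE_EFFECTS __attribute__((const))
#else
#define NO_SIDE_EFFECTS
#endif

/* The declaration carries the programmer's promise to the compiler. */
NO_SIDE_EFFECTS int cube(int x);

int cube(int x)
{
    return x * x * x;   /* reads no globals, writes nothing */
}
```

The compiler takes the promise on trust; attaching it to a function that
does keep state would be the programmer's error.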
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)