[comp.lang.misc] JLG's flogging of horses

peter@ficc.uu.net (Peter da Silva) (04/08/90)

One of the reasons C is so popular is that (like Pascal) it's designed to be
easy to implement on a wide class of machines (basically, non-segmented
byte-addressable machines) which happen to be very common (for example, any
personal computer you can name). This "pandering to poor implementors" is
one of its strengths... not one of its weaknesses.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

oz@yunexus.UUCP (Ozan Yigit) (04/08/90)

In article <14307@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

>I can't see that the model described is particularly appropriate for array
>calculations.

What you can or cannot see is of no interest, nor relevant to this
discussion. [1]

>So, you are allowing the characteristics of some particularly poor 
>implementations to dictate your language design.

Read The Fine Standard instead of speculating on the design described
therein. 

>The _other_ problem here, of course, is that sophisticated users
>often don't care (at least, not at the time) about portability.

You are wrong.

>I can see that it has a subset of a property that might be useful, ie.
>allowing pointers to point anywhere.  I don't find it particularly
>attractive as it is.

See [1].

oz

-- 
The king: If there's no meaning	   	    Interned:  oz@nexus.yorku.ca
in it, that saves a world of trouble        ......!uunet!utai!yunexus!oz
you know, as we needn't try to find any.    Bitnet: oz@[yulibra|yuyetti]
Lewis Carroll (Alice in Wonderland)         Phonet: +1 416 736-5257x33976

anw@maths.nott.ac.uk (Dr A. N. Walker) (04/09/90)

In article <14304@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>Maybe not in C.  Actually, if pointer arithmetic is carried out as integer
>modulo 2^n (like most implementations), then the above expression _is_
>guaranteed to give p.

	What is the difference between "in most implementations, this is
guaranteed to work" and "this is not guaranteed to work"?  Isn't this the
whole portability issue?

>			However, you missed my point.  Suppose you have
>a large array and you alias *p to a small section of it.  In this case,
>incrementing p past the end of the data it points to is still appropriate.

	"*p" aliases just one element, so the point seems to betray a
confusion.  If you select a slice of an array [not a C construct, of
course], then going out of bounds on that slice is, and should be,
still an error:  eg in Algol, after

	[100] int a;
	ref [] int sa = a[10:20];

then "a" is an array with elements a[1], ..., a[100];  "sa" is a slice of
"a" with elements sa[1], ..., sa[11] and sa[1] aliassed to a[10], etc.
Referring to sa[0] or sa[12] is, and should be, simply an error;  if you
wanted "sa" to be larger, you should have declared it that way.

>Why the special dispensation?

	Historical necessity, caused by the prevalence of code like

		for (pa = a; pa < a + SIZE; pa++)
			...;

Such code used to be technically illegal, and some of us avoided it, but
it was too common to be ignored, and too convenient, so it was legitimised.

>				And why just _one_ element past the end?
>Why not N elements past - or is processing arrays with _stride_ something
>C doesn't do?  Why not have a dispensation for addresses before the start
>of the array - or is processing arrays in reverse order something C doesn't
>do either?

	Because (a) on looking at zillions of lines of code, virtually
everything fell into the above paradigm, and the other cases you mention
just didn't happen in practice -- if you scan an array backwards, you tend
to test against "a" rather than against "a-1", so that code is legal;
(b) the cost is ensuring that just one byte past the end of "a" is still a legal
address to the operating system, whereas any further extension requires
at least one further virtual element, which may easily be rather large;
and (c) if you think that a zero-sized array is legal, you get the chance
of a legal pointer to it -- this is a murky area in both C and Algol.
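
	For concreteness, a minimal sketch of both scans under the ANSI
rule (the array and its size are invented for the example):

	#include <stdio.h>
	#define SIZE 10

	int main(void)
	{
	    int a[SIZE];
	    int *pa;

	    /* forward scan: pa may legally reach a + SIZE, one past the
	       end, as long as it is never dereferenced there */
	    for (pa = a; pa < a + SIZE; pa++)
	        *pa = 0;

	    /* backward scan: test against "a" itself, so pa never has to
	       take the unsanctioned value a - 1 */
	    for (pa = a + SIZE; pa > a; )
	        *--pa = 1;

	    printf("%d %d\n", a[0], a[SIZE - 1]);
	    return 0;
	}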

>	     No, to be consistent with your suggested constraint on pointer
>arithmetic, there should be no dispensations at all [...]

	Agreed;  we should all design perfect languages first time.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

gudeman@cs.arizona.edu (David Gudeman) (04/10/90)

In article  <14305@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <19844@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
>> [...]
>> Trivial.  Let A be an array of type t, pA be a pointer of type t, and
>> i be an integer.  All pointers and arrays are stored as a triple <base
>> address, index, max_index> where index = 0 for arrays.
>> [... rules of bounds checking, assignment, etc. ...]
>
>   t A [N] [M];    /* 2d array of type t */
>   t *pA;
>   ...
>   pA = A;
>
>...  So, the
>above assignment should have been:
>
>   pA = (t *) A;
>
>Now, what bounds are associated with pA under your scheme?  N clearly
>doesn't make sense.  But should the bound be M or N*M?  Whichever you
>pick, I'll (sooner or later) want it to be the other.

This is a non-problem.  The expression "A" returns a triple <address
of A, 0, N*M> and the cast does not change that.  If you want the
bound to be M you write pA = A[0] (as someone else has already pointed
out).  I think you are confusing the dynamic triples with the static
types.  The two are unrelated except that the size must be known to
interpret the indexes.  Someone else suggested using three addresses
instead of an address and two indexes, and this makes the triples
even more independent of the types.
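
To make the triple scheme concrete, here is one possible sketch of such a
"fat pointer" (this is only an illustration, not what any existing compiler
does, and all the names are invented):

   #include <assert.h>

   typedef struct {
       int *base;        /* address of the first element      */
       long index;       /* current offset from base          */
       long max_index;   /* number of elements reachable      */
   } fatptr;

   int fetch(fatptr p)   /* bounds-checked dereference */
   {
       assert(p.index >= 0 && p.index < p.max_index);
       return p.base[p.index];
   }

   #define N 4
   #define M 5
   int A[N][M];

   int main(void)
   {
       fatptr pA  = { &A[0][0], 0, (long) N * M };  /* from "pA = (t *) A" */
       fatptr pA0 = { &A[0][0], 0, M };             /* from "pA = A[0]"    */

       pA.index  = (long) N * M - 1;  /* last element of A: in bounds      */
       pA0.index = M;                 /* one past row 0: fetch would abort */

       return fetch(pA);
   }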

>... If your algorithm is manipulating
>an array, it remains more readable, verifiable, and maintainable if
>all the array references use the array syntax.

I'm not sure whether you recognize that the above statement is one of
the basic sources of disagreement.  I claim that the above statement
is not true in general.  In fact I claim the following: (1)
readability is largely a subjective notion, and there will always be
some people who find pointer arithmetic to be more readable and some
who find indexes to be more readable.  (2) verifiability is a tenuous
notion at the present time, and any method of verification that cannot
handle pointer arithmetic is inadequate.  (3) maintainability is
independent of which method of array access is used.

>Not only that, the array
>syntax allows your compiler greater freedom in optimization - so your
>code often runs _faster_ as well.  Unfortunately, C performs the array
>to pointer conversion implicitly in the procedure call mechanism.

This is another statement that is wrong.  Compilers can optimize
pointer arithmetic to the same extent as array indexing.  It is
true that the optimization of pointer arithmetic is more difficult,
but a good enough compiler can detect all possible aliases by global
control flow analysis, even with separate compilation.
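
As a trivial illustration, here are the two forms side by side (a sketch
only; a reasonable optimizer reduces both to the same strength-reduced
loop, since neither function gives it more aliasing information than the
other):

   void scale_index(double *v, int n, double k)
   {
       int i;
       for (i = 0; i < n; i++)
           v[i] *= k;           /* array-index form */
   }

   void scale_pointer(double *v, int n, double k)
   {
       double *end = v + n;
       for (; v < end; v++)
           *v *= k;             /* pointer-arithmetic form */
   }
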
-- 
					David Gudeman
Department of Computer Science
The University of Arizona        gudeman@cs.arizona.edu
Tucson, AZ 85721                 noao!arizona!gudeman

jlg@lambda.UUCP (Jim Giles) (04/10/90)

From article <20026@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> [...]
> This is another statement that is wrong.  Compilers can optimize
> pointer arithmetic to the same extent as array indexing.

You can see why this issue never goes away.  People keep asserting the
above as if it were true.  Arrays can be optimized as well as pointers
can, but the reverse is not true without expensive (and rare) global
data flow analysis on the interprocedural level.

> [...]                                                         It is
> true that the optimization of pointer arithmetic is more difficult,
> but a good enough compiler can detect all possible aliases by global
> control flow analysis, even with seperate compilation.

That's just the point.  The _compiler_ can't do any such thing!  That's
the _definition_ of separate compilation - the different program units
are unknown to each other at compile time.  The only implementation
so far proposed (on the net anyway) which does interprocedural data
flow analysis in this way has the _loader_ do code generation.
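
A small (hypothetical) example of what the compiler is up against when it
can see only one file:

   /* file1.c - compiled in isolation */
   void accumulate(double *sum, double *x, int n)
   {
       int i;
       for (i = 0; i < n; i++)
           *sum += x[i];   /* *sum must be reloaded and stored every
                              iteration unless the compiler can prove
                              that sum aliases none of the x[i] - and
                              it cannot prove that from this file     */
   }

   /* file2.c - some unknown caller may defeat any such assumption:
          double a[100];
          accumulate(&a[3], a, 100);
   */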

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/10/90)

From article <9765@yunexus.UUCP>, by oz@yunexus.UUCP (Ozan Yigit):
> [...]
>>The _other_ problem here, of course, is that sophisticated users
>>often don't care (at least, not at the time) about portability.
> You are wrong.

Oh?  You have done a careful survey of large numbers of sophisticated users
to back up this claim?  I have worked in computing for over 18 years now
and my job never carries me far from the consulting end of the field.
I can tell you that sophisticated users tend to be 'full contact' programmers.
That is, on a given machine they try to get the last ounce of performance
out of the hardware.  If they have to use non-portable features to do so,
they do.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/10/90)

From article <1990Apr9.144123.13017@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
> In article <14304@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> [... (p + largeint) - largeint ==0 ...]
->Maybe not in C.  Actually, if pointer arithmetic is carried out as integer
->modulo 2^n (like most implementations), then the above expression _is_
->guaranteed to give p.
> 	What is the difference between "in most implementations, this is
> guaranteed to work" and "this is not guaranteed to work"?  Isn't this the
> whole portability issue?

I should rephrase.  The above will work on all implementations that I'm aware
of.  Further, I see no reason for the language definition not to _require_
that the above identity be true.  If there is such a reason, you should
enlighten me instead of just claiming that I'm wrong without evidence.

> [...]
->			However, you missed my point.  Suppose you have
->a large array and you alias *p to a small section of it.  In this case,
->incrementing p past the end of the data it points to is still appropriate.
> 	"*p" aliases just one element, so the point seems to betray a
> confusion.  [...]

Perhaps you should get together with the other posters to this net in order
to get your stories straight.  Others are claiming that keeping the pointer
as a 3-tuple is a correct implementation of C pointers.  You are claiming
that the pointer is to a single element (so any auxiliary bound information
is inappropriate).  I would agree with your interpretation that a pointer
is an address of a single object.  Pointer arithmetic allows you to move
to the next address (or, by scaled arithmetic, the address after the current
object).  If this next address is not in range, the only constraint should
be that you must not dereference the pointer.  The same goes whether you
increment the pointer by one or by 'largeint'.
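
Under the constraint proposed here (which is looser than what ANSI C
actually guarantees), something like the following sketch would be well
defined - forming the out-of-range address is fine, only dereferencing
it is not:

   double a[100];

   void identity_example(void)
   {
       double *p = &a[10];
       long    largeint = 1000000L;
       double *q;

       q = p + largeint;   /* far out of range: legal to form under the
                              proposed rule, illegal only to dereference */
       p = q - largeint;   /* proposed guarantee: p is again &a[10]      */
   }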

> [...]      If you select a slice of an array [not a C construct, of
> course], then going out of bounds on that slice is, and should be,
> still an error:  eg in Algol, after
> 
> 	[100] int a;
> 	ref [] int sa = a[10:20];
> 
> then "a" is an array with elements a[1], ..., a[100];  "sa" is a slice of
> "a" with elements sa[1], ..., sa[11] and sa[1] aliassed to a[10], etc.
> Referring to sa[0] or sa[12] is, and should be, simply an error;  if you
> wanted "sa" to be larger, you should have declared it that way.

I agree.  But here you are referring to _arrays_ not pointers, and your
syntax reflects that fact.  In C (with the appropriate changes in
declaration), it is not only _allowed_ to refer to *(sa+15), it is 
common practice.  In fact, there is no way to limit the 'bound' of the
pointer to anything less than all of array a.
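
For example (legal C as it stands, whether one likes it or not; the names
echo the Algol fragment above):

   int a[100];
   int *sa = &a[10];               /* "slice" starting at a[10]           */

   int get(void)
   {
       return *(sa + 15);          /* this is a[25]; C attaches no bound
                                      to sa, so nothing narrower than the
                                      whole of a can be enforced          */
   }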

> [...]
->				And why just _one_ element past the end?
->Why not N elements past - or is processing arrays with _stride_ something
->C doesn't do?  Why not have a dispensation for addresses before the start
->of the array - or is processing arrays in reverse order something C doesn't
->do either?
> 	Because (a) on looking at zillions of lines of code, virtually
> everything fell into the above paradigm, [... ie. "for(p=a,p<a+n,p++);" ...]

What 'zillions' of lines were examined, I wonder?  Certainly not much
scientific code.  If you are doing, say, tensor calculations, you often
need to break an array into column vectors in one place, row vectors in
another, and the diagonal in still another.  Processing arrays with
stride is a _very_ common occurrence.  That's why vector machines usually
have stride built right into the hardware.
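
For the record, the sort of strided access I mean (an invented n-by-n
example; note that the final increment of each strided loop carries the
pointer well past the one-element dispensation, which is exactly the
case at issue):

   #define N 50
   double t[N][N];                    /* stored by rows */

   double sums(void)
   {
       double *p, s = 0.0;
       int i;

       /* column 3: stride N */
       for (i = 0, p = &t[0][3]; i < N; i++, p += N)
           s += *p;

       /* row 7: stride 1 */
       for (i = 0, p = &t[7][0]; i < N; i++, p++)
           s += *p;

       /* main diagonal: stride N+1 */
       for (i = 0, p = &t[0][0]; i < N; i++, p += N + 1)
           s += *p;

       return s;
   }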

> [...]
->	     No, to be consistent with your suggested constraint on pointer
->arithmetic, there should be no dispensations at all [...]
> 	Agreed;  we should all design perfect languages first time.

No.  I don't claim that.  I am willing to give latitude to language
design problems.  But, after 18+ years of evolution (not all of it
backward compatible) there seems to have been plenty of time to correct
the design before ANSI set it into concrete.  Further, this is
comp.lang.misc - we should not lightly assume that C mistakes won't
be carried into the future in other designs.  It is by discussing
them critically and publicly that future language designers can
learn by the mistakes of the past.

J. Giles

jlg@lambda.UUCP (Jim Giles) (04/10/90)

From article <14321@lambda.UUCP>, by jlg@lambda.UUCP (Jim Giles):
> From article <1990Apr9.144123.13017@maths.nott.ac.uk>, by anw@maths.nott.ac.uk (Dr A. N. Walker):
>> In article <14304@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> [... (p + largeint) - largeint ==0 ...]

Sorry, it's 2:30 AM here and I can't even see these simple mistakes when I
proofread.  The above _should_ have been:
   (p + largeint) - largeint == p

I anticipate a whole raft of insulting remarks about this trivial error
which will all ignore the gist of the issue being discussed.  Oh well,
that's life on the net.

J. Giles

mcdonald@aries.scs.uiuc.edu (Doug McDonald) (04/10/90)

In article <14320@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <9765@yunexus.UUCP>, by oz@yunexus.UUCP (Ozan Yigit):
>> [...]
>>>The _other_ problem here, of course, is that sophisticated users
>>>often don't care (at least, not at the time) about portability.
>> You are wrong.
>
>Oh?  You have done a careful survey of large numbers of sophisticated users
>to back up this claim?  I have worked in computing for over 18 years now
>and my job never carries me far from the consulting end of the field.
>I can tell you that sophisticated users tend to be 'full contact' programmers.
>That is, on a given machine they try to get the last ounce of performance
>out of the hardware.  If they have to use non-portable features to do so,
>they do.
>
>J. Giles


I seldom enter this silly discussion - but this has my political
hackles up. Mr. Giles is right here. Mr. Yigit has his terminology wrong -
the people he refers to as "sophisticated" are better described as
"politically correct" members of the elite CS fraternity, those
inhabiting ivory towers.


Doug McDonald

peter@ficc.uu.net (Peter da Silva) (04/11/90)

> I seldom enter this silly discussion - but this has my political
> hackles up. Mr. Giles is right here. Mr. Yigit has his terminology wrong -
> the people he refers to as "sophisticated" are better described as
> "politically correct" members of the elite CS fraternity, those
> inhabiting ivory towers.

Actually, I think the folks who "need" to get the last fraction of a cycle
out of a machine are the ivory-tower elite. Down here in the trenches we
can't afford to rewrite all our programs for each new machine every software
generation or so. We have to support 68000, 68020, 80286, 80386, VAX,
Unisys 1100, and so on. Who knows what's next... Sparc, MIPS, RIOS,...

Just going from a 68000 to a 68020 or an 80286 to an 80386 changes everything:
all your integers go from 16 bits (most efficient size on a 68000 or 80286)
to 32 bits. And that's just an evolutionary change in one processor family.
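
(One common defence, sketched with invented typedef names: hide the width
behind a typedef, so that ANSI's <limits.h> - or at worst one line -
absorbs the change when the 'natural' int width moves.)

#include <limits.h>

#if INT_MAX >= 2147483647
typedef int   int32;        /* 68020, 80386, VAX, ...            */
typedef short int16;
#else
typedef long  int32;        /* 68000, 80286: int is only 16 bits */
typedef int   int16;
#endif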

Better to write your code portably and take a small performance hit, so you can
go to a new machine tomorrow and quadruple your speed. That mightn't be very
satisfying for manic crystallography freaks, but it pays the bills.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <UEU2EUCxds13@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> [...]
> Better to write your code portably and take a small performance hit, so you can
> go to a new machine tomorrow and quadruple your speed. That mightn't be very
> satisfying for manic crystalography freaks, but it pays the bills.

Finally!  Something I can, in general, agree with.  My own opinion is that
code should be written in an abstract language and the implementation
language only chosen after consideration of the aims of the software and
the nature of the algorithm.  A subset of the implementation language
should be chosen which is as portable as possible, and you should stick
to that subset until the time comes to meet performance requirements.
NOW is the time to consider non-portable features (and only if you
carefully document their use and make it clear how to port the code).

However, to pretend that sophisticated users _don't_ ever do this last
step is silly.  Perhaps _most_ of the programmer-time spent on large
scientific code goes into this last category.  Either optimizing
on a new system or converting from the old one (two processes which
overlap so much as to be seen as one) occupies a _lot_ of time in
scientific computing.  No surprise: few other computer uses run for
multiple days (or weeks) - and the people who pay for the computer
time for such runs are quite eager to shave 2% here and 3% there.

J. Giles

brnstnd@stealth.acf.nyu.edu (04/11/90)

In article <14319@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> From article <20026@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu (David Gudeman):
> > but a good enough compiler can detect all possible aliases by global
> > control flow analysis, even with seperate compilation.
> That's just the point.  The _compiler_ can't do any such thing!  That's
> the _definition_ separate compilation - the different program units
> are unknown to each other at compile time.

That isn't true under ANSI C. Everything must work *as if* the program
units were separately compiled; but optimizations don't have to respect
separate compilation.

---Dan

brnstnd@stealth.acf.nyu.edu (04/11/90)

In article <1990Apr10.151040.26800@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:
> In article <14320@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> >From article <9765@yunexus.UUCP>, by oz@yunexus.UUCP (Ozan Yigit):
    [ Somebody writes: ]
     [ sophisticated users often don't care about portability ]
> >> You are wrong.
> >Oh?  You have done a careful survey of large numbers of sophisticated users
> >to back up this claim?
> Mr. Giles is right here. Mr. Yigit has his terminology wrong -
> the people he refers to as "sophisticated" are better described as
> "politically correct" members of the elite CS fraternity, those
> inhabiting ivory towers.

Hmmm. I find myself caring about portability most of the time, simply
because I skip around between machines a lot. Does this make me a
``politically correct member of the elite CS fraternity?'' I hope not.

---Dan

jlg@lambda.UUCP (Jim Giles) (04/11/90)

From article <7376:Apr1102:35:5590@stealth.acf.nyu.edu>, by brnstnd@stealth.acf.nyu.edu:
> [...]
> Hmmm. I find myself caring about portability most of the time, simply
> because I skip around between machines a lot. Does this make me a
> ``politically correct member of the elite CS fraternity?'' I hope not.

No. The sophisticated user uses the tools of the language to meet his
particular needs.  If his need is for portability, he uses the tools
differently than if his need is for speed.  The original claim in
this thread was that sophisticated users _never_ use non-portable
features - which is a load of ....

J. Giles

seanf@sco.COM (Sean Fagan) (04/11/90)

In article <1990Apr10.151040.26800@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:
>Mr. Giles is right here. Mr. Yigit has his terminology wrong -
>the people he refers to as "sophisticated" are better described as
>"politically correct" members of the elite CS fraternity, those
>inhabiting ivory towers.

I disagree.  I consider people such as Chris Torek, Doug Gwyn, and Dennis
Ritchie to be "sophisticated users" (of the C language).  As far as I know,
all three try to write portable programs from the start; Doug has some
extremely "machine independent" routines which are rather portable (through
a liberal use of ifdef's, maybe, but they're still very portable).

Actually, the terminology may have been wrong, but not in the way you mean.
"Lazy" may have been a better word.

-- 
-----------------+
Sean Eric Fagan  | "It's a pity the universe doesn't use [a] segmented 
seanf@sco.COM    |  architecture with a protected mode."
uunet!sco!seanf  |         -- Rich Cook, _Wizard's Bane_
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

peter@ficc.uu.net (Peter da Silva) (04/11/90)

In article <14329@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
> However, to pretend that sophisticated users _don't_ ever do this last
> step is silly.  Perhaps _most_ of the programmer-time spent on large
> scientific code consists of this last category.

But, Jim, the term "sophisticated users" doesn't necessarily imply "large
scientific code". Don't mistake the needs of your corner of the world with
everyone else's.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

oz@yunexus.UUCP (Ozan Yigit) (04/12/90)

In article <14320@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>Oh?  You have done a careful survey of large numbers of sophisticated users
>to back up this claim?

Of course not. I have seen the code they have written. Also, people like
yourself told me so. :-) But then again, it is easy to claim things, and
other people may claim the opposite.

>I have worked in computing for over 18 years now
>and my job never carries me far from the consulting end of the field.

Sigh. Your expertise vs. mine, huh?  Get real, or try TV commercials:
that is where "credibility by illusion" has the most mileage. "I am not a real
doctor, but I play one on TV... I think you should use arrays instead of
pointers". ;-)

>I can tell you that sophisticated users tend to be 'full contact' programmers.
>That is, on a given machine they try to get the last ounce of performance
>out of the hardware.  If they have to use non-portable features to do so,
>they do.

That must have been what "sophisticated user" meant 18 years ago. Time to
look around again. :-)

oz

-- 
The king: If there's no meaning	   	    Interned:  oz@nexus.yorku.ca
in it, that saves a world of trouble        ......!uunet!utai!yunexus!oz
you know, as we needn't try to find any.    Bitnet: oz@[yulibra|yuyetti]
Lewis Carroll (Alice in Wonderland)         

jlg@lambda.UUCP (Jim Giles) (04/12/90)

From article <_=U28Y8xds13@ficc.uu.net>, by peter@ficc.uu.net (Peter da Silva):
> [...]
> But, Jim, the term "sophisticated users" doesn't necessarily imply "large
> scientific code". Don't mistake the needs of your corner of the world with
> everyone else's.

The same advice is obviously applicable to others on the net.  To be
"sophisticated" is a distinction which is _independent_ of application.
The person who made the claim that "sophisticated users don't ..."
missed that point.  My point is that, sometimes, sophisticated users
_DO_!

J. Giles