[comp.arch] Semantics

chris@mimsy.UUCP (Chris Torek) (10/06/88)

In article <993@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>... I'm in favor of defining the
>behavior of every operator in a language on all of its operand set.

Why?

>Since NULL can be stored in a pointer, the actions of all pointer
>operators when applied to NULL should, in my opinion, be defined.

I disagree.

In particular, with regard to the more general statement, if we can
improve the performance of a language on existing architectures by
explicitly leaving certain semantics undefined, should we do so?  The
argumentam [? ablative case anyway] pro is simple: programs run X%
faster.  The argumentam con appears to be that programmers will use
the construct anyway.  So what?  Those programs are then by definition not
portable and may be considered just so many random bits: worthless.
It is up to the programmer, and the buyer of programs, to make sure
that programs in this language do not depend on undefined semantics.

Just because I can use a knife as a screwdriver does not mean that
all knives should also be screwdrivers. . . .
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

steveb@cs.utexas.edu (Steve Benz) (10/06/88)

In article <13889@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <993@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>...
>>Since NULL can be stored in a pointer, the actions of all pointer
>>operators when applied to NULL should, in my opinion, be defined.
>...
>It is up to the programmer, and the buyer of programs, to make sure
>that programs in this language do not depend on undefined semantics.

First off -- not all operations *can* be "defined" -- for instance, a
real-valued square root cannot be defined for all real numbers.
But that's mostly due to the definition of "define" which seems
to be in vogue in this discussion.  In fact, the square root of a negative
number is defined -- to an error.
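
(As a concrete sketch of "defined -- to an error": on many Unix C
libraries the square root of a negative number is reported as a domain
error through errno rather than by trapping -- though exactly how the
error surfaces is the system-dependent part at issue here.  Assuming
such a library:)

        #include <stdio.h>
        #include <errno.h>
        #include <math.h>

        int main(void)
        {
                double x = -1.0;
                double r;

                errno = 0;
                r = sqrt(x);            /* domain error: no real result */
                if (errno == EDOM)
                        printf("sqrt(-1.0) reported a domain error\n");
                else
                        printf("sqrt(-1.0) quietly returned %g\n", r);
                return 0;
        }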

The difficulty here is that not all systems recover or recognize errors
in the same way.

HOWEVER!  That should not be considered a factor in this discussion.
Vendors that sell software with semantic errors are selling software
with bugs.

Requiring every machine to detect and recover from semantic errors in
the same way only helps in that bugs will have the same symptoms on
every machine.

I would simply recommend that when vendors test their software, they
turn on compiler switches which enable strict semantic checking
(i.e. turn on stray pointer checking, index checking, 0-divide checking,
negative root checking....)
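
(To make "index checking" concrete, this is roughly the test such a
switch asks the compiler to wrap around every subscript -- a sketch
only; the names are illustrative and abort() stands in for whatever
run-time error the vendor actually reports:)

        #include <stdlib.h>

        int table[10];

        int fetch(int i)
        {
                /* with index checking on, the compiler in effect emits
                 * this test before the subscript; normally you never
                 * write it yourself */
                if (i < 0 || i >= 10)
                        abort();        /* stand-in for the vendor's run-time error */
                return table[i];
        }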

				- Steve Benz

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (10/07/88)

From article <13889@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
" In article <993@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
" >... I'm in favor of defining the
" >behavior of every operator in a language on all of its operand set.
" ...
" I disagree.
" 
" In particular, with regard to the more general statement, if we can
" improve the performance of a language on existing architectures by
" explicitly leaving certain semantics undefined, should we do so?  The
" argumentam [? ablative case anyway] pro is simple: programs run X%
" faster. ...

Why would programs run any faster in a language with some undefined
semantics?  To make a comparison, the same programs would have to
run in both versions of the language, and so could not make any
use of the semantics undefined in one of the versions.  So why
then would the definition of semantics cause a program which does
not use it to run slower?  I suppose you could arrange things that
way, but why would you need to, or want to?

"  The argumentam con appears to be that programmers will use
" the construct anyway.  So what?  Those programs are then by definition not
" portable and may be considered just so many random bits: worthless.
" It is up to the programmer, and the buyer of programs, to make sure
" that programs in this language do not depend on undefined semantics.

This appears to assume your conclusion -- that a proper version
of the language may leave syntactically correct constructions undefined.

		Greg, lee@uhccux.uhcc.hawaii.edu

chris@mimsy.UUCP (Chris Torek) (10/07/88)

>In article <13889@mimsy.UUCP> I wrote:
>>... if we can improve the performance of a language on existing
>>architectures by explicitly leaving certain semantics undefined,
>>should we do so?

In article <2472@uhccux.uhcc.hawaii.edu> lee@uhccux.uhcc.hawaii.edu
(Greg Lee) argues with my arguments.  Let me take the last first:
>This appears to assume your conclusion -- that a proper version
>of the language may leave syntactically correct constructions undefined.

All nontrivial languages have this property.

What is 1/0?  0/0?

What sort of rounding modes does the hardware use for floating point?

Ones complement or two?

>Why would programs run any faster in a language with some undefined
>semantics?

Put it this way: if the language defines 0/0 as `runtime error,
code = INDETERMINATE' while `1/0' is `runtime error, code = DIV
BY ZERO', but some hardware has imprecise faults and makes all
divide by zero errors a single code, then to implement the language
correctly on this hardware, we must check every divide before
dividing unless we can prove that neither the numerator nor the
denominator are zero.
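
(Concretely, the check amounts to wrapping every division in something
like the following -- a sketch only; runtime_error() and the error
codes are made-up stand-ins for whatever the language's run-time
support really provides:)

        #include <stdio.h>
        #include <stdlib.h>

        #define E_DIVBYZERO     1       /* made-up error codes, sketch only */
        #define E_INDETERMINATE 2

        void runtime_error(int code)    /* stand-in for the real run-time */
        {
                fprintf(stderr, "run-time error, code = %d\n", code);
                exit(1);
        }

        /* what the compiler must emit around every division when the
         * hardware cannot tell 0/0 from 1/0 by itself */
        int checked_div(int num, int den)
        {
                if (den == 0)
                        runtime_error(num == 0 ? E_INDETERMINATE : E_DIVBYZERO);
                return num / den;
        }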

>To make a comparison, the same programs would have to
>run in both versions of the language, and so could not make any
>use of the semantics undefined in one of the versions.

This is the thrust behind the `argumentam con'.  Recall that this whole
discussion came up when someone said that the action of *(type *)NULL
in C should be defined: it should do the same thing on every machine.
The only reason to define it is because people use it.  These people,
and their programs, are wrong; but they do exist.  And if you do not
know that a program depends on strcmp((char *)0,"f(")==0, and sell this
program without testing it on something other than a 3B first, then
those who buy it and try to run it on a Sun or a Vax will lose out.
(A program assuming *(char *)0 is the string "f(" is a real example
from a real program!)

>So why then would the definition of semantics cause a program which does
>not use it to run slower?

It requires two assertions to hold.  First, the machine's `native'
semantics must not match---easily demonstrated for *(type *)NULL and
for division by zero and for uninitialised local variables and so forth.
Second, and more importantly, it must be sufficiently hard to predict
whether a program does in fact use the `nonnative' semantics.  Again,
this is easily demonstrated for *(type *)NULL and for division by zero
and so forth:

	f()
	{
		float a, b = g() ? 1. : 0.;
		a = 1./b;
	}

Does this program depend on the semantics of 1.0/0.0?  If the
language defines it and your machine does not match the language's
definition, will you have to insert runtime checks?  And if you
must insert runtime checks, will not the program run slower?
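
(Spelled out: unless it can prove b is nonzero, a compiler that must
honour a defined 1.0/0.0 on mismatched hardware ends up emitting
something like the sketch below; g(), runtime_error(), and the error
code are hypothetical stand-ins, as before:)

        extern int g(void);             /* the unknown predicate from above */
        extern void runtime_error(int); /* hypothetical run-time hook */
        #define E_DIVBYZERO 1           /* made-up code, sketch only */

        void f(void)
        {
                float a, b = g() ? 1. : 0.;

                if (b == 0.)            /* the inserted run-time check */
                        runtime_error(E_DIVBYZERO);
                a = 1./b;
        }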
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (10/07/88)

In article <3481@cs.utexas.edu> steveb@cs.utexas.edu (Steve Benz) writes:

>
>I would simply recommend that when vendors test their software, they
>turn on compiler switches which enable strict semantic checking
>(i.e. turn on stray pointer checking, index checking, 0-divide checking,
>negative root checking....)
>
An excellent practice.  There is a wide disparity among different vendors'
compilers with respect to the number of such switches.  Such switches are
very useful.

Another switch that I would like to see added to many compilers is one
found on Cray compilers: truncate.  This switch affects the code
generation phase and masks off the least significant bits of floating
point expressions to any desired significance.  You would be surprised
by how many errors this can catch; it is a very fast way for an engineer
or scientist to check that some change to a code hasn't affected the
stability of the results.



-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (10/07/88)

From article <13898@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
"...
" In article <2472@uhccux.uhcc.hawaii.edu> lee@uhccux.uhcc.hawaii.edu
"...

I see.  I was thinking only of cases where definedness is predictable
from the form of an expression -- (non-)orthogonality of instruction
sets.  All the same, I still say the second argument was circular.
		Greg, lee@uhccux.uhcc.hawaii.edu

chris@mimsy.UUCP (Chris Torek) (10/09/88)

In article <16603@shemp.CS.UCLA.EDU> casey@admin.cognet.ucla.edu
(Casey Leedom) writes:
>  As Steve pointed out, what we're probably dealing with is a
>misunderstanding about the usage of the word "define".

Probably true.  Someone wanted a Machine Independent Intermediate
Language to include assertions that would pin down the exact action of
various `undefined' actions (1/0, *(type*)NULL, etc).  I claim that
this is, or should be, unnecessary (regardless of the feasibility of
MIILs).  That these actions are `defined as undefined' is not self
contradictory, but it certainly does make for misunderstandings!

>... whether you get floating point truncation or rounding (and if
>this isn't one of the ANSI C machparam.h defined machine constants,
>it should be (or is that a POSIX standard?)),

This one is in <float.h>; others are in <limits.h>.  The name
<sys/machparam.h> is not in the ANSI standard (wrong directory, and
more important, too many characters---the magic limit is six, not including
the `.h' part).

>  I think that all Steve is asking is that we don't leave any loop
>holes.

Standards have very particular language for this; the key words are
`defined', `implementation defined', and `undefined'.  Defined
behaviour is simply that: integer multiplication of 2 and 3 always
gives 6.  Implementation-defined means that the standard itself does
not say what you get, but that the system must tell you what happens:
-1>>1 might be -1; it might be 32767; it might be something else; but
-1>>1 will always be something predictable (and usually chosen from a
limited set of possibilities).  `Undefined' means that anything can
happen: to borrow one of my own silly examples, maybe the machine
sometimes explodes, sending shrapnel through the computer room.
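
(A two-line illustration of the middle category: what this prints is
not fixed by the standard, but any single implementation must document
one answer and stick to it:)

        #include <stdio.h>

        int main(void)
        {
                /* implementation-defined: -1 on a machine that shifts
                 * arithmetically, 32767 on a 16-bit machine that shifts
                 * logically, possibly something else elsewhere */
                printf("-1 >> 1 = %d\n", -1 >> 1);
                return 0;
        }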

When I say `leave some semantics undefined', I mean in this last
sense.  We will tell you that *(type *)NULL is unpredictable; we will
not even limit it to (type)0 or `segmentation fault, core dumped'.
If you use it, it is your error.  By making it undefined, we need
not pin down the semantics of *(type *)NULL in a MIIL for that
language.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

henry@utzoo.uucp (Henry Spencer) (10/09/88)

In article <2472@uhccux.uhcc.hawaii.edu> lee@uhccux.uhcc.hawaii.edu (Greg Lee) writes:
>Why would programs run any faster in a language with some undefined
>semantics?  To make a comparison, the same programs would have to
>run in both versions of the language, and so could not make any
>use of the semantics undefined in one of the versions...

But the compiler doesn't necessarily know this, so it may have to take
precautions that in fact are unnecessary but nevertheless slow down the
code.  Checking all pointer dereferences for NULL pointers, for example.
The reason for leaving some semantics undefined is to avoid penalizing
*all* programs for the sake of predictable behavior of the few that bend
the rules.
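
(For example, on a machine whose hardware does not itself trap address
0, giving *(type *)NULL one defined result means the compiler must in
effect rewrite every dereference along these lines -- a sketch only;
null_pointer_error() is a hypothetical run-time hook:)

        extern void null_pointer_error(void);   /* hypothetical run-time hook */

        int deref(int *p)
        {
                if (p == 0)             /* check inserted before every "*p" */
                        null_pointer_error();
                return *p;              /* the load the programmer actually wrote */
        }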
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bpendlet@esunix.UUCP (Bob Pendleton) (10/10/88)

From article <13889@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
> In article <993@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>... I'm in favor of defining the
>>behavior of every operator in a language on all of its operand set.
> 
> Why?

So that I can know what the program is going to do without actually
having to run it. It is simply not possible to test a program on every
machine that it might some day be run on. How can I honestly sell a
program in MIIL or even in source code if I can't give some assurance
that the program will run correctly?

Defining the behavior of an operator on a specific subrange of
operands as a run-time error is acceptable, and useful.  Specifically,
1/0, 0/0, *NULL, and sqrt(-1) should all, in my humble? opinion, cause
specific run-time errors.

> In particular, with regard to the more general statement, if we can
> improve the performance of a language on existing architectures by
> explicitly leaving certain semantics undefined, should we do so?  

NO!

> The
> argumentam [? ablative case anyway] pro is simple: programs run X%
> faster.  The argumentam con appears to be that programmers will use
> the construct anyway.  So what?  Those programs are then by definition not
> portable and may be considered just so many random bits: worthless.
> It is up to the programmer, and the buyer of programs, to make sure
> that programs in this language do not depend on undefined semantics.

Trying to avoid the religious argument, let's look only at economics.

A year of software engineer time costs ~$100,000. Your mileage may
vary. I've heard numbers ranging from $70K to $150K. If I'm paying to
develop code, I can't take the chance that because of a poor language
definition my multimillion dollar program will turn out to be just
random worthless bits on the next generation of computers.  Or even on
this generation's computers from some other vendor.

For the cost of one year of porting effort I can afford to pay for an
awful lot of extra machine cycles. Not to mention that every year that
passes the cost of the cycles drops and the cost of engineering time
increases. So the argument gets stronger every day.

This argument fails when you start talking about programs that are
only barely possible on existing computers.  But that is the
exception, not the rule.
-- 
              Bob Pendleton, speaking only for myself.
An average hammer is better for driving nails than a superior wrench.
When your only tool is a hammer, everything starts looking like a nail.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

bpendlet@esunix.UUCP (Bob Pendleton) (10/10/88)

From article <13914@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
> In article <16603@shemp.CS.UCLA.EDU> casey@admin.cognet.ucla.edu
> (Casey Leedom) writes:
>>  As Steve pointed out, what we're probably dealing with is a
>>misunderstanding about the usage of the word "define".
> 
> Probably true.  Someone wanted a Machine Independent Intermediate
                  ^^^^^^^

Fame is so short lived. Leave the discussion for 48 hours and you
become just an anonymous "someone."

> Language to include assertions that would pin down the exact action of
> various `undefined' actions (1/0, *(type*)NULL, etc).  I claim that
> this is, or should be, unnecessary (regardless of the feasibility of
> MIILs).  That these actions are `defined as undefined' is not self
> contradictory, but it certainly does make for misunderstandings!

Before I go on, let me say that I've realized I've been using the
term portable to mean machine independent, where portable should mean
something like "can be moved to a new machine with some effort", and
machine independent should mean something like "can be moved to a new
machine with at most a recompilation."

Defined as undefined is a perfectly valid way to define the action of
an operator on particular subranges of its operands, like * on NULL
and square root on negative numbers.  But, from the point of view of
trying to make
programs machine independent it is a useless definition. This kind of
definition forces me to test a program on every existing machine
architecture, using every existing compiler for each architecture, and
even on every new release of each compiler, before I can claim that
the program is machine independent. Using a language that has features
that are defined as undefined and limited resources I can have no
assurance that the program is machine independent. I can only hope
that it is portable.

Clearly, I hope, there is a gap between the goals of a MIIL (Machine
Independent Intermediate Language) and a language that is defined in
such a way as to be machine dependent. This gap requires that the
machine dependent parts of such a language must be bound to MIIL
dependent definitions before the language can be used as a source
language for compilation to MIIL.
-- 
              Bob Pendleton, speaking only for myself.
An average hammer is better for driving nails than a superior wrench.
When your only tool is a hammer, everything starts looking like a nail.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

nevin1@ihlpb.ATT.COM (Liber) (10/12/88)

In article <2472@uhccux.uhcc.hawaii.edu> lee@uhccux.uhcc.hawaii.edu (Greg Lee) writes:

>Why would programs run any faster in a language with some undefined
>semantics?  To make a comparison, the same programs would have to
>run in both versions of the language, and so could not make any
>use of the semantics undefined in one of the versions.  So why
>then would the definition of semantics cause a program which does
>not use it to run slower?

The things that usually (never make blanket all-encompassing
statements; uh, I mean, *almost* never ... :-)) need defining are
exceptions to a rule (eg: dereferencing NULL is an exception to the
dereferencing operation).

If exceptions are to be defined, they are usually defined so that they
are consistent with the rest of the rules.  In mathematics, for
example, 1 is defined to not be prime, 0! is defined to be 1, x to the
0 power (x<>0) is defined to be 1, etc., because the rest of the rules
need not make exceptions for these cases (the rules for what a prime
is, what factorial means, what exponentiation means, etc., however, have
these exceptions).

So, given that *NULL is an exception to dereferencing, what would be a
good way to define it?  One way is to allow it to be dereferenced just
like any other pointer.  What does this definition give you?  With this
definition, you never (at the compiler level) have to check to see
whether or not the value of a pointer is NULL.  Now, what does this
definition mean in terms of the semantics of the high-level language?
Since pointers and memory locations are OS/machine/compiler dependent,
the semantics of *NULL are implementation-dependent, and are considered
undefined in terms of the semantics of the language.

Most (if not all) of the other definitions of *NULL on the high-level
(semantics of C) cause more exceptions (which translates into slower
and larger code) to be required on the lower-level (implementation level).
-- 
 _ __		NEVIN J. LIBER  ..!att!ihlpb!nevin1  (312) 979-4751  IH 4F-410
' )  )  "I catch him with a left hook. He eels over. It was a fluke, but there
 /  / _ , __o  ____  he was, lying on the deck, flat as a mackerel - kelpless!"
/  (_</_\/ <__/ / <_	As far as I know, these are NOT the opinions of AT&T.

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (10/13/88)

From article <8916@ihlpb.ATT.COM>, by nevin1@ihlpb.ATT.COM (Liber):
" In article <2472@uhccux.uhcc.hawaii.edu> lee@uhccux.uhcc.hawaii.edu (Greg Lee) writes:
" 
" >Why would programs run any faster in a language with some undefined
" >semantics?  To make a comparison, the same programs would have to
" ...
" like any other pointer.  What does this definition give you?  With this
" definition, you never (at the compiler level) have to check to see
" whether or not the value of a pointer is NULL.  Now, what does this

Hmmm.  I thought the main point here was that you can't check this at
compile time, except occasionally for constant expressions, and so
it would have to be checked at run time, slowing up the program.
		Greg, lee@uhccux.uhcc.hawaii.edu

news@ism780c.isc.com (News system) (10/13/88)

In article <8916@ihlpb.ATT.COM> nevin1@ihlpb.UUCP (55528-Liber,N.J.) writes:
>
>So, given that *NULL is an exception to dereferencing, what would be a
>good way to define it?
> _ __		NEVIN J. LIBER  ..!att!ihlpb!nevin1  (312) 979-4751  IH 4F-410

Might I suggest the definition used in the Pascal Standard.  First, the term
'error' means an error in the source program.  The standard says:  "It is an
error if the pointer-variable of an identified-variable denotes a
nil-value".  Or, translated into English, it is an error to dereference a
pointer to no object.

The standard also says: "A complying processor is required to provide
documentation concerning its treatment of errors".  So the way in which a
processor treats an error is up to the implementor; the only requirement is
that the implementor must inform the user.  A user who wants erroneous programs
to behave the same way on all implementations is just plain out of luck.

I hope the proposed C standard says approximately the same thing about
erroneous source programs.

    Marv Rubinstein

news@ism780c.isc.com (News system) (10/14/88)

In article <1000@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>Defined as undefined is a perfectly valid way to define the action of
>an operator on particular subranges of its operands, like * on NULL
>and square root on
>negative numbers.  But, from the point of view of trying to make
>programs machine independent it is a useless definition. This kind of
>definition forces me to test a program on every existing machine
>architecture, using every existing compiler for each architecture, and
>even on every new release of each compiler, before I can claim that
>the program is machine independent.

Defining it won't make it so.  Even if I wrote programs in a language where
the semantics for every legal syntactical construct was *EXACTLY* specified,
I would still test my program on every combination of machine and compiler
for which I would like to guarantee that my program runs correctly.  For
although I never make mistakes :-), I cannot say the same for the people who
write operating systems and compilers.  (I have even encountered hardware
that does not work as advertised.)

Would anyone guarantee a program distributed in a form that required that the
customer compile it and link it before using it?


   Marv Rubinstein