[comp.arch] Double Width Integer Multiplication and Division

crowl@cs.rochester.edu (Lawrence Crowl) (06/27/89)

In article <22218@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>When we were [designing integer division support on the MIPS processors], we
>couldn't think of any languages whose natural implementation on a 32-bit
>machine would expect generation of a 64 by 32 bit divide as the expected
>operation.  Does anybody know of any such, or is this strictly something one
>would use in handcoded routines?
 
Languages which support fixed-point arithmetic, e.g. PL/I and Ada, could use
such a feature to substantially improve fixed-point performance.  In addition,
languages that support a more general notion of integers, such as Lisp bignums
or declared ranges, can use them to improve performance on integers larger
than a single word.  Not many languages currently support such arithmetic,
but newer languages are more likely to.
 
Such support is especially crucial for fixed-point arithmetic.  Fixed-point
arithmetic is often a win in both time and precision over floating point.  I
did a Mandelbrot program using fixed-point (via the 68020 32x32->64 multiply)
that was 50% faster than using the FPA on a Sun 3/260 and 480% (5x) faster
than using the 68881.  However, such advantages cannot be effectively
realized without language support.  Fixed point is generally not used much
because most languages do not support it.
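
The core of such a program is just a multiply whose product is kept double
length before rescaling.  A sketch, not the actual code, assuming a 16.16
format and a compiler with a 64-bit "long long" (gcc has one):

	typedef long fix;		/* 16.16 fixed point, 32 bits */

	fix fix_mul(fix a, fix b)
	{
		/* double-length product, then rescale; on a 68020
		   this is one mulsl plus a shift */
		return (fix) (((long long) a * b) >> 16);
	}
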
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

pcg@aber-cs.UUCP (Piercarlo Grandi) (06/28/89)

In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
(Lawrence Crowl) writes:
     
    Languages which support fixed-point arithmetic, e.g. PL/I and Ada, could
    use such a feature to substantially improve fixed-point performance.  In
    addition, languages that support a more general notion of integers, such
    as Lisp bignums or declared ranges, can use them to improve the
    performance on integers larger than a single word. [ .... ]
     
    Such support is especially crucial for fixed-point arithmetic.
    Fixed-point arithmetic is often a win in both time and precision over
    floating point.  I did a Mandelbrot program using fixed-point (via the
    68020 32x32->64 multiply) that was 50% faster than using the FPA on a
    Sun 3/260 and 480% (5x) faster than using the 68881.  However, such
    advantages cannot be effectively realized without language support.

All true, but for language support. There is an enlightening article by
Martin Richards in a very old Sw.Pract.&Exp. issue (middle seventies I
think) in which he uses a "muldiv()" function (implemented in assembler)
for BCPL that multiplies two single length numbers, giving a double length
result that is divided by a single length number, returning a single length
result. Something like (but more efficiently):

	/* assumes long is a genuinely double-length type */
	int muldiv(a,b,c) { return (int) (((long) a * (long) b)/(long) c); }

This is almost enough to make fixed point comfortable in any language, however
inhospitable (BCPL for example has only one length of integer).
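
For instance, with a scale of 10000 (four decimal digits of fraction),
fixed-point multiply and divide each become a single call; again a sketch,
not Richards' code:

	#define ONE 10000

	int fxmul(a, b) int a, b; { return muldiv(a, b, ONE); }  /* a*b */
	int fxdiv(a, b) int a, b; { return muldiv(a, ONE, b); }  /* a/b */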

Richards describes in his paper having been able thanks to "muldiv()" to
run a realistic flight simulator on a quite slow and small machine without
floating point.

Final remarks on the issue of RISC vs. multiplication/division:

	[1] You are not supposed to write assembler programs on a RISC
	    machine. The compilers have sophisticated algorithms to generate
	    "optimal" multiplication/division out of simpler instructions.

	[2] Fixed point and the "muldiv()" idea to me look very much in the
	    RISC philosophy of minimalism. Fixed point, and multiple precision
	    integer arithmetic, are in many many cases preferable to floating
	    point and its complexities. Too bad that mathematicians are lulled
	    into a false sense of familiarity by floating point's "scientific
	    notation" and apparent support for real numbers.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

bartho@obs.unige.ch (PAUL BARTHOLDI) (06/30/89)

In article <1989Jun26.195044.4197@cs.rochester.edu>, crowl@cs.rochester.edu (Lawrence Crowl) writes:
>>When we were [designing integer division support on the MIPS processors], we
>>couldn't think of any languages whose natural implementation on a 32-bit
>>machine would expect generation of a 64 by 32 bit divide as the expected
>>operation.  Does anybody know of any such, or is this strictly something one
>>would use in handcoded routines?
>  
> Languages which support fixed-point arithmetic, e.g. PL/I and Ada, could use
> such a feature to substantially improve fixed-point performance ...

FORTH is the only language to my knowledge that explicitly uses double-
precision integer intermediates for multiplication/division/modulo.  The
original and standard Forth has 16-bit integers, so doubles are 32 bits.  The
point is that, as soon as you do integer arithmetic, you need to (re)scale
your operands, especially during multiplication and division.  So, most of
the time, a multiplication is immediately followed by a division, or a
division preceded by a multiplication, and a double-precision intermediate
result means you do not lose precision.  For example, suppose you want to
multiply a number by 'pi'.  You would then multiply this number by 31415 and
divide the result by 10000 (16 bits) or by 314159265 and 100000000 ...
Notice that the high level operator now has three operands and should be
called 'ternary'.  This is exactly what Forth does.  Forth uses reverse
Polish notation, which is very nice for ternary operators:    a b c */
('*/' is the operator) means take the product of a and b in double precision,
and divide it immediately by c, giving a single-precision integer as the
result.  You can have *mod, which leaves a*b mod c, or even */mod, which
leaves both the quotient and modulus of a*b by c.  All these operators have
been extremely useful in process control, data acquisition etc., in
particular with small computers or micros without an FPA.
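
In C terms, */ amounts to the sketch below (16-bit cells with a 32-bit
intermediate, as in the original Forths; the function name is made up):

	int star_slash(a, b, c)		/* Forth's  a b c star-slash */
	int a, b, c;
	{
		long t = (long) a * (long) b;	/* double-length product */
		return (int) (t / c);		/* rescale, single length */
	}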

for more information about Forth, see:
  MG Kelly and N Spies : FORTH: a text and reference,  Prentice-Hall 1986
  L Brodie             : Starting FORTH,               Prentice-Hall 1982
  L Brodie             : Thinking FORTH,               Prentice-Hall 1984

regards,               Paul Bartholdi

  Dr Paul Bartholdi                 bartho@cgeuge54.bitnet
  Observatoire de Geneve            bartho@obs.unige.ch
  CH-1290 Sauverny                  02284682161350::bartho (psi)
  Switzerland                       20579::ugobs::bartho

cik@l.cc.purdue.edu (Herman Rubin) (06/30/89)

In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
> In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
> (Lawrence Crowl) writes:
> 
> 	[1] You are not supposed to write assembler programs on a RISC
> 	    machine. The compilers have sophisticated algorithms to generate
> 	    "optimal" multiplication/division out of simpler instructions.

I find this attitude arrogant.  Neither you nor anyone else knows what 
complicated operations I want to do.  Why should you try to make it 
difficult for me to use the power of the computer?

> 	[2] Fixed point and the "muldiv()" idea to me look very much in the
> 	    RISC philosophy of minimalism. Fixed point, and multiple precision
> 	    integer arithmetic, are in many many cases preferable to floating
> 	    point and its complexities. Too bad that mathematicians are lulled
> 	    into a false sense of familiarity by floating point's "scientific
> 	    notation" and apparent support for real numbers.

Do any mathematicians take this attitude?  Maybe if their only knowledge of
computers comes from HLLs and they swallow the hype.  Outside of the field
of numerical analysis, the "scientific notation" is rarely used in mathematics.

Mathematicians who know that floating point numbers are, in reality, only
approximations to real numbers will not be so confused.  And good numerical
analysts do not make this confusion.  In fact, mathematicians complain that
they are only offered limited precision floating point.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

mdeale@mira.acs.calpoly.edu (Myron Deale) (07/01/89)

In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>> In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
>> (Lawrence Crowl) writes:
>> 
>> 	[1] You are not supposed to write assembler programs on a RISC
>> 	    machine. The compilers have sophisticated algorithms to generate
>> 	    "optimal" multiplication/division out of simpler instructions.
>
>I find this attitude arrogant.  Neither you nor anyone else knows what 
>complicated operations I want to do.  Why should you try to make it 
>difficult for me to use the power of the computer?

   A good point. I spend much of my time "on the outside of the envelope"
writing strange/ugly code because I'm always doing something that isn't
supported by the HLL's I have access to. For example, my present experiment
has to do with division techniques. My routine requires a 64-bit product
which the 68020 has but my compiler won't generate. Tweak time.

   It would be inane for me to complain to the compiler writers about not
having feature X. What happens is that next week I'll want feature Y, etc.
Let the compiler writers make a robust, production oriented tool, but I
won't be happy if there isn't a way to work around a problem, the HLL
itself, within reason.

   Now that it is possible to make 1 million transistor chips, it is
lame not to have capable arithmetic. The RISC generation knows how to
read and write, but their math could use working on. I'd pay a certain
price for better extended- and multi-precision arithmetic, although I
realize us math types aren't exactly a driving force in the marketplace.

>
>> 	[2] Fixed point and the "muldiv()" idea to me look very much in the
>> 	    RISC philosophy of minimalism. Fixed point, and multiple precision
>> 	    integer arithmetic, are in many many cases preferable to floating
...
>
>Mathematicians who know that floating point numbers are, in reality, only
>approximations to real numbers will not be so confused.  And good numerical
>analysts do not make this confusion.  In fact, mathematicians complain that
>they are only offered limited precision floating point.

   Perhaps it would be wise to concentrate on porting Mathematica (TM)
to an i860 or BIT SPARC, etc., in a few years. It's not fast, but it is
the best I've run across as far as multi-precision math ... and you could
afford to waste some MFLOPs on those machines. Not that you'd want to,
but for home-grown experimental code this may be an adequate approach.

Myron
// mdeale@cosmos.acs.calpoly.edu

stevew@wyse.wyse.com (Steve Wilson xttemp dept303) (07/01/89)

In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>> In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
>> (Lawrence Crowl) writes:
>> 
>> 	[1] You are not supposed to write assembler programs on a RISC
>> 	    machine. The compilers have sophisticated algorithms to generate
>> 	    "optimal" multiplication/division out of simpler instructions.
>
>I find this attitude arrogant.  Neither you nor anyone else knows what 
>complicated operations I want to do.  Why should you try to make it 
>difficult for me to use the power of the computer?

Ok, now we have a chance to start the HLL versus assembly wars again ;-)

Seriously, there are machines that were NEVER meant to be programmed in
assembly!  Try programming a VLIW machine in assembly.  Chances are that unless
you're the guy who architected it, or wrote the compilers for it, you can't.
(I've worked on such a box; at best it certainly ain't easy ;-)

I don't find it an arrogant attitude so much as an assumption that in
the GENERAL case allows the computer architect to make the best trade-
offs he can to let the machine go as fast as possible for the largest
number of users in the machine's targeted market place.  


Steve Wilson 

pcg@aber-cs.UUCP (Piercarlo Grandi) (07/01/89)

In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

    In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
    > In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
    > (Lawrence Crowl) writes:
    > 
    > 	[1] You are not supposed to write assembler programs on a RISC
    > 	    machine. The compilers have sophisticated algorithms to generate
    > 	    "optimal" multiplication/division out of simpler instructions.
    
    I find this attitude arrogant.  Neither you nor anyone else knows what 
    complicated operations I want to do.

Hey! Two notes: First, I wrote point [1]; Lawrence Crowl is not to blame :->.

Second, the attitude of RISC designers is not "arrogant"; one of the well
publicized tenets of their philosophy is to make the instruction set
simpler, and compensate at the compiler level. This is in most cases
remarkably successful.

    Why should you try to make it difficult for me to use the power of the
    computer?

Well, RISC advocates don't try to make life harder for you; they try to make
the machine faster by having simpler hardware at the price of more complex
code generation. They don't see more complex code generation as a goal in
itself (well, occasionally I have my doubts :->).
    
    > 	[2] Fixed point and the "muldiv()" idea to me look very much in the
    > 	    RISC philosophy of minimalism. Fixed point, and multiple precision
    > 	    integer arithmetic, are in many many cases preferable to floating
    > 	    point and its complexities. Too bad that mathematicians are lulled
    > 	    into a false sense of familiarity by floating point's "scientific
    > 	    notation" and apparent support for real numbers.
    
    Do any mathematicians take this attitude?

A lot, a lot. Most mathematicians doing programming are not numerical
analysts (and even numerical analysts often are not as diffident of floating
point as they should be).

    Maybe if their only knowledge of computers comes from HLLs and they
    swallow the hype.

Fortran is very popular, isn't it?  Well, I would also argue that Fortran is
so popular with mathematicians precisely because it was *designed* as a
FORmula TRANslator, i.e. to give the illusion of dealing in familiar
mathematical formulas, to the point of using the same terminology.
    
    Mathematicians who know that floating point numbers are, in reality, only
    approximations to real numbers will not be so confused.

But in my experience many mathematicians (or physicists) are not numerical
analysts at all, and expect the computer to do their calculations as they
intend them to be. Such people are not aware of the fact that floating point
is *not* (emphatically!) an approximation of R^n; floating point is
*radically* different from R^n (even fundamental properties of arithmetic are
not the same!).
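
For instance, addition in R is associative; in IEEE double precision it
already is not.  Assuming round-to-nearest, this fragment prints "1 0":

	#include <stdio.h>
	int main()
	{
		double a = 1e16, b = -1e16, c = 1.0;
		/* b + c rounds back to -1e16, so the second sum is 0 */
		printf("%g %g\n", (a + b) + c, a + (b + c));
		return 0;
	}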

As a palliative, a lot of effort in the design of the IEEE floating point
standard has been devoted to make floating point calculations *safer*, i.e.
a bit less surprising to those that think in terms of R^n. A laudable goal,
in some respects, but one that has costs in architectural terms, and in
perpetuating comfortable delusions.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

cik@l.cc.purdue.edu (Herman Rubin) (07/03/89)

In article <1046@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
> In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> 
>     In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>     > In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
>     > (Lawrence Crowl) writes:
>     > 
>     > 	[1] You are not supposed to write assembler programs on a RISC
>     > 	    machine. The compilers have sophisticated algorithms to generate
>     > 	    "optimal" multiplication/division out of simpler instructions.
>     
>     I find this attitude arrogant.  Neither you nor anyone else knows what 
>     complicated operations I want to do.
> 
> Ehi! Two notes: First, I wrote point [1]. Lawrence Crowl has no fault :->.
> 
> Second, The attitude of RISC designers is not "arrogant"; one of the well
> publicized tenets of their philosophy is to make the instruction set
> simpler, and compensate at the compiler level. This is in most cases
> remarkably successful.

You do not understand.  A very good optimizing compiler (and I have not
seen one) can do a better job of doing a particular complex operation than
I can, PROVIDED that I want to do that operation.  If my algorithm is
necessarily broken down into operations that the compiler writers have
taken into consideration, the compiler might be able to do better than
I can.  But even this has a caveat; I am not confined to a single algorithm,
and my choice of algorithms is decidedly influenced by machine properties.
In most cases, I do not need run-time profiling; I know my algorithms.

>     Why should you try to make it difficult for me to use the power of the
>     computer?

And what if my algorithms use instructions which I can form out of the machine
primitives easily, but which are at best very clumsy in the HLL operations the
unknowing HLL designer provides me?  You do not know what I have thought of.
I do not know what I will think of tomorrow.  I have both "portable" and
highly non-portable examples of this.

> Well, RISC advocates don't try to make life harder for you; they try to make
> the machine faster by having simpler hardware at the price of more complex
> code generation. They don't see more complex code generation as a goal in
> itself (well, occasionally I have my doubts :->).
>     
>     > 	[2] Fixed point and the "muldiv()" idea to me look very much in the
>     > 	    RISC philosophy of minimalism. Fixed point, and multiple precision
>     > 	    integer arithmetic, are in many many cases preferable to floating
>     > 	    point and its complexities. Too bad that mathematicians are lulled
>     > 	    into a false sense of familiarity by floating point's "scientific
>     > 	    notation" and apparent support for real numbers.
>     
>     Do any mathematicians take this attitude?
> 
> A lot, a lot. Most mathematicians doing programming are not numerical
> analysts (and even numerical analysts often are not as diffident of floating
> point as they should be).

>     Maybe if their only knowledge of computers comes from HLLs and they
>     swallow the hype.
> 
> Fortran is very popular, isn't it?  Well, I would also argue that Fortran is
> so popular with mathematicians precisely because it was *designed* as a
> FORmula TRANslator, i.e. to give the illusion of dealing in familiar
> mathematical formulas, to the point of using the same terminology.

It is.  And anybody who thinks that Fortran is an adequate language for any
serious mathematics is either ignorant or stupid.  To quote the number
theorist D. H. Lehmer, pioneer in computational number theory even BC (before
computers) [quote from memory; may be inaccurate]

	None of the high level languages are adequate for number theory.
	Furthermore, I would not be able to design an adequate one.

Now Algol had already been produced, with the claim that it was a good
language for any problem in numerical mathematics.  It is not.

Few mathematicians, unlike Lehmer and me, know just how a computer works.
They are given to believe that it is a black box executing Fortran or
whatever.  Consider the example cited by Klamkin, where oodles of
computer time went into solving an initial value problem for which < 5
minutes of thinking (I am not assuming that they are very good) and a
small amount of computation give an accurate answer, more accurate than
they achieved.

What is there to know?  I have not seen a computer instruction set where
the user instructions (I am excluding much of the supervisory and kernel
stuff, which a user will not be able to access on a multi-user system,
although the user should be able to discuss problems with those who can)
are anywhere near as complicated as Fortran or C.

>     Mathematicians who know that floating point numbers are, in reality, only
>     approximations to real numbers will not be so confused.
> 
> But in my experience many mathematicians (or physicists) are not numerical
> analysts at all, and expect the computer to do their calculations as they
> intend them to be. Such people are not aware of the fact that floating point
> is *not* (emphatically!) an approximation of R^n; floating point is
> *radically* different from R^n (even fundamental properties of arithmetic are
> not the same!).

So why not let them know what's going on?

> As a palliative, a lot of effort in the design of the IEEE floating point
> standard has been devoted to make floating point calculations *safer*, i.e.
> a bit less surprising to those that think in terms of R^n. A laudable goal,
> in some respects, but one that has costs in architectural terms, and in
> perpetuating comfortable delusions.

This is, as stated, a palliative.  Palliatives do not always work.  BTW, I
find the IEEE standard frequently a pain; I would prefer the option of
having unnormalized floating point when I want it.  If you cannot see
why I want it, and I assure you that those cases are not handled as well
with normalized, you are in no position to criticize.

I am not saying that these HLLs may not save time.  I use them myself.  But
they are not adequate.  They are not complete.  They do not have the ability
to accommodate simple extensions.  And no matter what you put in, tomorrow
somebody will want something else, easily handled at the machine level.

I had to write a function program in our horribly designed assembler language
because the HLLs available did not have the constructs needed.  The machine
is easy to understand, but most assembler languages are designed to make life
difficult for the programmer.  Some HLL ideas can, and should, be used at the
assembler level.  Some should not even be considered.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

hascall@atanasoff.cs.iastate.edu (John Hascall) (07/03/89)

In article <1380@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1046@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>> In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>>  In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>>  > In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
>>  > (Lawrence Crowl) writes:
>>  > 
>>  > 	[1] You are not supposed to write assembler programs on a RISC
>>  > 	    machine. The compilers have sophisticated algorithms to generate
>>  > 	    "optimal" multiplication/division out of simpler instructions.
 
>You do not understand.  A very good optimizing compiler (and I have not
>seen one) can do a better job of doing a particular complex operation than
>I can, PROVIDED that I want to do that operation.  If my algorithm is
>necessarily broken down into operations that the compiler writers have
>taken into consideration, the compiler might be able to do better than
>I can.  But even this has a caveat; I am not confined to a single algorithm,
>and my choice of algorithms is decidedly influenced by machine properties.
>In most cases, I do not need run-time profiling; I know my algorithms.
 
    [...continues this thread for a while (quite well, I might add)...]

  He then makes the bold statement:

>What is there to know?  I have not seen a computer instruction set where
>the user instructions(1)
>are anywhere near as complicated as Fortran or C.

   I would venture some of the VAX instructions are more "complicated"
   than Fortran or C (this is why "RISC"ers love to flame it so much--
   besides its commercial success, of course).  A few examples:

      FF{CS}         Find First {Clear|Set} Bit in a field.
      CMP[Z]V        Compare [un]signed bitfield with [un]signed integer
      INDEX          Compute (multi-dimensional) array subscript
      {INS|REM}QUE   Insert/Remove entry to/from a queue
      EMOD           Extended Multiply and Integerize
      POLY           Polynomial Evaluation
      MATCHC         Find substring in a string
      MOVTUC         Move translated until ("escape") character
      SPANC          Span Characters (read the description of this one sometime)
      CRC            Calculate Cyclic Redundancy Check
      EDITPC         Edit Packed to Character Strings (i.e., COBOL PICtures)
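
   To give the flavor: POLY evaluates an entire polynomial by Horner's
   rule in one instruction.  Roughly, ignoring the intermediate rounding
   the architecture actually specifies:

      /* POLYF: degree d, coefficient table c[] with the highest-order
         coefficient first, argument x */
      float poly(x, d, c)
      float x; int d; float *c;
      {
          float r = c[0];
          int i;
          for (i = 1; i <= d; i++)
              r = r * x + c[i];
          return r;
      }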

(1) [original parenthetical material moved here for clarity]
>  (I am excluding much of the supervisory and kernel stuff, which a user
>  will not be able to access on a multi-user system, although the user
>  should be able to discuss problems with those who can)

    I would assert that the above instructions are at least as complicated
    as any single concept (statement-type, operator, etc.) in Fortran or C.
    (And we haven't even looked at addressing modes yet.)

John Hascall  /  ISU Comp Center  /  Ames, IA

pmontgom@sonia.math.ucla.edu (Peter Montgomery) (07/04/89)

In article <1370@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <1035@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
>> In article <1989Jun26.195044.4197@cs.rochester.edu> crowl@cs.rochester.edu
>> (Lawrence Crowl) writes:
>> 
>> 	[1] You are not supposed to write assembler programs on a RISC
>> 	    machine. The compilers have sophisticated algorithms to generate
>> 	    "optimal" multiplication/division out of simpler instructions.
>
>I find this attitude arrogant.  Neither you nor anyone else knows what 
>complicated operations I want to do.  Why should you try to make it 
>difficult for me to use the power of the computer?
>
	Given integers A, B, C where  0 <= A, B < C,  I want to be able to
find q, r such that A*B = q*C + r and 0 <= r < C.  I do multiple-precision
arithmetic with large numbers, and this is such an important operation that
I cannot afford to call a subroutine every time I do it.
So rather than have just a small assembly routine to do this function,
I write the entire loop or the entire procedure in assembly code.

	I want to be able to define primitives like this in my language,
telling the compiler which sequence of instructions to generate whenever it
encounters my primitive (this sequence of instructions will be defined
ONCE, in the machine dependent part of my program, but the code
referencing the primitives will be scattered throughout).  Many languages
allow one to define user primitives in terms of other language elements
(macros), but few languages allow us to go deeper and say things like (MC68020)

	"DEFINE QUOT_REM_64(arg1:unsigned long, register type D;
	                    arg2:unsigned long, register type D;
			    arg3:unsigned long, register type D),
	        RETURNS    (arg4:unsigned long, register type D;
			    arg5:unsigned long, register type D);
		LOCAL upper: register type D;
		LOCAL lower: register type D;
			movl  arg1, lower
			mulul arg2, upper:lower  /* 64-bit product arg1*arg2 */
			divul arg3, upper:lower  /* Divide by arg3 */
			movl  lower, arg4	 /* quotient */
			movl  upper, arg5	 /* remainder */
	END QUOT_REM;"

	When the compiler subsequently encounters an expression like 
(q, r) := QUOT_REM_64(A, B, C), the compiler would evaluate A, B, and C, 
converting them to unsigned long if necessary.  Each time these are 
referenced in the body, the values would be moved to a D register and 
the appropriate operation done.  The outputs get assigned to q and r. 
With a good optimizing compiler, the movl's could probably be eliminated
(and the compiler would be allowed to interchange arg1 and arg2 in the 
mulul since the instruction is computationally commutative).  The 
programmer expresses his algorithm in terms of the available instructions, 
while the compiler worries about the things it is good at (e.g., storage 
and register allocation, common subexpression recognition, loop invariants).  
The body of the definition would be allowed to reference more registers than 
are available, with the compiler responsible for handling the overflow.

	Note the ASM primitive of C is unsatisfactory, for it forces
the programmer to know where the compiler has put the operands.  I once
used FORTRAN statement functions on the Control Data 7600 to do
double-length integer multiplies (the same hardware instruction was
used for the upper half of floating and integer multiplications, and
I was able to tell the compiler to treat my original operands
as floating point without changing the bit-pattern), but nowhere
else have I succeeded. 
--------
        Peter Montgomery
        pmontgom@MATH.UCLA.EDU 

lewitt@Alliant.COM (Martin E. Lewitt) (07/04/89)

In article <2274@wyse.wyse.com> stevew@wyse.UUCP (Steve Wilson xttemp dept303) writes:
--- much deleted ---

>Seriously, there are machines that were NEVER meant to be programmed in
>assembly!  Try programming a VLIW machine in assembly.  Chances are that unless
>you're the guy who architected it, or wrote the compilers for it, you can't.
>(I've worked on such a box, at best it certainly ain't easy ;-)

--- some deleted ---

>Steve Wilson 

I'm not sure what machines out there are giving you the impression that
VLIW assembly is difficult, but that wasn't my experience at all.  I found
programming the FPS AP-120B array processors and their 64 bit mini-super
follow-on products simple and elegant, especially compared to CISCs such
as the 68000 and the VAX.  I don't think I'd want to try the 8086.

With the FPS products, I seldom had to consult an architecture or
assembly manual.  I just kept a diagram of the architecture in front
of me and pictured the data being latched onto this bus or into that
register or functional unit.  There were a few simple rules to follow,
and most of the restrictions made sense, like don't latch two things
onto the same bus in the same instruction.  At the end you counted
your instructions and you knew how fast you were running.

On the CISCs, there are multiple cycle instructions and addressing modes,
instruction alignment and resource dependency stalls, etc.  At the end
of a task, I still find myself wondering if I missed some instruction or
addressing mode which might have been faster.

Maybe some VLIWs out there are more difficult because they are pushing
the technology harder, trying to encode more in an instruction word or
something.  They might sacrifice some of the generality that their
bus structure diagram would lead you to believe was there.  I'm curious
about these experiences with other VLIW architectures.

lewitt@Alliant.COM (Martin E. Lewitt) (07/04/89)

Sorry about the missing signature on that last posting, some kind of
permission problem on my signature file.  It should be appended now:
-- 
Phone: (206) 931-8364			Martin E. Lewitt      My opinions are
Domain: lewitt@alliant.COM		2945 Scenic Dr. SE    my own, not my
UUCP: {linus|mit-eddie}!alliant!lewitt  Auburn, WA 98002      employer's. 

cik@l.cc.purdue.edu (Herman Rubin) (07/05/89)

In article <255@obs.unige.ch>, bartho@obs.unige.ch (PAUL BARTHOLDI) writes:
> In article <1989Jun26.195044.4197@cs.rochester.edu>, crowl@cs.rochester.edu (Lawrence Crowl) writes:
> >>When we were [designing integer division support on the MIPS processors], we
> >>couldn't think of any languages whose natural implementation on a 32-bit
> >>machine would expect generation of a 64 by 32 bit divide as the expected
> >>operation.  Does anybody know of any such, or is this strictly something one
> >>would use in handcoded routines?
> >  
> > Languages which support fixed-point arithmetic, e.g. PL/I and Ada, could use
> > such a feature to substantially improve fixed-point performance ...

	[There follows a discussion of how this happens in FORTH]

The language designers did not put something in the language, therefore the
hardware people do not put it on the machine, therefore the new language
designers do not put it in the language, and the recursion continues.  Despite
the claims of the Algolniks, there has been no language which comes close to
treating what I consider the hardware-natural operations which I would find
useful.  Many of them were present 35 years ago on a larger proportion of
machines than now.

What languages, other than Lisp and some similar ones, have the idea that
an operation or function can return a string of values?  What languages allow
the user to introduce additional operator symbols?  A few allow additional types.
Now these deficiencies in languages go back to day 1.

What is double precision?  We first need to know what single precision is.
And (no surprise) it is different on different machines, and can even be
different for addition and multiplication on the same machine.  On the
CRAY, for integer addition one can use 64-bit 2's complement numbers.
But for multiplication, only 48-bit sign-magnitude mantissas.  However,
the best integer multiplication possible is 24x24->48.  And all division
has to be programmed.
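
(A full-width product must then be pieced together in the classical way:
with H and L the high and low 24-bit halves of each operand,

	a*b = aH*bH*2^48 + (aH*bL + aL*bH)*2^24 + aL*bL

where each partial product fits exactly in 48 bits, leaving only the
carries between the pieces to be propagated.)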

The CYBER 205 has only 48-bit 2's complement numbers for arithmetic.
There are even problems in integer addition.  But a double length product
can be produced with two multiplies, with minor problems.  Floating point
division is present, but fixed point must be programmed.  Interesting
things can be done with vectors, and a programmed 32x32->64 inner product
might be a good idea.

Now if you define single precision as 16 bits, a 32-bit machine is 
automatically double-precision.  If you define it as 256 bits, you
would need 256x256->512 and 512/256 giving a 256-bit quotient and
remainder to have double precision.  For those needing good integer
arithmetic, the bigger the better.  And a few machine operations make
a big difference.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

mac@mrk.ardent.com (Michael McNamara) (07/06/89)

	[ An ongoing discussion of the lack of certain interesting
math primitives in HLLs, unease of assembly programming risc chips,
and the need for extensible HLLs ].  

	On the extensible HLL front, why not go to the authors of the
extensible editor?  The FSF's C compiler, GCC, has extended asm macro
support which allows you to symbolically hook up your asm routines to HLL
variables.  An excerpt from the gcc manual appears below.  You can
obtain gcc via ftp from a number of sites, as well as via uucp from
osu-cis.

	Note this is a bit of a long posting, but gcc is USEFUL to
allow mortals to use the whole machine...

>  In article <1333@sunset.MATH.UCLA.EDU> pmontgom@sonia.math.ucla.edu
>  (Peter Montgomery) writes:
>  	Given integers A, B, C where  0 <= A, B < C,  I want to be able to
>  find q, r such that A*B = q*C + r and 0 <= r < C.  I do multiple-precision
>  arithmetic with large numbers, and this is such an important operation that
>  I cannot afford to call a subroutine every time I do it.
>  So rather than have just a small assembly routine to do this function,
>  I write the entire loop or the entire procedure in assembly code.
>  
>  	I want to be able to define primitives like this in my language,
>  telling the compiler which sequence of instructions to generate whenever it
>  encounters my primitive (this sequence of instructions will be defined
>  ONCE, in the machine dependent part of my program, but the code
>  referencing the primitives will be scattered throughout).  Many languages
>  allow one to define user primitives in terms of other language elements
>  (macros), but few languages allow us to go deeper and say things like (MC68020)
>  
>  	"DEFINE QUOT_REM_64(arg1:unsigned long, register type D;
>  	                    arg2:unsigned long, register type D;
>  			    arg3:unsigned long, register type D),
>  	        RETURNS    (arg4:unsigned long, register type D;
>  			    arg5:unsigned long, register type D);
>  		LOCAL upper: register type D;
>  		LOCAL lower: register type D;
>  			movl  arg1, lower
>  			mulul arg2, upper:lower  /* 64-bit product arg1*arg2 */
>  			divul arg3, upper:lower  /* Divide by arg3 */
>  			movl  lower, arg4	 /* quotient */
>  			movl  upper, arg5	 /* remainder */
>  	END QUOT_REM;"
>  
>  	When the compiler subsequently encounters an expression like 
>  (q, r) := QUOT_REM_64(A, B, C), the compiler would evaluate A, B, and C, 
>  converting them to unsigned long if necessary.  Each time these are 
>  referenced in the body, the values would be moved to a D register and 
>  the appropriate operation done.  The outputs get assigned to q and r. 
>  With a good optimizing compiler, the movl's could probably be eliminated
>  (and the compiler would be allowed to interchange arg1 and arg2 in the 
>  mulul since the instruction is computationally commutative).  The 
>  programmer expresses his algorithm in terms of the available instructions, 
>  while the compiler worries about the things it is good at (e.g., storage 
>  and register allocation, common subexpression recognition, loop invariants).  
>  The body of the definition would be allowed to reference more registers than 
>  are available, with the compiler responsible for handling the overflow.
>  
>  	Note the ASM primitive of C is unsatisfactory, for it forces
>  the programmer to know where the compiler has put the operands.  I once
>  used FORTRAN statement functions on the Control Data 7600 to do
>  double-length integer multiplies (the same hardware instruction was
>  used for the upper half of floating and integer multiplications, and
>  I was able to tell the compiler to treat my original operands
>  as floating point without changing the bit-pattern), but nowhere
>  else have I succeeded. 
>  --------
>          Peter Montgomery
>          pmontgom@MATH.UCLA.EDU 

From the GCC info node, Extended Asm Support:

Assembler Instructions with C Expression Operands
=================================================

In an assembler instruction using `asm', you can now specify the
operands of the instruction using C expressions.  This means no more
guessing which registers or memory locations will contain the data you want
to use.

You must specify an assembler instruction template much like what appears
in a machine description, plus an operand constraint string for each
operand.

For example, here is how to use the 68881's `fsinx' instruction:

     asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));

Here `angle' is the C expression for the input operand while
`result' is that of the output operand.  Each has `"f"' as its
operand constraint, saying that a floating-point register is required.  The
constraints use the same language used in the machine description
(*Note Constraints::).

Each operand is described by an operand-constraint string followed by the C
expression in parentheses.  A colon separates the assembler template from
the first output operand, and another separates the last output operand
from the first input, if any.  Commas separate output operands and separate
inputs.  The number of operands is limited to the maximum number of
operands in any instruction pattern in the machine description.

Output operand expressions must be lvalues; the compiler can check this.
The input operands need not be lvalues.  The compiler cannot check whether
the operands have data types that are reasonable for the instruction being
executed.  It does not parse the assembler instruction template and does
not know what it means, or whether it is valid assembler input.  The
extended `asm' feature is most often used for machine instructions
that the compiler itself does not know exist.

If there are no output operands, and there are input operands, then you
should write two colons in a row where the output operands would go.

The output operands must be write-only; GNU CC will assume that the values
in these operands before the instruction are dead and need not be
generated.  For an operand that is read-write, or in which not all bits are
written and the other bits contain useful information, you must logically
split its function into two separate operands, one input operand and one
write-only output operand.  The connection between them is expressed by
constraints which say they need to be in the same location when the
instruction executes.  You can use the same C expression for both operands,
or different expressions.  For example, here we write the (fictitious)
`combine' instruction with `bar' as its read-only source operand
and `foo' as its read-write destination:

     asm ("combine %2,%0" : "=r" (foo) : "0" (foo), "g" (bar));

The constraint `"0"' for operand 1 says that it must occupy the same
location as operand 0.

Only a digit in the constraint can guarantee that one operand will be in
the same place as another.  The mere fact that `foo' is the value of
both operands is not enough to guarantee that they will be in the same
place in the generated assembler code.  The following would not work:

     asm ("combine %2,%0" : "=r" (foo) : "r" (foo), "g" (bar));

Various optimizations or reloading could cause operands 0 and 1 to be in
different registers; GNU CC knows no reason not to do so.  For example, the
compiler might find a copy of the value of `foo' in one register and
use it for operand 1, but generate the output operand 0 in a different
register (copying it afterward to `foo''s own address).  Of course,
since the register for operand 1 is not even mentioned in the assembler
code, the result will not work, but GNU CC can't tell that.

Unless an output operand has the `&' constraint modifier, GNU CC may
allocate it in the same register as an unrelated input operand, on the
assumption that the inputs are consumed before the outputs are produced.
This assumption may be false if the assembler code actually consists of
more than one instruction.  In such a case, use `&' for each output
operand that may not overlap an input.  *Note Modifiers::.

Some instructions clobber specific hard registers.  To describe this,
write a third colon after the input operands, followed by the names of
the clobbered hard registers (given as strings).  For example, on the vax,

     asm volatile ("movc3 %0,%1,%2"
                   : /* no outputs */
                   : "g" (from), "g" (to), "g" (count)
                   : "r0", "r1", "r2", "r3", "r4", "r5");

Usually the most convenient way to use these `asm' instructions is to
encapsulate them in macros that look like functions.  For example,

     #define sin(x)       \
     ({ double __value, __arg = (x);   \
        asm ("fsinx %1,%0": "=f" (__value): "f" (__arg));  \
        __value; })

Here the variable `__arg' is used to make sure that the instruction
operates on a proper `double' value, and to accept only those
arguments `x' which can convert automatically to a `double'.

Another way to make sure the instruction operates on the correct data type
is to use a cast in the `asm'.  This is different from using a
variable `__arg' in that it converts more different types.  For
example, if the desired type were `int', casting the argument to
`int' would accept a pointer with no complaint, while assigning the
argument to an `int' variable named `__arg' would warn about
using a pointer unless the caller explicitly casts it.

GNU CC assumes for optimization purposes that these instructions have no
side effects except to change the output operands.  This does not mean that
instructions with a side effect cannot be used, but you must be careful,
because the compiler may eliminate them if the output operands aren't used,
or move them out of loops, or replace two with one if they constitute a
common subexpression.  Also, if your instruction does have a side effect on
a variable that otherwise appears not to change, the old value of the
variable may be reused later if it happens to be found in a register.

You can prevent an `asm' instruction from being deleted, moved or
combined by writing the keyword `volatile' after the `asm'.  For
example:

     #define set_priority(x)  \
     asm volatile ("set_priority %0": /* no outputs */ : "g" (x))

It is a natural idea to look for a way to give access to the condition
code left by the assembler instruction.  However, when we attempted to
implement this, we found no way to make it work reliably.  The problem
is that output operands might need reloading, which would result in
additional following "store" instructions.  On most machines, these
instructions would alter the condition code before there was time to
test it.  This problem doesn't arise for ordinary "test" and
"compare" instructions because they don't have any output operands.


--
_________________
Michael McNamara 
  mac@ardent.com 

suitti@haddock.ima.isc.com (Stephen Uitti) (07/06/89)

In article <1380@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
(and various others wrote:)
>[1] You are not supposed to write assembler programs on a RISC
>machine. The compilers have sophisticated algorithms to generate
>"optimal" multiplication/division out of simpler instructions.
>>     I find this attitude arrogant.  Neither you nor anyone else knows what 
>>     complicated operations I want to do.

Still, I haven't needed to learn an assembly language just to
make some operation faster for over five years.  The machine I
have on my desk is four times faster than the machine I used to
share with 35 people at a time (a VAX 780).  It has as much disk
space (and much more for me).  It has more RAM (9 MB vs 4 or 8).
It runs the same operating system, though it has improved
capabilities (bitmapped screen, other local devices).  The reason
i can have it on my desk is that it costs 35 times less than the
old VAX.  Cost of ownership is also lower in similar proportion.

>You do not understand.  A very good optimizing compiler (and I have not
>seen one) can do a better job of doing a particular complex operation than
>I can, PROVIDED that I want to do that operation.

The current gcc output is OK, but i can do better.  I can
generally do minimally 20% better than any compiler.  If my
original idea maps well to the hardware, but not to the compiler,
then speedups can be an order of magnitude.  Then again, in one
case, the compiler writer provided a long divide, and i was able
to speed it up by 15% or so.  Three points:

  1. Computers are getting real fast, so coding for speed is not
     required as often (which means code will tend to get sloppier,
     which means that our throughput will not increase as fast as our
     machine speeds...  of course, i tend not to write that much stuff
     that needs to run for very long).

  2. Compilers do not code as well as a trained human can (this will
     probably always be true); on the other hand, compilers do their
     work faster than humans, and this will also continue to be true.

  3. Any non-trivial piece of work can be improved on with the addition
     of more work and attention (the more complex the problem or
     system, the more that can be gained by additional attention).

>In most cases, I do not need run-time profiling; I know my algorithms.

Profiling often turns up surprises.  One graphics application
looked to be I/O bound.  Profiling showed that significant time
was being spent converting bitmaps.  An algorithmic change
yielded a factor of ten speed improvement.  Since no similar
improvement could be made with the addition of assembler, none
was added (it was tried, but it didn't help that much, was a lot
of work to finish since there were so many possible high-use
paths, and it made the application non-portable).

I've also used profiling for interactive programs to reduce the
time to respond to the user.  It turned out that the program was
performing a floating point calculation that could be precomputed
into a small table which could compiled into the program.  It just
wasn't known how slow the floating point was on the machine in
question.

>What is there to know?  I have not seen a computer instruction set where
>the user instructions
>are anywhere near as complicated as Fortran or C.

The VAX instruction set is pretty complex.  There are 16
registers, some have various dedicated purposes (stack, frame,
temporaries for bcopy, etc.).  There are lots of addressing
modes.  There are simply lots of instructions, and one needs to
know many of them.  Then you have to learn the local syscalls...
Fortunately, UNIX tends to be pretty good here.  Most OS's are
pretty complex.  Try opening a file under VMS...  It can be hard,
especially for a novice, to get started.

Fortran and C are pretty complex, but one doesn't need to know
that much to get started.  Further, there are structures that
allow readable representations, which allows the user to write
maintainable code (it doesn't mean he'll do it, but at least he can).
Of course, then there's portability.  How long will your current
machine last?  How long do you want to use your program?  Chances
are, by the time you upgrade your machine, you'll have forgotten
how your program worked.  Of course, you could just buy a CDC 6600
and hang onto it for a quarter century.  Then you could just retire.

>BTW, I find the IEEE standard frequently a pain; I would prefer
>the option of having unnormalized floating point when I want it.

Not all machines give you this capability.  One architecture i
worked with recently had entries for many instructions stating
that if you somehow managed to feed this instruction an
unnormalized value, it would either produce an incorrect
result or a floating point exception.  Given that all floating point
instructions produced normalized results on this machine, it was
hard to imagine how you'd get into such a situation.

There are apparently good reasons for hardware types to restrict
the use of the hardware.  Most of them are probably speed and cost.

The main problem that i've had with the IEEE standard is that
it specifies more precision than my home computer really had the
power to support.  That's changed.  I bought a faster computer.
My C64 died...  and a good thing too - i'd still be in the dark ages.

>I am not saying that these HLLs may not save time.  I use them myself.  But
>they are not adequate.  They are not complete.  They do not have the ability
>to accommodate simple extensions.

I have yet to see a C compiler without hooks to allow assembly
language access.  Some have in-line capability.  Some have
function call and separate compilation capability.  Some have
both.  Most Fortrans, and even COBOLs i've seen have similar
capability.  I don't see a problem with this.  I assume you
aren't talking about interpreted BASIC...

What i'd like to see in language support is more extensive
libraries.  UNIX comes with all sort of neat libraries for C.
People have had various Fortran libraries for a long time.  Yet,
i still constantly write and rewrite time/date parsing functions,
misc database-like routines, etc.  With a rich library of code
and proper documentation, even assembler can be reasonable to
deal with.  Of course, now we're getting way off the beaten
comp.arch track...

Stephen.

suitti@haddock.ima.isc.com (Stephen Uitti) (07/06/89)

In article <1387@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>What languages, other than Lisp and some similar ones, have the idea that
>an operation or function can return a string of values?

C.  One can pass structures and unions back & forth by name or
value.  While it is true that some C compilers are broken when
passing large things by value, it has been defined in the
language since 1978.  OK, so the syntax is a little awkward in
that you have to define the structures, but at least it tends not
to add large amounts of additional overhead.  Also, in C, people
commonly use the code sequences:
caller:
	int i, j;
	char *p;
	i = foo(&j, &p);	/* three results end up in i, j, p */
callee:
	int foo(j, p)
	int *j;
	char **p;
	{
		*j = secondvalue;	/* second result, via pointer */
		*p = "third value";	/* third result: hand back a string */
		return value;		/* first result */
	}
Which is OK, as long as the interface is properly documented.

Now you want the language to use syntax that YOU defined?
Compiler writers are doing that now too.  I'd rather they spent
their time on designing unambiguous and coherent languages for
which code can be easily written, read and modified, that have
real scope rules for libraries and other separately compiled
environments, and produce more optimal code for real machines,
and provide for special needs of real people in the way of
specialized preprocessors (lex, yacc) and prewritten library
code.

>What languages allow the user to introduce additional operator
>symbols?  A few allow additional types.  Now these deficiencies
>in languages go back to day 1.

C++ and Ada.  My wristwatch (Casio) has more compute resources
than was available on day 1.  I don't expect it to allow
additional operators and types either.  It does have a 64 bit
(BCD) divide.  Throw out your '205 and get a Casio.

Stephen.

khb@gammara.Sun.COM (gammara) (07/06/89)

In article <13943@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>In article <1380@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>(and various others wrote:)

>very long).  Compilers do not code as well as a trained human can
>(this will probably always be true - on the other hand, compilers

Depends on the compiler and the machine. The famed PL.8 for the IBM
801 often produced provably perfect code (say those involved with the
project). Try hand coding a Multiflow Trace/28 or a Cydra-5 ... from
scratch (taking the f77 compiler output and hacking is cheating! :>).
There are some humans who can do better than the compiler on these
machines ... but for non-trivial codes the number of people is very
small, and cost very high.

The overall cost of a project is typically "maintenance" bound (meaning
changing the definition of the problem now that we have a trial
solution ... all in the guise of making a few modest code changes).
Then come coding and design costs, etc.

Handcrafted assembly language is seldom economically viable today.
Given advances in compilers, and hardware evolution, it is often
counter productive (hand-coded SAXPY is slower than the output of a
compiler which inlines and unrolls outer loops).

Specialized instructions which are of great interest to (say) 5% of
the population are simply not worth putting into hardware (since it
means that "real estate" can be dedicated to solving 95% of the worlds
problems better). 

When folks have an instruction which is important enough to them (say
$200M) they can probably get it in. The vast majority of purchasers
have been voting with their feet ... simpler is more popular. (I am
counting the 80xxx as simpler than a mainframe ... this is not just
RISC vs. CISC this is thinking in economic, bottom line terms).



Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     |	Marketing Technical Specialist    ! kbierman@sun.com
I Voted for Bill &    |   Languages and Performance Tools. 
Opus  (* strange as it may seem, I do more engineering now     *)

khb@gammara.Sun.COM (gammara) (07/06/89)

In article <13946@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>In article <1387@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>>What languages allow the user to introduce additional operator
>>symbols?  A few allow additional types.  Now these deficiencies
>>in languages go back to day 1.
>
>C++ and Ada.  My wristwatch (Casio) has more compute resources
>than was available on day 1.  I don't expect it to allow
>additional operators and types either.  It does have a 64 bit
>(BCD) divide.  Throw out your '205 and get a Casio.

Well, as long as we are leaving .arch and moving to .lang :>,
Fortran8x has operator overloading and user defined operators.



Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     |	Marketing Technical Specialist    ! kbierman@sun.com
I Voted for Bill &    |   Languages and Performance Tools. 
Opus  (* strange as it may seem, I do more engineering now     *)

cik@l.cc.purdue.edu (Herman Rubin) (07/06/89)

In article <13946@haddock.ima.isc.com>, suitti@haddock.ima.isc.com (Stephen Uitti) writes:
> In article <1387@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >What languages, other than Lisp and some similar ones, have the idea that
> >an operation or function can return a string of values?
> 
> C.  One can pass structures and unions back & forth by name or
> value.

NO!  For the n-th time, a list is not a struct.  A struct is a memory
allocation scheme.  It presumes a particular "physical" ordering of the
items.  Even if one can implement a struct in registers, it still 
restricts which registers are used far more than a list does.  For
example, on a machine in which there is division yielding a quotient
and a remainder, I want to write

	q,r = x/y;

On the VAX, for example, q could be in register 3 and r in register 10.
Since the hardware allows simple things like this, the language should.
At the present time and in the future, loads and stores must slow things
down unless they can be overlapped.  The overlapping problem is greatly
exacerbated by having to do too many of them.
> 
> Now you want the language to use syntax that YOU defined?
> Compiler writers are doing that now too.  I'd rather they spent
> their time on designing unambiguous and coherent languages for
> which code can be easily written, read and modified, that have
> real scope rules for libraries and other separately compiled
> environments, and produce more optimal code for real machines,
> and provide for special needs of real people in the way of
> specialized preprocessors (lex, yacc) and prewritten library
> code.

How many people would have difficulty with the above quotient-remainder
notation even without explanation?  With commenting, lots more could be
done without difficulty.  A struct is more complicated than a new type.
>
> >What languages allow the user to introduce additional operator
> >symbols?  A few allow additional types.  Now these deficiencies
> >in languages go back to day 1.
> 
> C++ and Ada.  My wristwatch (Casio) has more compute resources
> than was available on day 1.  I don't expect it to allow
> additional operators and types either.  It does have a 64 bit
> (BCD) divide.  Throw out your '205 and get a Casio.
> 
I am not very familiar with Ada.  C++ allows the introduction of new
types, although they are not called types.  The stupid use of the word
"typedef" in C rather than the more accurater "typealias" is responsible
for this.  But I do not believe that C++ or Ada allows one to define
symbols or symbol combinations as infix operators.  As a simple case,
I should be able to tell the compiler that \ is the bit-clear operation,
which &~ may or may not handle properly, and which has the same machine
structure as OR on the VAX (or any other machine which has it in
hardware).  If I am using a 205, I will use an algorithm appropriate
for the 205.  If I am using a CRAY, I will use an appropriate one.
If it involves double precision multiplication or square root, I will try
hard to find a workaround on the CRAY.


-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

suitti@haddock.ima.isc.com (Stephen Uitti) (07/07/89)

In article <1388@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>> >What languages, other than Lisp and some similar ones, have
>> >the idea that an operation or function can return a string of values?
>> C.  One can pass structures and unions back & forth by name or
>> value.
>NO!  For the n-th time, a list is not a struct.  It presumes a
>particular "physical" ordering of the items.

LISP does not actually pass whole lists around - it uses pointers
and linked lists.  In C, linked lists are typically implemented
with a struct for each node.  There are no lists in C, but
structs can be lists.

You wanted to have a function be able to return more than one
item.  I suggested two syntactic solutions: returning a struct,
or having the caller specify where to put the answers.  When it
comes down to it, you don't want the compiler to necessarily
generate a function with call/return overhead.  You really want
functions or intrinsics that the compiler can turn into inline
code.  You are thinking of inline code as a single instruction,
though I'm not limiting this to anything that simple.

Is there anything really evil with this:
	q = divrem(top, bottom, &remainder);
Divrem can "return" q, and stuff "remainder" where it is supposed
to go.  Compilers are getting smart enough to do this the way you
want it.  This is documentable.  This is manageable.  Type
checking can be done everywhere to reduce the error rate.
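For what it's worth, a divrem a compiler could inline might look like
this (the definition is a sketch, not taken from any existing library):

long divrem(top, bottom, remp) /* returns top/bottom, stores top%bottom */
long top, bottom, *remp;
{
	*remp = top % bottom;
	return top / bottom;
}

An inliner with even simple common-subexpression elimination can then
collapse the / and % into one machine divide where the hardware
delivers both results.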

Let's say you have a '205 and a VAX.  You want to develop your
code on the VAX, because it has better debuggers, faster response,
and it is much cheaper than '205 time.  If you break the grammar
of your standard language, then you have to break it for both
machines.  The above 'divrem' function can be written in real C,
and a (slow) version can run, and be debugged on the VAX.  Then,
the very same code can be run on the target machine.

The way '205 C's vector stuff was actually done broke all sorts
of things.  Writing portable high speed code required heavy
ifdefs for each application.  It could have been done in a manner
that a simple library could be written for the VAX (etc.) that
would allow the code to be just run.

>I want to write
>	q,r = x/y;
Unfortunately, this already means something in C.

>On the VAX, for example, q could be in register 3 and r in register 10.

The syntax I suggested above would not require things to be in
any given place.  The compiler should be able to figure this
sort of thing out.

>Since the hardware allows simple things like this, the language should.

The language does.  It just doesn't allow you to do it with the
syntax you want.  This is (by comparison to register allocation
and procedure inlining) a hard problem.  Languages that have
fixed syntax are difficult enough to prove unambiguous.  If the
language allows free redefinition, it is easy for the user to
generate something that can't be parsed.  It took years to get a
real Ada compiler to work.

>At the present time and in the future, loads and stores must slow things
>down unless they can be overlapped.

Or optimized out.

>The stupid use of the word "typedef" in C rather than the more
>accurate "typealias" is responsible for this.

OK, so the terminology for C is a little broken.  I've used
languages that have (for example) German words for keywords.
That didn't mean that I felt the need to change the language's
syntax or terminology.

>As a simple case,
>I should be able to tell the compiler that \ is the bit clear operation,
>which &~ may or may not handle properly, and has the same machine
>structure as OR on the VAX.

How do you want to tell it what the operator means?  Do you want
to give it an assembler template?  Do you want to give it an
algorithm already expressible in the language?  If it is
assembler, you probably want gcc's asm feature that allows
symbolic names to be given to assembler arguments.  Then, if it
turns out to be more complicated, you might try preprocessor
macros.  If it turns out to be more complicated than that, you
might put it into an inlineable function.  Otherwise, it should
probably be a true function.  You may already have the tools you need.
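To make the asm route concrete, here is roughly what the bit-clear
operation could look like in gcc on a VAX.  The bicl3 operand order
and the constraint letters are stated from memory and should be
treated as assumptions, not as a checked machine description:

static inline unsigned int
bic(mask, src)	/* src & ~mask, via the VAX bit-clear instruction */
unsigned int mask, src;
{
	unsigned int dst;
	asm ("bicl3 %1,%2,%0"
	     : "=g" (dst)		/* output bound to a C variable */
	     : "g" (mask), "g" (src));
	return dst;
}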

Stephen.

cik@l.cc.purdue.edu (Herman Rubin) (07/07/89)

In article <13961@haddock.ima.isc.com>, suitti@haddock.ima.isc.com (Stephen Uitti) writes:
> In article <1388@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >> >What languages, other than Lisp and some similar ones, have
> >> >the idea that an operation or function can return a string of values?
> >> C.  One can pass structures and unions back & forth by name or
> >> value.
> >NO!  For the n-th time, a list is not a struct.  It presumes a
> >particular "physical" ordering of the items.
> 
> LISP does not actually pass whole lists around - it uses pointers
> and linked lists.  In C, linked lists are typically implemented
> with a struct for each node.  There are no lists in C, but
> structs can be lists.

And this is the reason that production programs and library subroutines
should not be written in Lisp.  But it is possible to get much done
efficiently.  This should be done by enabling those who can see how
to do things to do them and to communicate that to others.  Anything
can be done on a universal Turing machine, but I am not willing to
wait a month for it to multiply two 10-digit numbers by reducing
everything to successor.

> You wanted to have a function be able to return more than one
> item.  I suggested two syntactic solutions: returning a struct,
> or having the caller specify where to put the answers.  When it
> comes down to it, you don't want the compiler to necessarily
> generate a function with call/return overhead.  You really want
> functions or intrinsics that the compiler can turn into inline
> code.  You are thinking of inline code as a single instruction,
> though I'm not limiting this to anything that simple.
> 
> Is there anything really evil with this:
> 	q = divrem(top, bottom, &remainder);
> Divrem can "return" q, and stuff "remainder" where it is supposed
> to go.  Compilers are getting smart enough to do this the way you
> want it.  This is documentable.  This is manageable.  Type
> checking can be done everywhere to reduce the error rate.

There are three things wrong with this.  First, I do not know of 
any version of C which can pass the address of a register.  In
fact, I know of few machines in which that construct makes sense.
And how would you pass it, anyhow?

The second is that I do not want the compiler to even think that this
should be a subroutine call.  For what I consider obvious, I should not
have to rely on the compiler writer being equally astute.  Of course,
this is only the case if the hardware is on the machine.  But even if
not, it probably should be inlined.

The third is, in my opinion, the most important.  It is why I execrate
most of the present assembler languages.  Notation is very important.
For example, I consider the introduction of algebraic symbols by
Diophantus as one of the really great contributions to mathematical
communication.  In fact, I would make it THE mathematical requirement
for admission to college.  But the same holds for algebraic operators.

I am no more willing to write 	q = divrem(top, bottom, &remainder);
every time I want to use it than you are to write  x = sum(y,z);
In fact, I would consider the first as more confusing notationally
than the second.  Is it any easier for a human to read than

	q,r = top/bottom;

What about

	exp,mant =UP (d);

to unpack a double into an int exponent and a double long mant, both to
be in registers?  I have used this.
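For comparison, the standard library's frexp comes close to =UP: it
unpacks a double into a binary exponent and a normalized fraction (a
fractional mantissa rather than the long integer mantissa wanted
above, but the nearest portable equivalent).  A minimal sketch:

#include <math.h>

void unpack(d)
double d;
{
	int e;
	double m;
	m = frexp(d, &e);	/* d == m * 2^e, 0.5 <= |m| < 1 unless d == 0 */
}

Whether e and m end up in registers is, of course, up to the compiler.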

> Let's say you have a '205 and a VAX.  You want to develop your
> code on the VAX, because it has better debuggers, faster response,
> and it is much cheaper than '205 time.  If you break the grammar
> of your standard language, then you have to break it for both
> machines.  The above 'divrem' function can be written in real C,
> and a (slow) version can run, and be debugged on the VAX.  Then,
> the very same code can be run on the target machine.
> 
What gives you the impression that I would want to use the same
code on a 205 and a VAX?  I have used both machines, and I would
consider it a mistake much of the time.  On the VAX I would use
a long long, but on the 205 only a one-word integer.  To me they
are somewhat different.  Only at the bright human level should one
consider a common code.  For division with remainder, the code should
be machine language, not C.  Would you use C to do machine-independent
implementation of division?  You would not think of it.  Neither would
I think of making similar operations such as frexp, divrem, exponentiation,
absolute value, integer part, etc., machine independently coded in C.
I would not consider coding the =UP above in C.  It is an additional
primitive operation, like +, *, /, &.

> The way '205 C's vector stuff was actually done broke all sorts
> of things.  Writing portable high speed code required heavy
> ifdefs for each application.  It could have been done in a manner
> that a simple library could be written for the VAX (etc.) that
> would allow the code to be just run.

I consider the 205 hardware the best designed vector hardware I have
seen.  It is the C language which is inadequate.  And there is a severe
limit to what should be portable.

> >I want to write
> >	q,r = x/y;
> Unfortunately, this already means something in C.

What does it mean?  Would putting the q,r in parentheses or something else
avoid the conflict?  The lack of a list to the left of the = should have
been noticed as a deficiency in the languages 35 years ago.

> >On the VAX, for example, q could be in register 3 and r in register 10.
> 
> The syntax I suggested above would not require things to be in
> any given place.  The compiler should be able to figure this
> sort of thing out.

I have yet to see a fair compiler.

> >Since the hardware allows simple things like this, the language should.
> 
> The language does.  It just doesn't allow you to do it with the
> syntax you want.  This is (by comparison to register allocation
> and procedure inlining) a hard problem.  Languages that have
> fixed syntax are difficult enough to prove unambiguous.  If the
> language allows free redefinition, it is easy for the user to
> generate something that can't be parsed.  It took years to get a
> real Ada compiler to work.

It is not hard.  I will very gladly settle for a highly flexible, easily
written, simple language.  The problem is that the present languages have
only a limited number of basic macro substitutions built in, and because
of this, elegance is attempted.  If the language had to allow the user to
define the same types of macros, more simplicity would be required.

> >At the present time and in the future, loads and stores must slow things
> >down unless they can be overlapped.
> 
> Or optimized out.

I consider a programmer quite inept who does not know what is temporary and
what is not.  I consider someone quite bright who can look over code not
making this distinction and figure it out.  A compiler is fast, but does
not compare in intelligence with a human imbecile.

> >The stupid use of the word "typedef" in C rather than the more
> >accurate "typealias" is responsible for this.
> 
> OK, so the terminology for C is a little broken.  I've used
> languages that have (for example) German words for keywords.
> That didn't mean that I felt the need to change the language's
> syntax or terminology.

There is a difference between using a German word and using an incorrect
word.  There were languages before C which allowed the user to define types.
K&R even comment that "typedef" does not allow one to define a new type.

Also, computer language and system people seem prone to taking standard
mathematical symbols and using them for other purposes, even to the extent
that the standard use is precluded by it.

> >As a simple case,
> >I should be able to tell the compiler that \ is the bit clear operation,
> >which &~ may or may not handle properly, and has the same machine
> >structure as OR on the VAX.
> 
> How do you want to tell it what the operator means?  Do you want
> to give it an assembler template?  Do you want to give it an
> algorithm already expressible in the language?  If it is
> assembler, you probably want gcc's asm feature that allows
> symbolic names to be given to assembler arguments.  Then, if it
> turns out to be more complicated, you might try preprocessor
> macros.  If it turns out to be more complicated than that, you
> might put it into an inlineable function.  Otherwise, it should
> probably be a true function.  You may already have the tools you need.

Whatever is appropriate.  But the C preprocessor, or any other present
macro language in my ken, will not do it.  A macro template processor,
in which the macro "name" is scattered through the macro, and which is
weakly typed and may even use storage classes, would do the job.

For example, I would consider x ={t} y - z the "=-" macro with arguments
x, y, and z.  Its precise meaning involves the types of the arguments,
unless the optional t field is present, which overrides the type.  A more
complicated one is the basic vector operation collection on the 205, which
could be written

	c{'z} ={t} {-}{|} a{'x} OP{m} {|} b{'y} {/\{~}w}

Here we have the "=OP" macro with 3-7 arguments and 10 optional fields.
Since the fields operate independently and simply, this is not really
difficult.  If a or b is scalar, the displacement fields following
them must be absent, and if OP is Boolean, the - and | fields must
be absent, etc.  So what?  With such a macro processor, I would not
be too inconvenienced by the lack of a compiler.
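For contrast, the closest the C preprocessor comes to the "=-"
template above is something like the following (the macro name is made
up).  The name must be a single identifier out front, so the scattered
x ={t} y - z spelling, the optional t field, and any dependence on the
argument types are all out of reach:

	#define SUB_AS(t, x, y, z)	((x) = (t)((y) - (z)))

	/* SUB_AS(int, a, b, c)  in place of  a ={int} b - c */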

> Stephen.

The value of a compiler and language is ease of programming.  In this,
notation and flexibility are extremely important.  If a mathematician
finds notation missing, he invents it for his paper.  It may or may not
be adopted by others, but that does not prevent people from reading his
paper.

The same should hold for languages and compilers.  If I see a missing
construct, I should be able to define it using the power of the machine
on which I intend to run it.  A simple example of the lack of desirability
of C portability is the addition of two vectors.  The good coding of this
depends on the costs of the *x++ in loading and storing and the use of
index in loading and storing.  These depend both on the machine and the
storage classes of the addresses.  There are others.  A good programmer
must know these things and discuss the alternatives with the user.  This
is even the case if the programmer and user are the same person.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

peter@ficc.uu.net (Peter da Silva) (07/07/89)

In article <1388@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> For example, on a machine in which there is division yielding a quotient
> and a remainder, I want to write

> 	q,r = x/y;

(1) What does this do when no such operation exists?
(2) How does this fit in with the rest of the expression syntax?
(3) Get a good optimiser and say:
	q = x/y;
	r = x%y;
(4) Is it obvious which is the quotient and which is the remainder
    when you have something like:
	tens,relx = muscle0.fiber/potential;
(5) What if x or y contain other operations that could require a
    quotient or remainder?
(6) What does this mean:
	q,r,s = x/y/z;
(7) What does this sort of micro-optimisation gain you? Except in a
    tight loop, very little. Since you're intending to be machine-
    specific anyway, why not squirrel that loop away in an assembly
    routine?
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: peter@ficc.uu.net, +1 713 274 5180. | "WHAT HAPPENED TO ALL
Personal: peter@sugar.hackercorp.com.   `-_-' |  THE WOMEN IN TEXAS?"
Quote: Have you hugged your wolf today?  'U`  |  -- ACS1W@jane.uh.edu (meesh)

chansen@NeXT.UUCP (Craig Hansen) (07/08/89)

There's been some discussion of why double-length multiply and divide
were or were not included on various machines. However, the real
reasons why RISC machines don't like double-length multiply and
divide haven't been hit. I can speak most authoritatively on the Mips
RISC processor.

Most operations on a RISC processor can be expressed as a function of
the contents of two general registers, yielding a single result.
Double-length multiply (two sources; two results) and divide (three
sources; two results) both violate this generalization, which makes
them more expensive to implement. A Nice Clean Architecture would just
add register specifiers and read and write ports until there were
enough, but that's not the way of RISC.

The Mips R2000 uses two special-purpose registers, each of which holds
half of the double-length result of a multiply, or the quotient and
remainder of a divide. These registers are very carefully handled in
the implementation to permit them to be written into immediately on
starting up the operation, in order to avoid additional bypass and
staging latches - they're written into two cycles earlier than the
general registers.
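In C terms, what the HI/LO pair holds after a multiply is sketched
below; long long is a compiler extension here, not part of the C of
the day:

void show_hilo(a, b)
int a, b;
{
	long long p = (long long) a * (long long) b;	/* mult a,b */
	int lo = (int) p;		/* LO: low 32 bits of the product */
	int hi = (int) (p >> 32);	/* HI: high 32 bits of the product */
	/* after div a,b: LO holds the quotient, HI the remainder */
}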

That early write is the reason why operations that modify these
registers must not occur within two cycles after an instruction that
reads them: if they are any closer, an interrupt or exception may
require restarting the instruction stream at the special-register
read, even though the register has already been overwritten by the
modifying instruction, which was itself aborted.

For double-length divides, you'd need three words of source operand,
and the R2000 only has two general-register read ports.  Yes, there's
room in the instruction encoding for another register specifier, but
there's no hardware to get a third value from the register file.
Because the special-purpose registers are not bypassed, it would not
be possible to use one of them to hold the third value; if an
interrupt or exception required restarting a double-length divide,
that third value would be corrupted.

So that's why there's no double-length divide: although the divider
itself is intrinsically able to perform the operation, it would have
cost more latches and multiplexors to get the data into the unit.
Latches and multiplexors make up a surprising amount of the cost of a
RISC processor; one works very hard to minimize the number of them.

There are other reasons not to bother: a double-length divide can
overflow in difficult-to-detect ways; a single-length divide can also
overflow, but it's easy to detect divide by zero and divide of MININT
by -1 in software while the divide is running in parallel.  Of course,
as has been mentioned before, there's no expression of this operation
in C, so a C compiler won't generate it.
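A sketch of that software check, in C.  The trap_divide() routine
stands in for whatever the run-time system does and is invented for
the example; on the real machine the two tests issue while the divider
is busy:

#include <limits.h>

extern void trap_divide();

int checked_div(x, y)
int x, y;
{
	if (y == 0)
		trap_divide();		/* divide by zero */
	if (x == INT_MIN && y == -1)
		trap_divide();		/* MININT/-1, the only overflowing case */
	return x / y;
}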

Finally, it should be noted that an integer divide is actually more
complex (time x space-wise) than a floating-point divide: there are
fast redundant-representation techniques that only work for normalized
numbers. You'd probably find that a multiple-precision divide can be
implemented faster using floating-point arithmetic than fixed-point on
most RISC machines.

Regards,
Craig Hansen

byrd@portia.Stanford.EDU (Greg Byrd) (07/08/89)

In article <13961@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:

[original posting about returning multiple values...]

>LISP does not actually pass whole lists around - it uses pointers
>and linked lists.  [stuff deleted]
>
>You wanted to have a function be able to return more than one
>item.  [more stuff deleted]

Just wanted to point out that you *can* return multiple values in
Lisp without returning a list -- e.g., the 'values' and 
'multiple-value-bind' forms in Common Lisp.


...Greg Byrd
   Stanford University

stevew@wyse.wyse.com (Steve Wilson xttemp dept303) (07/08/89)

In article <3243@alliant.Alliant.COM> lewitt@Alliant.COM (Martin E. Lewitt) writes:
>I'm not sure what machines out there are giving you the impression that
>VLIW assembly is difficult, but that wasn't my experience at all.  I found
> stuff deleted...

The Cydra-5.  I worked on this hardware directly and have a fair idea
of the complexity for that specific architecture.  Things are easy
enough when you're running linear code, but when you have 4 iterations
of the same loop hot at the same time, and each resource is involved
with a different iteration during the same clock, how do you
deal with the register allocation so that you don't cause the various
iterations to collide with each other, i.e. don't break the dependencies,
etc.?  It just gets worse from there; my hat's off to the compiler
guys who could get this thing to run!

Steve Wilson  

suitti@haddock.ima.isc.com (Stephen Uitti) (07/08/89)

>For division with remainder, the code should
>be machine language, not C.  Would you use C to do machine-independent
>implementation of division?  You would not think of it.  Neither would
>I think of making similar operations such as frexp, divrem, exponentiation,
>absolute value, integer part, etc., machine independently coded in C.
>I would not consider coding the =UP above in C.  It is an additional
>primitive operation, like +, *, /, &.

Not only would I think of it, I've done it.  A check is made to see that
the compiler really generates the right instructions.

>I consider the 205 hardware the best designed vector hardware I have
>seen.

No real debate here.  The '205 was a real ambitious project.
Very, very difficult and costly to maintain, but real nice.
Current RISC machines are also ambitious, but in a different way.
The goal is to make something simple enough to support that is
also fast enough to be viable.  The '205 lost the commercial race.
Mr Cray, who has been designing RISC architectures since the
dawn of time, is one of the winners.

>  It is the C language which is inadequate.  And there is a severe
>limit to what should be portable.
>> >I want to write
>> >	q,r = x/y;
>> Unfortunately, this already means something in C.
>
>What does it mean?

It means "evaluate the expression 'q', then the expression 'r =
x/y'.  The fact that 'q' is a variable doesn't mean anything.  It
is treated as an 'rvalue' (one that gives a result) rather than
as an 'lvalue' (where to put a result).  The comma operator is
extremely basic to the language.  It is probably a flaw in the
language design that it is programmer visible.  It is not a flaw
that I would suggest fixing.
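Spelled out, the parse is:

	q, (r = x / y);	/* evaluate q and discard it; then assign x/y to r */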

>Would putting the q,r in parentheses or something else
>avoid the conflict?

Parentheses will screw up something.  There is probably
a fix, but it won't be provable off the top of my head.

>> >At the present time and in the future, loads and stores must slow things
>> >down unless they can be overlapped.
>> 
>> Or optimized out.
>
>I consider a programmer quite inept who does not know what is temporary and
>what is not.

A compiler evaluating an expression generally produces all sorts
of temporary variables.  These are typically kept in registers,
but sometimes overflow - often to the stack.  On a machine like
the 205, with its near infinite register count, one would imagine
that there would never be register overflow.

>making this distinction and figure it out.  A compiler is fast, but does
>not compare in intelligence with a human imbecile.

Sure it compares.  The compiler is faster.  The compiler is
correct more often.  The human probably knows how to breathe and
eat.  The compiler has no need for breathing or eating.  Of
course, there's the 'idiot savant', but I've never met one who
was actually much of an imbecile.

>The value of a compiler and language is ease of programming.  In this,
>notation and flexibility are extremely important.  If a mathematician
>finds notation missing, he invents it for his paper.  It may or may not
>be adopted by others, but that does not prevent people from reading his
>paper.

While I find such papers hard to read (especially since the
definition of the new notation syntax is seldom given), for
programs it is worse.  Even with the C preprocessor's limited
capabilities, one can write code that doesn't bear any
resemblance to C.  Maintenance of this code is nearly impossible.
Thus, sadly, notational flexibility and maintainability are at odds.

>A simple example of the lack of desirability
>of C portability is the addition of two vectors.  The good coding of this
>depends on the costs of the *x++ in loading and storing and the use of
>index in loading and storing.  These depend both on the machine and the
>storage classes of the addresses.

The code sequence is:

void lvecadd(a, b, c, s) /* a = b + c, length s */
long *a; long *b; long *c; long s;
{
	do {
		*a++ = *b++ + *c++;	/* assumes s > 0 on entry */
	} while (--s != 0);
}

or:

void lvecadd(a, b, c, s) /* a = b + c, length s */
long *a; long *b; long *c; long s;
{
	register long t;
	for (t = 0; t < s; t++)
		a[t] = b[t] + c[t];
}

A '205 C compiler should (doesn't, but should) take either
piece of code and translate it into the single vector-add instruction.

I've seen a C compiler which did better for the second example
than the first.  What was odd was that the target machine, a
68000, does better with the first algorithm.  Worse, the second
piece of code was effectively translated into the first, using
redundant temporary variables for the pointers.  Somehow, the
compiler thought that extra work had to take place for the first
example.  The point, though, is that the programmer-visible
syntax need not bear any resemblance to the quality of the
produced code.

Portability has its costs - usually about 10 to 20% of the
optimum machine speed.  Compiler technology has been attempting
to narrow this.  However, it has enormous benefits.  UNIX, being
somewhat portable, allows new architectures to become useful in
an extremely short time after manufacture (something like a
year).  The speed penalty that comes from UNIX not being designed
for the architecture is offset by the years of development that
it takes to create a new OS.  By that time, the architecture may
well be obsolete.  User applications are the same way.  The value
of using a portable language is more than just being able to use
code that was written on one machine on another.  It takes time
for programmers to become proficient in a language.  If you write
a new language for every program you write, no one will be able
to make heads or tails of what you've done.

dik@cwi.nl (Dik T. Winter) (07/08/89)

Note also the reply on why it is difficult to define syntax/semantics in
article <4909@ficc.uu.net> by peter@ficc.uu.net (Peter da Silva) and
the excellent explanation of why MIPS does not provide a double-width by
single-width division in article <4016@bauhaus.NeXT.UUCP> by
chansen@NeXT.UUCP (Craig Hansen).  And I would also like to have
the double-width operations directly visible, but I think it is more
difficult than Herman Rubin envisages.  In my opinion the gcc solution,
where you can use your own variables in asm statements, looks best.

A correction to the next statement:
In article <13970@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
 > The code sequence is:
[a routine to add two vectors, using pointers]
 > or:
[an alternative]
 > A '205 C compiler should (doesn't but should) take either
 > piece of code and translate it into the instruction.
This is not true.  Because 'noalias' is out, the compiler has no knowledge
of whether the sources overlap the destination, so a lot more has to be
done, unless the compiler is willing to do inlining, in which case it
might (but need not) have knowledge about the overlap.
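A one-call illustration of the overlap problem (the names v, c, and n
are made up):

extern void lvecadd();

void demo(v, c, n)
long *v, *c, n;
{
	/* Each store into a[] feeds the next load from b[]: the scalar
	   loop computes the recurrence v[i+1] = v[i] + c[i], which is
	   not what an overlap-blind vector instruction would produce. */
	lvecadd(v + 1, v, c, n);
}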

Another remark from the same article:
 > >I consider the 205 hardware the best designed vector hardware I have
 > >seen.
 > 
 > No real debate here.  The '205 was a real ambitious project.
 > Very, very difficult and costly to maintain, but real nice.
Some debate here (or simply, I do not agree).  However, the real problem
with the 205 is the abominable compiler.  Architecture is not really an
issue here; given a good compiler it would perform well.  Given the
compiler it has, you have a heavy task to get acceptable performance.

On to the RISC vs. CISC debate:  how many of the 60000+ instructions
of the 205 do you think are ever used?  I know quite a lot that are
never used unless you use 'special calls', which is nothing more than
'asm' in C.

For discussions about the 205 hardware please follow up with a
different subject, or do it by mail.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

mwm@eris.berkeley.edu (Mike (I'll think of something yet) Meyer) (07/08/89)

In article <1395@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin)
writes his usual continuing diatribe against modern (or, in the case
of C, not-so-modern) programming language designers.

Herman wants a language that is both notationally flexible enough to
allow him to use whatever notation is comfortable for him, and
sufficiently smart that he can write code that obviously maps 1-1 onto
machine hardware for the machine he's writing it for.

Since modern programming language designers are more concerned with
features like portability, maintainability and reusability (and the
ability to say *ability :-), they tend to ignore the things he wants.
Notational flexibility leads to nearly unreadable code, and incredible
support libraries, the antithesis of readability and maintainability.
The ability to write code specific to a machine architecture is
obviously antagonistic to the goal of portability.

Herman isn't the only person running around claiming some major area
of CS is going in the wrong direction. He isn't even the only one to
argue about it a lot, and do nothing to prove his point. He may even
be right.

However, flaming about it on USENet isn't going to do anything other
than make AT&T a lot of money.

Herman, you need to do three things. 1) Write or find a language that
is what you want. 2) Arrange to give it, along with enough
documentation for it to be useable, to anyone who wants it. 3) Write a
paper acceptable to a refereed journal (or SNot) describing the
language, and why it's superior to everything else available.

I predict that you'll be roundly ignored, as what you want is really
little more than a flexible template macro processor on top of an
assembler. Spiffy assemblers have been proposed before (Johnson at
Bell Labs wrote one; Whitesmiths sold one with their C compilers), and
were roundly ignored.

	<mike
--
Cheeseburger in paradise                                Mike Meyer
Making the best of every virtue and vice                mwm@berkeley.edu
Worth every damn bit of sacrifice                       ucbvax!mwm
To get a cheeseburger in paradise                       mwm@ucbjade.BITNET

cik@l.cc.purdue.edu (Herman Rubin) (07/08/89)

In article <4909@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> In article <1388@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> > For example, on a machine in which there is division yielding a quotient
> > and a remainder, I want to write
> 
> > 	q,r = x/y;
> 
> (1) What does this do when no such operation exists?

What does q = x/y; get you if division does not exist?  There are such
machines.  Would it be totally rash to suggest that the reason that this
does not exist on more machines may be related to the fact that it does
not exist in the languages?

> (2) How does this fit in with the rest of the expression syntax?

I do not see the problem here.  The fact that there are a fair number
of people now using it is a good reason to keep it.  If you mean, how
does it fit into compound expressions, there are ways of doing it, but
I am not sure that I care.  If I have to give them up, but can specify
local temporaries, I will.

> (3) Get a good optimiser and say:
> 	q = x/y;
> 	r = x%y;

Does your language even have long long?  The same remark holds about
temporaries.  And if I have to put them in adjacent registers because
of the hardware, I will.

> (4) Is it obvious which is the quotient and which is the remainder
>     when you have something like:
> 	tens,relx = muscle0.fiber/potential;

All languages have conventions.  I find many of them more annoying
than this problem.

> (5) What if x or y contain other operations that could require a
>     quotient or remainder?

If it is necessary to break a complex operation into simple parts, do it.
This does not apply to the original problem, as the operation wanted is,
on many machines, a single machine instruction.  If the machine can do
it in one instruction, I demand that I can put it in the language as one
statement.  I do not expect everything to be portable.

> (6) What does this mean:
> 	q,r,s = x/y/z;

I do not know.  I am not even sure what a = x/y/z; means.  I would not
use that without parentheses.  I suspect that what is wanted is a pair
of statements, with an intermediate result a declared temporary.

> (7) What does this sort of micro-optimisation gain you? Except in a
>     tight loop, very little. Since you're intending to be machine-
>     specific anyway, why not squirrel that loop away in an assembly
>     routine?

Notation can be very important.  I consider the introduction of algebraic
symbols by Diophantus a major contribution.  My complaint about assembly
language (not machine language) is that it is written in a thoroughly
obfuscated notation.  But machine language needs an "alphabet."

I am perfectly willing to write in what I call a HLL machine language.
Given a decent macro processor, in which the syntax, including the
macro name, is arbitrary, I would write that.  This is really what
a compiler is: a macro processor, possibly with an optimizer added.
Some "optimizers" do little more than add to the macro processor.

So what I am saying is that if the machine can do it, I want a reasonable
way to include it.  I do not consider the current assemblers reasonable,
and I have pointed out some of the reasons why.
> Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
> Business: peter@ficc.uu.net, +1 713 274 5180. | "WHAT HAPPENED TO ALL
> Personal: peter@sugar.hackercorp.com.   `-_-' |  THE WOMEN IN TEXAS?"
> Quote: Have you hugged your wolf today?  'U`  |  -- ACS1W@jane.uh.edu (meesh)


-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

stuart@bms-at.UUCP (Stuart Gathman) (07/11/89)

[ previous poster wants "q,r = x/y" to assign quotient to q, remainder to r ]

Even on fairly stupid compilers,

	q = x / y;
	r = x % y;

results in what you want (unless x & y have side effects - in that case assign
to temporaries).  In the absence of branches, even simple CSE is amazingly good.
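It is also worth noting that the ANSI C library (now in its final
draft) packages exactly this pair: div() and ldiv() return quotient
and remainder together in a structure, and a compiler is free to map
them onto a single machine divide.  A minimal example:

#include <stdlib.h>

void example(int x, int y)
{
	div_t qr = div(x, y);	/* qr.quot == x / y, qr.rem == x % y */
}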
-- 
Stuart D. Gathman	<stuart@bms-at.uucp>
			<..!{vrdxhq|daitc}!bms-at!stuart>

rhorn@infinet.UUCP (Rob Horn) (07/12/89)

In article <MAC.89Jul5102157@mrk.ardent.com> mac@mrk.ardent.com (Michael McNamara) writes:
>[ A reference to GCC for a good way to integrate (Peter Montgomery)'s
>  need for access to the 68020 32x32->64bit multiply]

I can also recommend using the GCC approach for this.  I tweaked up
one mpqs routine in a couple of hours' work (mostly paranoid testing
because I found a gcc asm bug in the process) to replace zaddmulp.
(The routine is basically a 31x31+31+carry->62 bit muladd.)  I didn't
go further primarily because rewriting more routines into inline asm
macros then caused the optimizer to have fits and die on the large
single source module.  This was back on gcc rev 1.31.  The current rev
may well be better able to handle this.

-- 
				Rob  Horn
	UUCP:	...harvard!adelie!infinet!rhorn
		...ulowell!infinet!rhorn, ..decvax!infinet!rhorn
	Snail:	Infinet,  40 High St., North Andover, MA

jeff@aiai.ed.ac.uk (Jeff Dalton) (07/22/89)

In article <1395@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <13961@haddock.ima.isc.com>, suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>> LISP does not actually pass whole lists around - it uses pointers
>> and linked lists.
>
>And this is the reason that production programs and library subroutines
>should not be written in Lisp.

Why?  Because Lisp uses pointers?

Some of the things C library procedures do are much worse than that.

 >But it is possible to get much done efficiently.  This should be done
 >by enabling those who can see how to do things to do them and to
 >communicate that to others.  Anything can be done on a universal
 >Turing machine, but I am not willing to wait a month for it to
 >multiply two 10-digit numbers by reducing everything to successor.

I'm not sure if this is still supposed to be about Lisp.
If so, you should note that Lisp has more than one data type.