[comp.arch] Assembly or ....

cik@l.cc.purdue.edu (Herman Rubin) (11/27/88)

In article <6529@june.cs.washington.edu>, kolding@june.cs.washington.edu (Eric Koldinger) writes:
> In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
> >I don't agree that there is ever any necessity to code in assembler. We
> >have languages that produce code just as good as hand crafted assembler
> >(such as C), so why not use them for this sort of thing.
> 
> Ah, but wouldn't that be nice.  Optimizing compilers that could generate
> code as good as we can generate by hand in all cases.  Let me know when
> someone writes one.

I agree completely.  Also, on the various machines I know, there are operations
I want to use about which the compiler knows nothing.  To SOME extent, SOME
of these can be added to the HLL.  But I have not seen a scheme to do this
which will not mess up attempts at automatic optimization.  

Also, these interesting and useful instructions vary somewhat from machine to
machine.  I am appalled by what the language designers have left out, and also
what has been relegated to subroutines.  What can one think of a compiler 
designer who has relegated to a subroutine an operation whose inline code
is shorter than the caller's code to use the subroutine?  This is rampant.

I recently had the occasion to produce a subroutine for a peculiar function.
In principle, it should have been done in a HLL.  I would prefer to do so.
BUT, the following was needed:

  Take a double floating number, and extract the exponent and the mantissa.
  This involves having a double long type.

  Reverse the above operation.

  Addition, shifting, Boolean operations on double long.

  Checking an "overflow field."  Not the usual overflow.

If these are available, C would do a good job PROVIDED it put everything
in registers, if possible.
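
For concreteness, the closest portable approximation in C to the unpack/repack
steps above is the frexp()/ldexp() pair; a minimal sketch (illustrative only --
the raw double-long shifts, Boolean operations, and "overflow field" check have
no portable C equivalent):

	#include <math.h>

	double unpack_repack(double x)
	{
	    int exponent;
	    double mantissa;

	    mantissa = frexp(x, &exponent);    /* x == mantissa * 2^exponent */
	    return ldexp(mantissa, exponent);  /* reverse the operation      */
	}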

A few of the compilers I have seen do a fair job; the others would get 
a D--.  (Since D-- = C in C, this might be why C is so bad :-))  But one
of the most amazing things that I have seen in the workings of the 
designers is the assumption that the compiler has all the information
necessary to produce optimized code!  There is no provision for input
as to frequency of branches.  Should the common condition be the branch
or the rare condition?  Does it make a difference in the combination?
Since I have examples where the branches are of comparable frequencies,
examples where the ratio of the frequencies are from 10-200, where the
ratio is a few thousand, and where one branch may never occur, I certainly
feel that I have input.  I think the compilers should be interactive, and
discuss the various possibilities with the programmer.  I can even give
cases where the dictum to remove a calculation from within a loop is wrong.

All of mankind does not know enough to produce a good language, or to 
produce a good implementation of a language.  There are more operations
appropriate to hardware than are dreamed of in all computer scientists'
philosophies.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/28/88)

In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>... on the various machines I know, there are operations
>I want to use about which the compiler knows nothing.

There are many of us who don't feel obligated to make use of every
little whiz-bang feature that hardware designers have so thoughtfully
provided.  Life is too short to spend it in an effort to accommodate
hardware quirks.

orr@cs.glasgow.ac.uk (Fraser Orr) (11/29/88)

In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
<A few of the compilers I have seen do a fair job; the others would get 
<a D--.  (Since D-- = C in C, this might be why C is so bad :-))  But one
<of the most amazing things that I have seen in the workings of the 
<designers is the assumption that the compiler has all the information
<necessary to produce optimized code!  There is no provision for input
<as to frequency of branches.  Should the common condition be the branch
<or the rare condition?  Does it make a difference in the combination?
<Since I have examples where the branches are of comparable frequencies,
<examples where the ratio of the frequencies are from 10-200, where the
<ratio is a few thousand, and where one branch may never occur, I certainly
<feel that I have input.  I think the compilers should be interactive, and
<discuss the various possibilities with the programmer.  I can even give
<cases where the dictum to remove a calculation from within a loop is wrong.

I agree entirely (including with the cheap jibe against C:^>). I think
that there would be great improvements if compilers were interactive.
Not perhaps realised with low level languages like C and its ilk,
but certainly realised in much higher level programming languages.
Interactive isn't necessarily the best by the way, since it means
you have to repeat yourself, and the decisions are not documented
in the program. There is no reason why you can't have an "advice"
command that advises the compiler of the best implementation. (Doesn't
"advise" sound so much better than "pragma"?)

<Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

henry@utzoo.uucp (Henry Spencer) (11/29/88)

In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>... What can one think of a compiler 
>designer who has relegated to a subroutine an operation whose inline code
>is shorter than the caller's code to use the subroutine? ...

If the operation is infrequently used and not efficiency-critical, then
what one can think is "this guy knows his tradeoffs".  Adding an operation
to a compiler is not free.

>... But one
>of the most amazing things that I have seen in the workings of the 
>designers is the assumption that the compiler has all the information
>necessary to produce optimized code!  There is no provision for input
>as to frequency of branches...

Not true; some C implementations will take advice from the profiler on
this.  It's practically a necessity for the VLIW folks.  (Oh, you meant
advice from the *programmer*? :-)  The guy who's wrong 3/4 of the time
about such things?  Silly idea.)  (No, I'm not kidding -- programmer
intuition is vastly inferior to measured data in such matters.  This
has been known for many years.)
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

shawn@pnet51.cts.com (Shawn Stanley) (11/29/88)

cik@l.cc.purdue.edu (Herman Rubin) writes:
>All of mankind does not know enough to produce a good language, or to 
>produce a good implementation of a language.  There are more operations
>appropriate to hardware than are dreamed of in all computer scientists'
>philosophies.

What might be more to the point is that there is no single language that is
perfect for every application.  The right tool for the job, they always say.

UUCP: {rosevax, crash}!orbit!pnet51!shawn
INET: shawn@pnet51.cts.com

cik@l.cc.purdue.edu (Herman Rubin) (11/29/88)

In article <8993@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >... on the various machines I know, there are operations
> >I want to use about which the compiler knows nothing.
> 
> There are many of us who don't feel obligated to make use of every
> little whiz-bang feature that hardware designers have so thoughtfully
> provided.  Life is too short to spend it in an effort to accommodate
> hardware quirks.

But these are the natural operations (to me as the algorithm designer).
Which of them are present on a given machine varies.  A machine has few
operations, and if the information were written with a mathematician in
mind as the reader, it would take not a day but 2-3 hours to understand
a machine's instruction set.

I find that the designers of the various languages have not considered the
type of operations which I would want to use WITHOUT KNOWLEDGE OF THE
MACHINE.  A trivial example is having a list of results to the left of
the replacement operator.  I do not mean a vector or a struct; the items
may be of different types, and should not be stored in adjacent memory
locations.  Most of the time, they should end up in registers.  I have
not seen a language which is claimed to produce even reasonably efficient
code with this property.  Some of these operations are even hardware.
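
The usual C workaround, a struct return, illustrates the objection: the results
are packed into adjacent memory and nothing guarantees they stay in registers.
A minimal sketch with illustrative names:

	struct divrem_result { long quot; long rem; };

	struct divrem_result divrem(long a, long b)
	{
	    struct divrem_result r;  /* members are adjacent in memory...   */
	    r.quot = a / b;
	    r.rem  = a % b;
	    return r;                /* ...and may or may not stay in registers */
	}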

Another effect of the HLL tyranny is that the operations which are beyond
the ken of the HLL designers are disappearing from the machines.  Other
useful operations are not getting in.  For example, suppose we want to
divide a by b, obtaining an integer result i and a remainder c.  I know
of no machine with this instruction, and this is not that unusual an 
instruction to demand.  It is cheap in hardware, and extremely expensive
in software--at least 4 instructions.
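
For reference, the source-level expansion in C looks roughly like the sketch
below (taking a and b as floating point, as is clarified later in the thread);
each line typically costs one or more instructions, which is the "expensive in
software" complaint:

	#include <stdio.h>

	static void fdivrem(double a, double b, long *i, double *c)
	{
	    *i = (long)(a / b);       /* quotient truncated to an integer */
	    *c = a - b * (double)*i;  /* remainder, still floating point  */
	}

	int main(void)
	{
	    long q;
	    double r;
	    fdivrem(7.5, 2.0, &q, &r);
	    printf("%ld %g\n", q, r);  /* prints: 3 1.5 */
	    return 0;
	}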

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

mash@mips.COM (John Mashey) (11/29/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...
>Another effect of the HLL tyranny is that the operations which are beyond
>the ken of th HLL designers are disappearing from the machines.  Other
>useful operations are not getting in.  For example, suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an 
>instruction to demand.  It is cheap in hardware, and extremely expensive
>in software--at least 4 instructions.

Although I don't necessarily subscribe to Herman's opinions, R2000 divides
actually do this (leave both results in registers).  Although I shouldn't
have been, I was surprised to find that the following C code, compiled
unoptimized (optimized, it essentially disappears, of course):

main() {
	register i,j,k;
	i = j / 7;
	k = j % 7;
}

generates one divide instruction to get both of the results.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

ok@quintus.uucp (Richard A. O'Keefe) (11/29/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>A trivial example is having a list of results to the left of
>the replacement operator.  I do not mean a vector or a struct; the items
>may be of different types, and should not be stored in adjacent memory
>locations.

If you mean things like being able to write
	x, i, s := x/(1.0+x), i-2, s || "x" || s
CPL had it, BCPL has it (in sequential form), MESA has it (you have to
put record constructor functions around both sides, but that's merely
notational), and it has appeared in several experimental languages.
Common Lisp supports it under the name PSETQ:
	(psetq x (/ x (+ 1.0 x))
	       i (- i 2)
	       s (however-you-concatenate-strings s "x" s))
I have never understood why it was omitted from Pascal, Modula, ADA:
	x, y := y, x
is the clearest way to exchange two variables I know.  I once put
this construct into the compiler for an Algol-like language; it was
easy, and if the compiler had done any optimisation it would still
have been easy.

I don't see this as a question of getting at the machine level, but as
providing one of the constructs in Dijkstra's notation (:-).

cjosta@taux01.UUCP (Jonathan Sweedler) (11/29/88)

In article <8938@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
|In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
|...
|> suppose we want to
|>divide a by b, obtaining an integer result i and a remainder c.  I know
|>of no machine with this instruction, and this is not that unusual an 
|>instruction to demand.  It is cheap in hardware, and extremely expensive
|>in software--at least 4 instructions.
|
|Although I don't necessarily subscribe to Herman's opinions, R2000 divides
|actually do this (leave both results in registers).  

The 32000 series has a DEI (Divide Extended Integer) instruction that
also does this. 

-- 
Jonathan Sweedler  ===  National Semiconductor Israel
UUCP:    ...!{amdahl,hplabs,decwrl}!nsc!taux01!cjosta
Domain:  cjosta@taux01.nsc.com

colwell@mfci.UUCP (Robert Colwell) (11/29/88)

In article <1988Nov28.205129.2216@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
]In article <1031@l.cc.purdue.edu] cik@l.cc.purdue.edu (Herman Rubin) writes:
]]... But one
]]of the most amazing things that I have seen in the workings of the 
]]designers is the assumption that the compiler has all the information
]]necessary to produce optimized code!  There is no provision for input
]]as to frequency of branches...
]
]Not true; some C implementations will take advice from the profiler on
]this.  It's practically a necessity for the VLIW folks.  (Oh, you meant
]advice from the *programmer*? :-)  The guy who's wrong 3/4 of the time
]about such things?  Silly idea.)  (No, I'm not kidding -- programmer
]intuition is vastly inferior to measured data in such matters.  This
]has been known for many years.)

Programmers ARE often wrong.  We've even seen cases where the programmer
had put in assertions for his Cray code that were provably incorrect for
certain interesting sets of input data.

But for branching, I wouldn't say it's a necessity for us to have the
ability for the profiler to tell the compiler where branches are going.
We have that mechanism, but it isn't often used.  The heuristics built
in to the compiler for predicting directions work awfully well.  And if
the programmer provides branch assertions, and they're wrong, and the
performance is adversely affected, presumably he or she will discover 
that and fix it.

Bob Colwell               ..!uunet!mfci!colwell
Multiflow Computer     or colwell@multiflow.com
175 N. Main St.
Branford, CT 06405     203-488-6090

desnoyer@Apple.COM (Peter Desnoyers) (11/30/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
> For example, suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an 
>instruction to demand.  

I can't think of a machine that doesn't. The only machine language
manuals on my desk (68000 and HPC16083) show this instruction.
If anyone knows of a machine that doesn't leave the remainder in
a register, I would be interested in knowing of it. Such a machine
either throws the remainder away or uses some algorithm besides
shift-and-subtract (for instance, hard-wired).

				Peter Desnoyers

pardo@june.cs.washington.edu (David Keppel) (11/30/88)

>>cik@l.cc.purdue.edu (Herman Rubin) writes:
>>>divide a by b, obtaining an integer result i and a remainder c.
>>>I know of no machine with this instruction.  It is cheap in hardware,
>>>and extremely expensive in software--at least 4 instructions.

>mash@mips.COM (John Mashey) writes:
>>[R2000 has 1-instruction divide that does this ]

cjosta@taux01.UUCP (Jonathan Sweedler) writes:
>The 32000 series has a DEI (Divide Extended Integer) instruction that
>also does this. 

The rather obscure but still-available VAX series of computers made by
a small Nashua, New Hampshire company (Digital Equipment Corporation)
introduced the EDIV instruction recently, about 1978.  I heard a rumor
that (at least in some early/small VAXen) it was faster to do two
separate instructions than to use EDIV, although I suspect that this
was an artifact of those implementations.

Followups to comp.arch.

	;-D on  ( Now how about that `editpc' opcode? )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

consult@osiris.UUCP (Unix Consultation Mailbox ) (11/30/88)

I've trimmed all the language-specific groups out of the posting list for
this followup and directed followups to just comp.lang.misc and comp.arch.
(I feel the issues being discussed are too general to make it useful to post
outside of comp.lang.misc, with the exception of comp.arch, where many of
the points being raised will look very familiar.)


In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>A trivial example [of useful operations unsupported in HLLs] is having a
>list of results to the left of the replacement operator.  I do not mean a
>vector or a struct; the items may be of different types, and should not be
>stored in adjacent memory locations.  Most of the time, they should end up
>in registers.  I have not seen a language which is claimed to produce even
>reasonably efficient code with this property.  Some of these operations are
>even hardware.

I'll have to ask... what kind of things are you talking about here (beyond
the really obvious one, see below)?  (Which of them "are even hardware"?)
Why "shouldn't" the results be stored in adjacent memory locations?  And if
you want to be able to take advantage of the knowledge that they're in
registers, you'll probably have to use assembly anyway, so why is this a
requirement for HLLs?  The only "languages" I've seen where any of this
stuff makes any sense are all assembly, and then only when catering to the
specifics of the hardware, which can be different even between models of the
"same" box.

>Another effect of the HLL tyranny is that the operations which are beyond
>the ken of th HLL designers are disappearing from the machines....  For
>example, suppose we want to divide a by b, obtaining an integer result i
>and a remainder c.  I know of no machine with this instruction, and this
>is not that unusual an instruction to demand.  It is cheap in hardware,
>and extremely expensive in software--at least 4 instructions.

I know of at least one machine with this instruction (the LSI-11 with EIS
for integer divide and multiply).  There are probably many more.  As for it
being "cheap in hardware", that's not really true, although if you have the
divide it's virtually free to offer divide with remainder.

I tend to agree with Herman that it would be frequently useful to be able to
do this in a HLL.  However, frequently useful may not mean useful enough, as
readers of comp.arch well know.  HLL designers must justify features they
include, lest their language be named Ada (TM Dept of Redundancy Dept :-).
Also, there's that little problem Herman mentioned in his followup to Doug
Gwyn, that different hardware may not implement desired features, or may
implement them differently enough to make incorporating them into a standard
HLL very very difficult.  Successful HLLs tend to deal with common subsets
of hardware features, which are increasingly small these days what with
diverging architectures.  The newfangled RISC chips and boards are helping
somewhat although even their designers can't seem to agree on what
constitutes an ideal instruction set (aside from the Z/OISC people who think
even Turing machines are too complicated).

Bear in mind that I am not an expert, merely someone with lots of experience
with various machines and HLLs.  In other words, I'm probably wrong, but
probably not by much.

I think that if Herman had been following comp.arch religiously for the last
few years he may not have had to ask any of these questions.  Maybe we need
a list of answers to commonly asked questions... it might help avoid things
like the recent "which is better, 80386 or 68030" posting...


                                                                 Phil Kos
                                                      Information Systems
...!uunet!pyrdc!osiris!phil                    The Johns Hopkins Hospital
                                                            Baltimore, MD

cik@l.cc.purdue.edu (Herman Rubin) (11/30/88)

In article <949@taux01.UUCP>, cjosta@taux01.UUCP (Jonathan Sweedler) writes:
< In article <8938@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> |In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> |...
> |> suppose we want to
> |>divide a by b, obtaining an integer result i and a remainder c.  I know
> |>of no machine with this instruction, and this is not that unusual an 
> |>instruction to demand.  It is cheap in hardware, and extremely expensive
> |>in software--at least 4 instructions.
> |
> |Although I don't necessarily subscribe to Herman's opinions, R2000 divides
> |actually do this (leave both results in registers).  
< 
< The 32000 series has a DEI (Divide Extended Integer) instruction that
< also does this. 

I do not know if I made it clear in my initial posting, but the problem
arises if the types of a, b, and c are floating.  Note that the quote from
my paper specifically has i as an integer.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

dik@cwi.nl (Dik T. Winter) (11/30/88)

In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
 > In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
 > >
 > > For example, suppose we want to
 > >divide a by b, obtaining an integer result i and a remainder c.  I know
 > >of no machine with this instruction, and this is not that unusual an 
 > >instruction to demand.  
 > 
 > I can't think of a machine that doesn't. The only machine language
 > manuals on my desk (68000 and HPC16083) show this instruction.
 > If anyone knows of a machine that doesn't leave the remainder in
 > a register, I would be interested in knowing of it. Such a machine
 > either throws the remainder away or uses some algorithm besides
 > shift-and-subtract. (for instance hard-wired.)
 > 
Well, if I remember right, the we32000 (the thing in a 3b2 and a tty5620)
is such a machine.  It has a divide, quotient, remainder and modulus instruction
like the ns32000, but unlike the ns32000 no instruction that gives you
both.  This is (I think) an example of a machine for which the instructions
are dictated by a HLL (C in this case), which Herman Rubin so deplores.
On the other hand, this need not be wrong if the operations can be
pipelined.  E.g. if i/j and i%j take 23 cycles each, but the second can
start one cycle after the first you do not lose very much.

In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
 > I do not know if I made it clear in my initial posting, but the problem
 > arises if the types of a, b, and c are floating.  Not that the quote from
 > my paper specifically has i an integer.

Indeed you did not make that clear (seems evident :-)).
I know of no machine that gives you both quotient and remainder on a floating
point division; stronger still, before the IEEE standard there was (I
believe) no machine that would give you the remainder at all, only the
quotient.  IEEE changed that, and many machines now have floating point
division and floating point remainder, but the result is floating point
for both.  This also explains why both are not returned: to obtain the
(exact) remainder may require a larger number of steps than the
(inexact) quotient.  I agree, however, that those machines could return the
quotient when doing a remainder operation.  Note that IEEE did *not*
mandate operations that return two results.  So no machine I know of
returns two results, not even for sine and cosine, which is implemented on
many (as an extension to IEEE?).  The CORDIC algorithm will quickly give you
both sine and cosine, but it is only efficient in hardware.  But here too,
pipelining may be a deciding point.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

news@ism780c.isc.com (News system) (11/30/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an
>instruction to demand.  It is cheap in hardware, and extremely expensive
>in software--at least 4 instructions.

Then he clarifies:

>I do not know if I made it clear in my initial posting, but the problem
>arises if the types of a, b, and c are floating.  Not that the quote from
>my paper specifically has i an integer.
>-- 
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

Ok here is one.  Quoting from the IBM/709 Reference Manual (Circa 1959)

FDH - Floating Divide or Halt
    The c(ac) [means contents of the ac register] are divided by c(y). The
    quotient appears in the mq and the remainder appears in the ac. ...

I leave it to Herman to find out why more modern machines leave this out.
Hint: Try profiling your application to see what percent of the total process
time is used producing the remainder.

    Marv Rubinstein.

ok@quintus.uucp (Richard A. O'Keefe) (11/30/88)

In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
>In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>>For example, suppose we want to
>>divide a by b, obtaining an integer result i and a remainder c.  I know
>>of no machine with this instruction, and this is not that unusual an 
>>instruction to demand.  
>
>I can't think of a machine that doesn't.

Note that Rubin was careful to identify *i* as integer, but NOT the other
variables!  Consider, for example, Burroughs Extended Algol, where
	x DIV y
was defined to be
	SIGN(x/y)*ENTIER(ABS(x/y))
and	x MOD y
was defined to be
	x - (x DIV y)*y

These definitions make perfect sense for any real x, y for which x/y is
defined.  And Burroughs Algol did in fact extend the definitions to real
and double precision x and y as well as integers.  But the B6700 did not
have any instructions with two results that I can recall (discounting
SWAP), so it only meets part of Rubin's requirements.

In fact, if we define SIGN(x) = if x = 0 then 0 else x/ABS(x), you can
extend DIV and MOD to the complex numbers as well, not that it's quite
as useful...
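
Transcribed into C for doubles, the Burroughs definitions above come out
roughly as follows (a sketch of the same formulas, not Burroughs code):

	#include <math.h>

	double bdiv(double x, double y)     /* SIGN(x/y) * ENTIER(ABS(x/y)) */
	{
	    double q = x / y;
	    return (q < 0.0) ? -floor(-q) : floor(q);  /* truncation toward zero */
	}

	double bmod(double x, double y)     /* x - (x DIV y)*y */
	{
	    return x - bdiv(x, y) * y;
	}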

As for machines which lack an *integer* divide with those properties,
the VAX has an EDIV instruction, but it is awkward to set up.  The 4.1BSD
C compiler would generate the equivalent of x-(x/y)*y for x%y.  For those
with SPARCs, a look at the code generated for x%y might be illuminating.

jmacdon@cg-atla.UUCP (Jeff MacDonald) (11/30/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an 
>instruction to demand.  It is cheap in hardware, and extremely expensive
>in software--at least 4 instructions.

Both the 80x86 and 680x0 machines have integer divide instructions
which leave quotient and remainder.  So there.

Think we've beat Herman up enough?
-- 
Jeff MacDonald ((decvax||ulowell)!cg-atla!jmacdon)
c/o Compugraphic Corporation   200-3-5F
200 Ballardvale, Wilmington, MA   01887
(508) 658-0200,  extension 5406

barmar@think.COM (Barry Margolin) (12/01/88)

In article <1989@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>I think that there would be great improvements if compilers were interactive.
...
>Interactive isn't necessarily the best by the way, since it means
>you have to repeat yourself, and the decisions are not documented
>in the program.

How about if the compiler saved the answers in the program text as
explicit ADVICE statements?  The compiler becomes a simple form of
Programmer's Apprentice, helping the programmer determine where
potential optimizations can be specified.  The decisions become
documented, and they don't have to be repeated.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

barmar@think.COM (Barry Margolin) (12/01/88)

In article <771@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In fact, if we define SIGN(x) = if x = 0 then 0 else x/ABS(x), you can
>extend DIV and MOD to the complex numbers as well, not that it's quite
>as useful...

Which is, in fact, precisely how Common Lisp (which has complex and
rational numbers built in) defines SIGNUM.  This definition computes
the unit vector collinear with the vector in the complex plane that the
original number specifies.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

desnoyer@Apple.COM (Peter Desnoyers) (12/01/88)

In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>I do not know if I made it clear in my initial posting, but the problem
>arises if the types of a, b, and c are floating.  Not that the quote from
>my paper specifically has i an integer.

This was only implicitly stated, at best, in the original posting by
use of Fortran variable naming conventions. I read the article in
comp.lang.c. Enough said. Now that I understand the original question,
I have a new answer -

 There is no natural divide-type operation mapping (real,real) ->
(real,integer). The operation mentioned was either poorly defined, or
merely consisted of:
  convert a, b to int
  div a,b -> int quotient, remainder
  convert remainder to float

If what you mean is that for any integer or floating-point operation
op(i1,i2,...in,o1,o2...,om) with n inputs 'i' and m outputs 'o', the
full set of 2^(n+m) combinations of i1 : {int, float}, ... om : {int,
float} should be supported as hardware instructions, then the idea is
patently absurd. 

				Peter Desnoyers

alverson@decwrl.dec.com (Robert Alverson) (12/01/88)

In fact, the Intel 8087 almost computes

     i, r := a/b, a%b;

The floating point remainder function returns the remainder as the
function result, and the least significant 3 bits of the integer quotient are
stored in the condition codes.  For some typical uses of the
remainder function (range reduction), this is all you need.
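
The same idea survives in the later C99 library routine remquo(), which
returns the remainder and stores at least the low three bits of the quotient --
enough for octant-style range reduction (a sketch, not 8087 code):

	#include <math.h>

	double reduce(double x, double period, int *low_quotient_bits)
	{
	    /* remainder as the result; low bits of the quotient via the pointer */
	    return remquo(x, period, low_quotient_bits);
	}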

Bob

gandalf@csli.STANFORD.EDU (Juergen Wagner) (12/01/88)

In article <32354@think.UUCP> barmar@kulla.think.com.UUCP (Barry Margolin) writes:
>...
>How about if the compiler saved the answers in the program text as
>explicit ADVICE statements?  The compiler becomes a simple form of
>Programmer's Apprentice, helping the programmer determine where
>potential optimizations can be specified.  The decisions become
>documented, and they don't have to be repeated.

If you have such a sophisticated compiler, that would be more a tool for
automated program generation than a compiler in the classical sense. Such a
tool could also be fed with a pure declarative specification of what you
intend to do, and it should have knowledge about all optimizations for the
architectures you're going to run that program on.

Of course, if you learn something really nifty about optimizing code, you would
like to tell that "compiler" to use the new technique with your code. I guess,
this comes close to what some LISP environments offer, where you can modify
the compiler if you don't like it the way it is.

In such an environment, the programmer should not only have the opportunity to
choose between alternatives the compiler already knows, but also to introduce
new specialized strategies for certain applications. A major problem would
then be consistency of the resulting system...

If I had to use such a tool in the C environment, I would certainly ask for
incremental compilation, too. It is a pain in the neck if you have to recompile
and relink some of your modules every time you change a single character
in a function.

Hmm... I guess, I am shifting to the more general question of suitable
software development environments for C or other languages...

'nuff said.

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

smryan@garth.UUCP (Steven Ryan) (12/01/88)

>I can't think of a machine that doesn't. The only machine language

6600/Cyber 170, Cyber 205 don't return the remainder, nor is there a mod
instruction.
-- 
                                                   -- s m ryan
+------------------------------------------------------+-----------------------+
| ...and they were all in exactly the same nightmarish | The nerve-agent       |
| state: their faces were wholly burned, their         | causality...will die  |
| eyesockets where hollow, the fluid from their melted | of asphyxiation within|
| eyes had run down their cheeks.                      | a few minutes....     |
+------------------------------------------------------+-----------------------+

ech@poseidon.ATT.COM (Edward C Horvath) (12/01/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction,...

In article <8938@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> Although I don't necessarily subscribe to Herman's opinions, R2000 divides
> actually do this (leave both results in registers).  

From article <949@taux01.UUCP>, by cjosta@taux01.UUCP (Jonathan Sweedler):
> The 32000 series has a DEI (Divide Extended Integer) instruction that
> also does this. 

Hmm, add the MC68K, the PDP-11, and the IBM s/360 et fils.  Put another way,
does anyone have an example of a common processor that DOESN'T give you the
remainder and quotient at the same time?  I don't know the Intel chips, so
perhaps the original author just knows that the *86 divide doesn't do this.

It's interesting, though, how few languages provide such "two-valued"
functions (all right, I can feel the mathematicians cringing.  So few
languages provide functions with ranges like ZxZ, OK?).  I've seen
implementations of FORTH, by the way, where the expression
	a b /%
for example, divides a by b, leaving a/b and a%b on the stack.  Of
course, if your favorite flavor of forth didn't provide the /% operator
("word") you'd just define it...

=Ned=

ok@quintus.uucp (Richard A. O'Keefe) (12/01/88)

In article <21440@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
>In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>>[about the iq, r := x div y, x mod y instruction]

>The operation mentioned was either poorly defined, or merely consisted of:
>  convert a, b to int
>  div a,b -> int quotient, remainder
>  convert remainder to float

Wrong.  The important thing is that the remainder is
	remainder = a - b*INT(a/b)
I am sure that the IEEE floating-point committee would be interested to
learn that it is not a "natural divide-type operation"; this is
precisely the IEEE drem(a,b) function.  Quoting the SunOS manual:
	drem(x, y) returns the remainder r := x - n*y where n is the
	integer nearest the exact value of x/y; moreover if |n-x/y|=1/2
	then n is even.  Consequently the remainder is computed exactly
	and |r| <= |y|/2.  ... drem(x, 0) returns a NaN.

This is obviously a range reduction operator.  Oddly enough, on most
present-day machines, there is a good excuse for _not_ returning the
quotient (n) as well:  with 64-bit floats and 32-bit integers there
is no reason to expect n to be representable as a machine "integer".
Both results would have to be floats.  And in its use as a range
reduction operation, you normally aren't interested in n.
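
Spelled out from the definition quoted above, drem() amounts to the following
sketch (the library routine computes r exactly, whereas rounding of x/y here
can push n off by one in borderline cases):

	#include <math.h>

	double drem_sketch(double x, double y)
	{
	    double n = nearbyint(x / y);  /* nearest integer, ties to even
	                                     under the default rounding mode */
	    return x - n * y;
	}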

cik@l.cc.purdue.edu (Herman Rubin) (12/01/88)

In article <2061@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
> >I can't think of a machine that doesn't. The only machine language
> 
> 6600/Cyber 170, Cyber 205 don't return the remainder, nor is there a mod
> instruction.

On both of these machines, the division must be done in floating point and
the quotient converted to integer, etc.  On the CRAYs, division does not even
exist--a reciprocal must be computed, and then a floating point quotient, etc.
Since the machines are scoreboarded, a mod operation not tying up the hardware
should be spread over some 50 instructions for the 6600 or CRAY and 75
instructions on the 205 to be efficient.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

seanf@sco.COM (Sean Fagan) (12/01/88)

In article <6547@june.cs.washington.edu> pardo@cs.washington.edu (David Keppel) writes:
-->>>cik@l.cc.purdue.edu (Herman Rubin) writes:
-->>>>divide a by b, obtaining an integer result i and a remainder c.
-->>>>I know of no machine with this instruction.  It is cheap in hardware,
-->>>>and extremely expensive in software--at least 4 instructions.
-->>mash@mips.COM (John Mashey) writes:
-->cjosta@taux01.UUCP (Jonathan Sweedler) writes:
-->The rather obscure but still-available VAX series of computers made by
-->a small Nashua, New Hampshire company (Digital Equipment Corporation)
-->introduced the EDIV instruction recently, about 1978.  I heard a rumor
-->that (at least in some early/small VAXen) it was faster to do two
-->separate instructions than to use EDIV, although I suspect that this
-->was an artifact of those implementations.

What the hell.  Here is a list of machines that I've worked on that had this
type of instruction:

	PDP-11, 8086, 80286, 80386, VAX, NS32k, 68k, WE32100.
	I wouldn't swear to it, but I think the Elxsi might also have this.

On reflection, that is *every* machine I've worked on (at least, in assembly
language), except for the following:

	Z80, 6502, Cyber 170-state machines.

The Z80 and 6502 don't have divide, so I think we can ignore them.

This sequence, by the way, is *really* painful on a Cyber (and, I would
expect, a Cray), because the Cyber has no integer divide.  The mnemonic for
an integer divide actually converts into a floating point divide, after
trashing a couple of registers for you, and then packs the result back into
an integer.  As a result, when I wanted a modulus, I ended up writing *many*
instructions (but they overlapped!), using a, *ahem*, very interesting idea
of register allocation.
-- 
Sean Eric Fagan  | "Engineering without management is *ART*"
seanf@sco.UUCP   |     Jeff Johnson (jeffj@sco)
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

cik@l.cc.purdue.edu (Herman Rubin) (12/01/88)

In article <2416@osiris.UUCP>, consult@osiris.UUCP (Unix Consultation Mailbox ) writes:
> I've trimmed all the language-specific groups out of the posting list for
> this followup and directed followups to just comp.lang.misc and comp.arch.
> (I feel the issues being discussed are too general to make it useful to post
> outside of comp.lang.misc, with the exception of comp.arch, where many of
> the points being raised will look very familiar.)
> 
> 
> In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >A trivial example [of useful operations unsupported in HLLs] is having a
> >list of results to the left of the replacement operator.  I do not mean a
> >vector or a struct; the items may be of different types, and should not be
> >stored in adjacent memory locations.  Most of the time, they should end up
> >in registers.  I have not seen a language which is claimed to produce even
> >reasonably efficient code with this property.  Some of these operations are
> >even hardware.
> 
> I'll have to ask... what kind of things are you talking about here (beyond
> the really obvious one, see below)?  (Which of them "are even hardware"?)
> Why "shouldn't" the results be stored in adjacent memory locations?  And if
> you want to be able to take advantage of the knowledge that they're in
> registers, you'll probably have to use assembly anyway, so why is this a
> requirement for HLLs?  The only "languages" I've seen where any of this
> stuff makes any sense are all assembly, and then only when catering to the
> specifics of the hardware, which can be different even between models of the
> "same" box.

Besides the quotient and remainder, there are operations like unpacking
floating point numbers into exponent and mantissa, which are hardware on
some machines, and should be on others.  I know of machines for which the
hardware instruction to do this even puts the results in different TYPES of
registers.  A few HLL implementations attempt to make use of registers,
although most of them do things badly.

Besides having this situation for hardware operations, it should also be
usable for software.  Take, for example, the C function frexp.  The syntax
is
	y = frexp(x,&n)

instead of the more reasonable

	y,n = frexp(x)

There is no reason why the compiler must store the integer n in some location.
But even if this is to be done, the second notation is more appropriate, as it
clearly specifies that the operation returns TWO values.
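
What the "y,n = frexp(x)" form might look like as a C wrapper, with the pair
returned as a struct value (an illustrative sketch only; whether n ever touches
memory is still up to the compiler, which is the complaint):

	#include <math.h>

	struct frexp_result { double frac; int exp; };

	struct frexp_result frexp2(double x)
	{
	    struct frexp_result r;
	    r.frac = frexp(x, &r.exp);  /* frac in [0.5,1), x == frac * 2^exp */
	    return r;
	}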

A commonly used procedure in numerical mathematics is interpolation.  If one
is using fixed interval interpolation, one has the operation

	n,x = m*y

where n is the integer part of the product and x the fraction.  This is 
hardware on the VAX, and probably should be done in fixed-point arithmetic
on most other machines.  (What HLLs support fixed-point "real" arithmetic?)
This is especially the case if m is a power of 2, in which case the multiply
instruction should usually not be used.  But in any case, the pair n,x is the
result of the operation.
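
Written with the standard modf() routine, the integer-part/fraction split looks
like this sketch (an illustration of the operation only; on the VAX it
corresponds to a single instruction, per the text above):

	#include <math.h>

	double split(double m, double y, long *n)
	{
	    double ipart;
	    double frac = modf(m * y, &ipart);  /* ipart gets the integer part */
	    *n = (long)ipart;
	    return frac;                        /* fractional part */
	}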

Also, one frequently needs a function and its derivative.  The usual bad 
practice is to use two calls.  In most cases, a common calculation is more
efficient.  This would be much more common if the list on the left of =
notation were in use.  Mathematicians have been using functions returning
a list for centuries.  How come the language gurus do not understand this
yet?
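
A small example of the shared-work point: one Horner pass yields a polynomial
and its derivative for little more than the cost of either alone (illustrative
code, not from the post):

	void poly_and_deriv(const double *coef, int degree, double x,
	                    double *p, double *dp)
	{
	    /* coef[degree] is the leading coefficient, coef[0] the constant term */
	    double v = coef[degree], d = 0.0;
	    int i;

	    for (i = degree - 1; i >= 0; i--) {
	        d = d * x + v;        /* derivative accumulates the previous value */
	        v = v * x + coef[i];  /* Horner step for the value itself          */
	    }
	    *p = v;
	    *dp = d;
	}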

> >Another effect of the HLL tyranny is that the operations which are beyond
> >the ken of th HLL designers are disappearing from the machines....  For
> >example, suppose we want to divide a by b, obtaining an integer result i
> >and a remainder c.  I know of no machine with this instruction, and this
> >is not that unusual an instruction to demand.  It is cheap in hardware,
> >and extremely expensive in software--at least 4 instructions.
> 
> I know of at least one machine with this instruction (the LSI-11 with EIS
> for integer divide and multiply).  There are probably many more.  As for it
> being "cheap in hardware", that's not really true, although if you have the
> divide it's virtually free to offer divide with remainder.

I do not know if I made this clear, but a, b, and c are floating.

> I tend to agree with Herman that it would be frequently useful to be able to
> do this in a HLL.  However, frequently useful may not mean useful enough, as
> readers of comp.arch well know.  HLL designers must justify features they
> include, lest their language be named Ada (TM Dept of Redundancy Dept :-).
> Also, there's that little problem Herman mentioned in his followup to Doug
> Gwyn, that different hardware may not implement desired features, or may
> implement them differently enough to make incorporating them into a standard
> HLL very very difficult.  Successful HLLs tend to deal with common subsets
> of hardware features, which are increasingly small these days what with
> diverging architectures.  The newfangled RISC chips and boards are helping
> somewhat although even their designers can't seem to agree on what
> constitutes an ideal instruction set (aside from the Z/OISC people who think
> even Turing machines are too complicated).
> 
> Bear in mind that I am not an expert, merely someone with lots of experience
> with various machines and HLLs.  In other words, I'm probably wrong, but
> probably not by much.
> 
> I think that if Herman had been following comp.arch religiously for the last
> few years he may not have had to ask any of these questions.  Maybe we need
> a list of answers to commonly asked questions... it might help avoid things
> like the recent "which is better, 80386 or 68030" posting...

A check of comp.arch would show that I have been posting to comp.arch for the
past few years.  These questions have not been answered.  I also have been
posting to several of the comp.lang groups.

The fact that operations are not available on all machines does not necessarily
mean that they should not be in the languages.  I recently had to use the
unpack described above from double to long long on a VAX.  It SHOULD have 
been possible to add this to the language as a macro without having to go
through gobs of overhead, as well as the other operations I used that are not present
in any HLL.  The algorithm is statable in an augmented HLL and the machine
dependencies are handled by having a few machine parameters available in
a #include file.  I have a fair amount of experience with machine operations
on different machines, and I find it easier to think in terms of them and 
macros than to use some of the extremely clumsy HLL constructs.  But by a
macro processor, I mean something like what a compiler goes through in 
translating 
		x = y - z

into machine code.

The problem with assembler language is that the above would have to be
written

some-absurd-mnemonic	some permutation of x,y,z

in most assemblers.  I do know of exceptions.  Now in the early days, when
it was hard for computers to handle character strings, the above was needed.
What is a compiler but a typed, overloaded operator, macro converter with a
few gadgets added?  Some of these gadgets can be very useful, but augmenting
the first part to allow more types and more overloading (C++ does this, but
clumsily), user-defined operators, and more flexibility would be far more 
useful.  Also, people who understand the use of hardware-type operations
can suggest them.  The quotient and remainder in floating-point should be
based on similar hardware to the integer version, with an internal trap if
the divisor has larger magnitude than the dividend to avoid loss of accuracy.
I have given other examples where hardware was much better than software.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

cik@l.cc.purdue.edu (Herman Rubin) (12/01/88)

In article <7740@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
> In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
< < In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
< < >
< < > For example, suppose we want to
< < >divide a by b, obtaining an integer result i and a remainder c.  I know
< < >of no machine with this instruction, and this is not that unusual an 
< < >instruction to demand.  
< < 
< < I can't think of a machine that doesn't. The only machine language
< < manuals on my desk (68000 and HPC16083) show this instruction.
< < If anyone knows of a machine that doesn't leave the remainder in
< < a register, I would be interested in knowing of it. Such a machine
< < either throws the remainder away or uses some algorithm besides
< < shift-and-subtract. (for instance hard-wired.)

The CDC 6000s and their CYBER successors and the CYBER 205 (a totally different
architecture) do not even have integer division.  The CRAYs do not even have
division.  Certainly the VAX and the PYRAMID only have the above properties
if an EDIV rather than a DIV is used for division, and that had a few more
problems than one would expect because they do not use sign-magnitude.  It
takes two instructions even to do the sign extension.

> Well, if I remember right, the we32000 does that (the thing in a 3b2 and a
> tty5620).  It has a divide, quotient, remainder and modulus instruction
> like the ns32000, but unlike the ns32000 no instruction that gives you
> both.  This is (I think) an example of a machine for which the instructions
> are dictated by a HLL (C in this case), which Herman Rubin so deplores.
> On the other hand, this need not be wrong if the operations can be
> pipelined.  E.g. if i/j and i%j take 23 cycles each, but the second can
> start one cycle after the first you do not lose very much.

Can division be pipelined?  In scalar mode on the CYBER 205, division is not
subject to pipelining.  I believe that this is the case because too much of
the dividend must be retained throughout.

> In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
< < I do not know if I made it clear in my initial posting, but the problem
< < arises if the types of a, b, and c are floating.  Not that the quote from
< < my paper specifically has i an integer.
> 
> Indeed you did not make that clear (seems evident :-)).
> I know no machines that gives you both quotient and remainder on a floating
> point division, stronger still, before the IEEE standard there was (I
> believe) no machine that would give you the remainder, only quotients
> allowed.  IEEE changed that and many machines now have floating point
> division and floating point remainder, but the result is floating point
> for both.  This explains also why not both are returned.  To obtain the
> (exact) remainder may require a larger number of steps than the
> (inexact) quotient.  I agree however that those machines could return the
> quotient when doing a remainder operation.  Note that IEEE did *not*
> mandate operations that return two results.  So no machine I know does
> return two results; even with sine and cosine, which is implemented on
> many (as an extension to IEEE?).  The CORDIC algorithm will give you fast
> both sine and cosine, but is only efficient in hardware.  But also here,
> pipelining may be a deciding point.
> -- 
> dik t. winter, cwi, amsterdam, nederland
> INTERNET   : dik@cwi.nl
> BITNET/EARN: dik@mcvax

The implementation of quotient and remainder should be a simple modification
of the fixed point algorithm.  Those machines which have floating point
remainder in hardware presumably do this.

Even if CORDIC is not used, returning the sine and cosine would be faster in
software than two function calls.  The corresponding function is in some of
the libraries I know, but I have not seen any other implementation than the
two calls.  This also holds for many other reasonable situations in software.
In software, pipelining is certainly not possible except in vector situations.
The need for a list to the left of the = in a replacement statement
should have been obvious to all HLL designers from day 1; mathematicians have
used functions returning a list for centuries.

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

cik@l.cc.purdue.edu (Herman Rubin) (12/01/88)

In article <19848@ism780c.isc.com>, news@ism780c.isc.com (News system) writes:
> In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> ...

 
> Then he clarifies:
> 
> >I do not know if I made it clear in my initial posting, but the problem
> >arises if the types of a, b, and c are floating.  Not that the quote from
> >my paper specifically has i an integer.

 
> Ok here is one.  Quoting from the IBM/709 Reference Manual (Circa 1959)
> 
> FDH - Floating Divide or Halt
>     THe c(ac) [means contents of ac register] are divided by c(y). The
>     quotent appears in the mq and the remainder appears in the ac. ...

The format of this allows the multiple precision division of a single precision
floating point number by a single precision number, and was probably used in
computing double precision quotients, but it has nothing to do with the
question I raised.  The quotient is a truncated floating point quotient,
and the remainder is the remainder from this operation.  Even denormalization,
which was possible on that machine, could not get the quotient as an integer.  
The easiest way to do it on the 709 was first to check if the quotient was 0,
in which case the result is clear.  Otherwise, do the appropriate unpacking
and shifting, do an integer divide, and repack the integer remainder.  This
was more efficient on the 709 than on current machines, because division was
relatively slower and comparisons and transfers were FAST instructions in those
days.  (No, they have not gotten slower; the others have gotten relatively much
faster.)
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering) (12/02/88)

In article <606@poseidon.ATT.COM> ech@poseidon.ATT.COM (Edward C Horvath) writes:
>
>It's interesting, though, how few languages provide such a "two-valued"
>functions (all right, I can feel the mathematicians cringing.  So few
>languages provide functions with ranges like ZxZ, OK?).  I've seen
>implementations of FORTH, by the way, where the expression
>	a b /%
>for example, divides a by b, leaving a/b and a%b on the stack.  Of
>course, if your favorite flavor of forth didn't provide the /% operator
>("word") you'd just define it...
>

F88 does (define a new data type, define a function which returns it,
possibly overload some operator).


Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

ecphssrw@solaria.csun.edu (Stephen Walton) (12/02/88)

Can this discussion be ended, or at least moved to ONLY
comp.lang.misc?  This is the second extended language war in the last
few months (the last one was C vs. Fortran), and these two have
convinced me to auto-KILL all comp.lang.fortran messages cross-posted
to comp.lang.c. 
-- 
Stephen Walton, Dept. of Physics & Astronomy, Cal State Univ. Northridge
RCKG01M@CALSTATE.BITNET       ecphssrw@afws.csun.edu
swalton@solar.stanford.edu    ...!csun!afws.csun.edu!bcphssrw

khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering) (12/02/88)

In article <1039@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>Can division be pipelined?  In scalar mode on the CYBER 205, division is not
>subject for pipelining.  I believe that this is the case because too much of
>the dividend must be retained throughout.

The Cydra 5 had it pipelined, but the pipe was interlocked during
operation.  This was said to be due to a lack of real estate: it could
have been fully pipelined (a) if not for gradual underflow (the Cydra 5
was a fully IEEE-compliant machine), and (b) with more space it could
have been done anyway.

>
>Even if CORDIC is not used, returning the sine and cosine would be faster in
>software than two function calls.  The corresponding function is in some of
>the libraries I know, but I have not seen any other implementation than the
>two calls.

DEC's famous math$sincos does this. SunFORTRAN takes two calls (in the
user code) and turns them into one call to the sun sincos routine.
Unless I am mistaken this is a very old and well known optimization
(which is typically invisible to the user).

On many machines the cost of both is only a couple of % more than the
cost of one.
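
The paired interface being described looks roughly like the sketch below; the
name is illustrative (sincos() exists as an extension in some math libraries,
such as the math$sincos mentioned above, but it is not standard C), and a real
library version would share the argument reduction between the two results:

	#include <math.h>

	void my_sincos(double x, double *s, double *c)
	{
	    *s = sin(x);
	    *c = cos(x);
	}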


Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

consult@osiris.UUCP (Unix Consultation Mailbox ) (12/02/88)

In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>I do not know if I made it clear in my initial posting, but the problem
>arises if the types of a, b, and c are floating.  Not that the quote from
>my paper specifically has i an integer.

And I, ex-FORTRAN programmer that I am, should have picked up on this
immediately... :-)

                                                                 Phil Kos
                                                      Information Systems
...!uunet!pyrdc!osiris!phil                    The Johns Hopkins Hospital
                                                            Baltimore, MD

ok@quintus.uucp (Richard A. O'Keefe) (12/02/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction,...

Various people have listed machines with integer quotient-and-remainder
instructions.  I seldom agree with Herman Rubin, but he is quite capable
of reading an instruction set manual.  At any rate, he is better at that
than most of the repliers were at reading his message.  He was asking for
the equivalent of
	double a, b, r;
	long int i;
	r = drem(a, b);
	i = (int)( (a-r)/b );

Another poster says
>I've seen implementations of FORTH, by the way, where the expression
>	a b /%
>for example, divides a by b, leaving a/b and a%b on the stack.
	
Pop-2 has a//b which does much the same thing.

Common Lisp has
	(floor number divisor)
plus ceiling, truncate, and round.  All four functions return TWO
values: the quotient and the remainder.  E.g.
	(multiple-value-setq (Q R) (truncate N D))
In particular, Common Lisp's (round - -) function, if given floating-
point arguments, is the function that Rubin wants a single instruction for.
(Well, that's drem() anyway.  He might want one of the other three.)

I don't know, but I wouldn't be surprised if the Symbolics machines had
an instruction for this.

ok@quintus.uucp (Richard A. O'Keefe) (12/02/88)

In article <1037@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>> In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>> >A trivial example [of useful operations unsupported in HLLs] is having a
>> >list of results to the left of the replacement operator.  I do not mean a
>> >vector or a struct; the items may be of different types, and should not be
>> >stored in adjacent memory locations.

>Besides the quotient and remainder, there are operations like unpacking
>floating point numbers into exponent and mantissa, which are hardware on
>some machines, and should be on others.

The problem of functions returning a single result has long been a thorn
in the side of Lisp programmers (who didn't have VAR parameters either).
The Lisp community finally fixed this.  See "multiple-" in the index of
"Common Lisp the Language".  As a specific example of this:

	(multiple-value-setq (Significand Exponent Sign)
	    (decode-float SomeFloatingPointExpression))

[see P218 of CLtL.]  The fact that the result variables are in a list
has no non-syntactic significance.  A Common Lisp implementation can support
multiple values any way the implementor pleases (a list or array of results
*might* be built, or it might *not*) but no structure is visible to the
programmer.  (For example, if you just write
	(decode-float SomeFloatingPointExpression)
you will not get a list of results, just the first one.)  If the compiler
knows how to generate inline code for this particular operation, it is
quite at liberty to do so.

Prolog does not distinguish in any way between "inputs" and "outputs".
If a version of Prolog had this particular operation, it would be
	decode_float(ANumber, Significand, Exponent, Sign)
and you could use this to test for zero the hard way:
	decode_float(ANumber, 0.0, _, _)

All functions in MESA *look* as though they receive a single argument
(a record) and return a single result (another record).  It's a long
time since I read a MESA manual, so I've probably got this wrong, but
the function in question would look like
	function decode_float[x: real]
	returns [significand: real, exponent: integer, sign: real];
	begin ... end;

	... [m, e, s] := decode_float[27.3];
Although the syntax of a record is used, this has no more significance
than the list of variables in Common Lisp.  The implementation *might*
construct a record, or it might not.  (VAR parameters might be used
instead.)

I don't understand why ADA didn't pick this up from MESA.  The fact
that procedures and record constructors both support keyword association
is automatic in MESA, but requires separate statement in ADA.  And
MESA multiple results also support keyword association.

mch@ukc.ac.uk (Martin Howe) (12/02/88)

In article <1039@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <7740@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
>> In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
>> > In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>> mandate operations that return two results.  So no machine I know does
>> return two results; even with sine and cosine, which is implemented on
>> many (as an extension to IEEE?).

What exactly, then, is the 80387 FSINCOS instruction, which takes ST(0) and
returns its sine and cosine in ST(0) and ST(1), ready for division (tangent) or
whatever else?  In what way does it not return two results?  For that matter,
how are the flags set if both results raise the same exception?

I'd be interested to know, if anyone will post or mail ...

- Martin

-- 
Martin C. Howe (mch@ukc.ac.uk)  | "See them coming, Atlantis will rise,
LAN/MICRO Group, Computing Lab. |  it's absolute lunacy, they seek to despise!"
The University, CANTERBURY, UK. |              
Tel. (+44 227) 764000 x 7592    |  - Jon Cyriss

cik@l.cc.purdue.edu (Herman Rubin) (12/02/88)

In article <787@quintus.UUCP>, ok@quintus.uucp (Richard A. O'Keefe) writes:
> In article <21440@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
> >In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

		.............................

> Wrong.  The important thing is that the remainder is
> 	remainder = a - b*INT(a/b)
> I am sure that the IEEE floating-point committee would be interested to
> learn that it is not a "natural divide-type operation"; this is
> precisely the IEEE drem(a,b) function.  Quoting the SunOS manual:
> 	drem(x, y) returns the remainder r := x - n*y where n is the
> 	integer nearest the exact value of x/y; moreover if |n-x/y|=1/2
> 	then n is even.  Consequently the remainder is computed exactly
> 	and |r| <= |y|/2.  ... drem(x, 0) returns a NaN.
> 
> This is obviously a range reduction operator.  Oddly enough, on most
> present-day machines, there is a good excuse for _not_ returning the
> quotient (n) as well:  with 64-bit floats and 32-bit integers there
> is no reason to expect n to be representable as a machine "integer".
> Both results would have to be floats.  And in its use as a range
> reduction operation, you normally aren't interested in n.

I disagree.  In the varied situations I have wanted this, one of the 
following happens.

	I know the result is small, so that sometimes even 8 bits is enough.

	I only care about the last few bits.

	I want an overflow trap if it is too large.

BTW, I think there should be alternatives about the range of the remainder.
There are situations in which I want the remainder forced positive.  This is
much cheaper in hardware than in software.
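
(For what it's worth, the software version of the "forced positive"
variant is something like the following minimal C sketch, assuming the
4.3BSD drem() from the manual page quoted above; fmod() would do as well
with the same fix-up.  The helper name is mine.  In hardware this is just
a conditional add at the end of the divide.)

	#include <math.h>

	/* Force the remainder into [0, |b|): drem() leaves r in
	   [-|b|/2, +|b|/2], so fold negative values up by |b|. */
	double posrem(double a, double b)
	{
		double r = drem(a, b);

		if (r < 0.0)
			r += fabs(b);
		return r;
	}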

Thus we have several instructions wanted, or one can look at it as one
instruction with a "tag field."  Either way, it should be done.  And the
integer is wanted in many situations.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

einari@rhi.hi.is (Einar Indridason) (12/03/88)

In article <606@poseidon.ATT.COM> ech@poseidon.ATT.COM (Edward C Horvath) writes:
>
>Hmm, add the MC68K, the PDP-11, and the IBM s/360 et fils.  Put another way,
>does anyone have an example of a common processor that DOESN'T give you the
>remainder and quotient at the same time?  I don't know the Intel chips, so
>perhaps the original author just knows that the *86 divide doesn't do this.


Are we talking about the old 8-bit processors as well, or are we just talking
about the "new" processors?  (Define it for yourself :-)
If we are talking about those 8-bitters, then the MOS Technology 6502 does not
have divide or multiply instructions.  And if memory serves me right, neither
did the Z-80.

Now, one question: does the ARM chip in the Acorn Archimedes include multiply
and/or divide instructions?



-- 
To quote Alfred E. Neuman: "What! Me worry????"

Internet:	einari@rhi.hi.is
UUCP:		..!mcvax!hafro!rhi!einari

consult@osiris.UUCP (Unix Consultation Mailbox ) (12/03/88)

In article <1037@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>[lots of stuff in reply to my reply to his reply to Doug Gwyn's reply to
>his original posting]

I'd like to thank Herman publicly for taking the time to gently correct my
misreading of his postings.  He's certainly asking for some unusual
programming constructs, and I'm more and more convinced that he's got a
good case.  I'm also more and more convinced that he's describing a brand
new HLL which would make mathematical programming a horse of a very
different color for folks like him.  (The usual comments about the inherent
unsuitability of digital computers for complicated mathematical computation
rear their ugly heads, but I'm going to ignore them for the nonce.)

Maybe Herman should get together with an experienced language designer or
two and try to come up with something.  I for one would be interested in
hearing how it turned out.

I guess my current job (providing tools to support the development of
reliable, regular information systems) has really blinded me to the
differing needs of other computer users.  I'll need to consider other
angles the next time I put my foot into one of these discussions...


                                                                 Phil Kos
                                                      Information Systems
...!uunet!pyrdc!osiris!phil                    The Johns Hopkins Hospital
                                                            Baltimore, MD

dave@lethe.UUCP (David Collier-Brown) (12/03/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
| Another effect of the HLL tyranny is that the operations which are beyond
| the ken of the HLL designers are disappearing from the machines. 

From article <8938@winchester.mips.COM>, by mash@mips.COM (John Mashey):
|  Although I don't necessarily subscribe to Herman's opinions, R2000 divides
|  actually do this (leave both results in registers).  Although I shouldn't
|  have been, I was surprised to find that the following C code, compiled
|  unoptimized (optimized, it essentially disappears, of course):
|  main() {
|  	register i,j,k;
|  	i = j / 7;
|  	k = j % 7;
|  }
|  generates one divide instruction to get both of the results.

  Actually it isn't too surprising: one of the primitives I want
at code-generator-generation time is an n-tuple (a rough C sketch of
such a record follows the list) consisting of:
  instruction name
  input register(s) and content constraints (aka types)
  output register(s) ditto
  pattern to generate instruction
  size
  time (or other figure of merit)
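
  Something like the following C record is one way to hold such a tuple;
the field names and the fixed-size arrays are illustrative guesses only:

	/* One table entry per instruction known to the code generator. */
	struct insn_desc {
		char	*name;		/* instruction name                     */
		int	n_in;		/* number of input registers            */
		int	in_type[4];	/* content constraint (type) of each in */
		int	n_out;		/* number of output registers           */
		int	out_type[4];	/* content constraint of each output    */
		char	*pattern;	/* pattern used to emit the instruction */
		int	size;		/* size of the emitted code             */
		int	time;		/* time, or other figure of merit       */
	};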

  If I have these, I claim I can write a code generator that will do
a "good" job [see note below].  If I preserve these, however, I
also get the ability to do useful, if simple, improvements in the
code generated to do setup/teardown before and after the operation.  If I
plan on doing so, I get the ability to take a new instruction,
specified by the programmer, and provide the same kind of results.
  The simple case is old hat [press n if you've seen this one before]:
define a system call in Ada by defining the representation of a trap
word, provide the types of input and output by declaring it in
functional/procedural form, and the register constraints by pragmas.
Then look at the code generated (about 3 years ago on a small
honeybun) and note that no redundant register-register moves were
generated on either call or return.


--dave (this also answers the question of rarely-used 
        instructions being skimped on) c-b

note: No, I **can't** write you a code-generator-generator. I'm not
      a good enough academic.  But I've met people who have... I can
      only write the code generators themselves.

seanf@sco.COM (Sean Fagan) (12/04/88)

In article <1039@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>Can division be pipelined?  In scalar mode on the CYBER 205, division is not
>subject for pipelining.  I believe that this is the case because too much of
>the dividend must be retained throughout.

I don't know how well it can do it theoretically, but I do know that, on a
Cyber 170/760, multiplication is pipelined so that it can be doing a maximum
of three mults (first step, intermediate steps, last step).  Note that the
bottleneck (on a five-cycle instruction, wow) is in the intermediate steps.
This is why you try to multiply using registers that are otherwise unused at
the time 8-).

Division, on the other hand, is *not* pipelined.  You can only do one
division at a time (why he didn't put in more than one division unit is
something I still don't understand), and it takes a *long* time (70 or more
cycles).

-- 
Sean Eric Fagan  | "Engineering without management is *ART*"
seanf@sco.UUCP   |     Jeff Johnson (jeffj@sco)
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

cik@l.cc.purdue.edu (Herman Rubin) (12/04/88)

In article <6089@eagle.ukc.ac.uk>, mch@ukc.ac.uk (Martin Howe) writes:
> In article <1039@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >In article <7740@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
> >> In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
> >> > In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >> mandate operations that return two results.  So no machine I know does
> >> return two results; even with sine and cosine, which is implemented on
> >> many (as an extension to IEEE?).
> 
> What exactly, then, is the 80387 FSINCOS instruction, which takes ST(0) and
> returns its sine and cosine in ST(0) and ST(1), ready for division (tangent) or
> whatever else?  In what way does it not return two results?  For that matter,
> how are the flags set if both results raise the same exception?

I did not know of the 80387 instruction.  But how will programmers use it?
The languages still do not allow

	x, y = fsincos(z);

This being comp.arch, I must also state that I consider it unfortunate that
this type of operation requires that specific registers be used; in other
situations, this is certainly considered a bad thing.  Now I do not always
agree with the gurus, but this is one case where I do.  Unless the hardware
actually uses the registers for the computation, such as the accumulator and
MQ registers in the early von Neumann machines, the registers for a hardware
operation should be flexible.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

csc21824@unsvax.UUCP (Jay) (12/05/88)

In article <622@krafla.rhi.hi.is> einari@krafla.UUCP (Einar Indridason) writes:
>In article <606@poseidon.ATT.COM> ech@poseidon.ATT.COM (Edward C Horvath) writes:
>>does anyone have an example of a common processor that DOESN'T give you the
>>remainder and quotient at the same time?  I don't know the Intel chips, so
>>perhaps the original author just knows that the *86 divide doesn't do this.

        The 8086 divide doesn't?  Then I'd like to know why my integer-to-
string conversion routine works.  It repeatedly divides the number by 10,
saving the remainder each time (a minimal version is sketched below).
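
        For reference, the standard idiom in C (not my actual routine,
just a sketch): each quotient/remainder pair by the same divisor is
exactly the two-results-at-once case, and on the 8086 a single DIV
leaves the quotient in AX and the remainder in DX.

	/* Convert an unsigned value to decimal by repeated division by
	   10; a compiler can get each (n / 10, n % 10) pair from one
	   divide instruction. */
	void utoa10(unsigned int n, char *buf)
	{
		char tmp[12];
		int i = 0, j = 0;

		do {
			tmp[i++] = '0' + (n % 10);	/* remainder: next digit  */
			n /= 10;			/* quotient: what is left */
		} while (n != 0);

		while (i > 0)			/* digits came out in */
			buf[j++] = tmp[--i];	/* reverse order      */
		buf[j] = '\0';
	}
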
---------------------------------------------------------------------

        This space for Rent                  Eric J. Schwertfeger
                                             CIS  [72657,1166]
                                        or   csc21824%unsvax.uns.edu
Disclaimer:These are just the mad ramblings of a man forced to use
           vi once too often, and as such, should be ignored.

baum@Apple.COM (Allen J. Baum) (12/06/88)

[]
In article <1842@scolex> seanf@sco.COM (Sean Fagan) writes:
>In article <1039@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>>
>>Can division be pipelined?  In scalar mode on the CYBER 205, division is not
>>subject for pipelining.  I believe that this is the case because too much of
>>the dividend must be retained throughout.

Division can be pipelined, although I suspect that the reason it isn't is
that you may as well just duplicate the divider (I'm talking integer division
here).

There is at least one example of a systolic array integer divider:
see "Integer Division in Linear Time with Bounded Fan-in" by C. Purdy
and G. Purdy, IEEE Trans. on Computers, Vol. C-36, No. 5, May 1987, pp. 640-644.

Finally, there were some claims of a Newton iteration divide step
operation on the RPM-40.  Those folks are still on the net, but I guess
they decided they weren't allowed to talk about it, so no details emerged.

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

guy@auspex.UUCP (Guy Harris) (12/06/88)

>I did not know of the 80387 instruction.  But how will programmers use it?
>The languages still do not allow
>
>	x, y = fsincos(z);

Yes, but they allow

	X = SIN(Z)
	Y = COS(Z)

and I think there are FORTRAN compilers (hence the CAPITAL LETTERS :-))
that will make a single call to the "sin/cos" function, or generate a
single "sin/cos" instruction, and store the two results into X and Y. 
This could conceivably be done with C as well; consider:

math.h:

	#define	sin(x)		_builtin_sin(x)
	#define	cos(x)		_builtin_cos(x)

and a compiler that knows about "_builtin_sin" and "_builtin_cos".

There's no particular need, in general, for programming language
constructs to directly match hardware constructs in order for the
hardware constructs to be used....
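
Another route, if the language has pointers or reference parameters, is
to give the library routine two output arguments; a compiler that treats
the routine as a builtin is then free to expand the call to a single
FSINCOS.  A hypothetical C interface (the name is mine, not any
particular vendor's):

	#include <math.h>

	/* Hypothetical two-output interface; this portable definition
	   doubles as a fallback for compilers that don't know it. */
	void sin_and_cos(double z, double *sp, double *cp)
	{
		*sp = sin(z);
		*cp = cos(z);
	}

	/* The caller's view: both results come back through pointers. */
	double tangent(double z)
	{
		double s, c;

		sin_and_cos(z, &s, &c);
		return s / c;
	}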

jeffa@hpmwtd.HP.COM (12/07/88)

Mathematicians have been using cis(x) = cos(x) + i sin(x) for ages.  Ergo,
fsincos can be accessed as a complex function.  (Or rotation matrix, or ...)

peter@stca77.stc.oz (Peter Jeremy) (12/07/88)

In article <1047@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
 [ 80387 FSINCOS instruction returns results in ST(0) and ST(1) ]
>This being comp.arch, I must also state that I consider it unfortunate that
>this type of operation requires that specific registers must be used; in other
>situations, this is certainly considered a bad thing.

This instruction is typical of Intel processors - they all have highly
non-orthogonal instruction sets, with specialised instructions containing
implicit register references.  Intel promote this as "a good thing" and
have register names to indicate their uses.

Intel processors in general are a real pain to use.  Even when you go to the
80386 and finally escape from the 64k limit, you still can't nest loops, or
use loops with shifts or string instructions without shuffling registers.
(Not to mention having to remember all the side effects whilst writing code).
-- 
Peter Jeremy (VK2PJ)         peter@stca77.stc.oz
Alcatel-STC Australia        ...!uunet!stca77.stc.oz!peter
41 Mandible St               peter%stca77.stc.oz@uunet.UU.NET
ALEXANDRIA  NSW  2015

rmarks@KSP.Unisys.COM (Richard Marks) (12/09/88)

In article <670001@hpmwjaa.HP.COM> jeffa@hpmwtd.HP.COM writes:
>Mathematicians have been using cis(x) = cos(x) + i sin(x) for ages.  Ergo,
>fsincos can be accessed as a complex function.

or Lisp can return an arbitrary number of multiple values.

aarons@cvaxa.sussex.ac.uk (Aaron Sloman) (12/18/88)

> From: ok@quintus.uucp (Richard A. O'Keefe)
> Message-ID: <764@quintus.UUCP>
> Date: 29 Nov 88 11:24:15 GMT

> In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> > A trivial example is having a list of results to the left of
> > the replacement operator.  I do not mean a vector or a struct; the items
> > may be of different types, and should not be stored in adjacent memory
> > locations.

> If you mean things like being able to write
>		x, i, s := x/(1.0+x), i-2, s || "x" || s
> CPL had it, BCPL has it (in sequential form), MESA has it (you have to
> put record constructor functions around both sides, but that's merely
> notational), and it has appeared in several experimental languages.
> Common Lisp supports it under the name PSETQ:
>		(psetq x (/ x (+ 1.0 x))
>		       i (- i 2)
>		       s (however-you-concatenate-strings s "x" s))
> I have never understood why it was omitted from Pascal, Modula, ADA:
>		x, y := y, x
> is the clearest way to exchange two variables I know.  I once put
> this construct into the compiler for an Algol-like language; it was
> easy, and if the compiler had done any optimisation it would still
> have been easy.

-----------------------
I think Algol68 also has it ("collateral assignment"??).

I have not read the discussion that preceded all this, but
just for information, if you want to do multiple assignments in
Pop-11 (where assignments go left to right) you can do things like

    [a list], 'a string', "word", 99 -> x1 -> x2 -> x3 -> x4;

Because Pop-11 uses a stack this is equivalent to
    99         -> x1;
    "word"     -> x2;
    'a string' -> x3;
    [a list]   -> x4;
etc.

Swapping the values of two variables in Pop-11 is done thus:

    x, y -> x -> y

I.e. you don't need the usual "temporary" variable - just stack
the value of x, stack the value of y, then pop the top of the
stack into x, then pop into y. I assume Forth can do this too.

It would be trivial in Pop-11 to write a macro that did a multiple
assignment in the "natural" order, i.e. not in the above stack-oriented
last-in first-out order.

Not quite so trivial, but also possible (and more efficient), is
defining a new "syntax word" to do it. The difference is that a macro
reads in text then rearranges the compiler input stream, whereas a
syntax word reads in some of the input stream and then plants Poplog
virtual machine instructions to be compiled. Also, defining syntax
words helps the parser give more helpful error messages if you make
syntactic mistakes.

E.g. to define ">->" as a new Pop-11 syntax operator to do multiple
assignment, with a precedence of 11, i.e. equal to that of the built-in
assignment symbol "->", you could do something like the following:

    define syntax 11 >-> ;
        ;;; Define local recursive sub-procedure to read in comma-separated
        ;;; variables up to semi-colon and plant assignments to them
        ;;; in reverse order (on exit from recursion).
        define plant_assignments();
            lvars varname;
            if nextitem() /== ";" then
                ;;; Read the next item in input stream
                itemread() -> varname;
                ;;; Do all the rest up to the semi-colon
                plant_assignments();
                unless varname == "," then
                    ;;; plant VM code to assign top of stack to variable
                    sysPOP(varname);
                endunless;
            endif;
        enddefine;

        ;;; Now the main body of the procedure for ">->"
        ;;; First some mumbo jumbo to plant code for last Pop-11 operation
        pop_expr_inst(pop_expr_item);
        pop11_FLUSHED -> pop_expr_inst;

        ;;; Now read in variable names and plant assignments
        plant_assignments();
    enddefine;

We can test the above definition as follows
    ;;; Declare some variables
    vars a, b, c;

    ;;; Try a multiple assignment to a, b, c.
    11, 22, 33 >-> a, b, c;

    ;;; Make a list containing values of a, b, and c, and check that
    ;;; the order is right when it is printed out
    [^a ^b ^c] =>
    ** [11 22 33]

Ideally the definition of ">->" would include extra instructions
to do more syntax checking (e.g. complaining if a syntax word
is read in before ";", or if a comma is missing).


Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QN, England
    ARPANET : aarons%uk.ac.sussex.cvaxa@nss.cs.ucl.ac.uk
    JANET     aarons@cvaxa.sussex.ac.uk
    BITNET:   aarons%uk.ac.sussex.cvaxa@uk.ac
    UUCP:     ...mcvax!ukc!cvaxa!aarons or aarons@cvaxa.uucp