[comp.lang.fortran] Assembly or ....

861087p@aucs.UUCP (A N D R E A S) (11/22/88)

     Since these are newsgroups that mainly programmers write to, I
would like to ask them whether it is really worth it to spend time
learning assembly language.  I know that when you program in
assembly you lose portability, but you gain speed.
I've also been told that there are some low-level operations
that you just can't do in C or any other high-level language.
Apart from this, most commercial programs are written
in assembly (Turbo Pascal etc.).  Device drivers are also written in assembly.
So why is everybody against assembly when many programs available are
written in assembly?
   I hope I'll get some answers from you guys !!





--------------------------------------------------------------------------
**************************************************************************
The number of the beast is 80386|Andreas Pikoulas
or 68030 ??                     |Acadia University
***************************************************************************
----------------------------------------------------------------------------

 

dodson@mozart.uucp (Dave Dodson) (11/22/88)

In article <1388@aucs.UUCP> 861087p@aucs.UUCP (A N D R E A S) writes:
>
>     Since these are newsgroups that mainly programmers write to, I
>would like to ask them whether it is really worth it to spend time
>learning assembly language.
>   I hope I'll get some answers from you guys !!

I believe it is important to learn assembly language for some computer
because that is how you learn how computers _really_ work.  This knowledge
can help you program better in any language.

Also, there are certain pieces of code that may be worth coding in
assembly language because speed is very important or because they can't
easily be coded in a machine-independent fashion in a high level language
anyway.  In the former case, recall that in many programs, most of the
run time is spent executing only a few lines of source code; thus it may
be necessary to write only a small amount of assembly language.  In the
latter case, assembly language is no less portable and may actually be
more readable, especially if the high level language is not very suitable
for the task at hand.

----------------------------------------------------------------------

Dave Dodson					     dodson@convex.COM
Convex Computer Corporation      Richardson, Texas      (214) 952-0234

noren@dinl.uucp (Charles Noren) (11/24/88)

In article <1388@aucs.UUCP> Andreas Pikoulas writes:
>I would like to ask them whether it is really worth it to spend time
>learning assembly language...

It is very much worthwhile to learn assembly language.  Reasons:

1.  Sometimes (although rarely) speed is absolutely crucial.  No matter
    how much you optimize your compiler switches, you cannot be
    absolutely assured you are getting the best performance (unless
    you observe the assembly or machine-executable output of the
    compiler).  Even if the compiler gives you acceptable performance
    now, will a future upgrade of the compiler do something that
    slows it down?  Assembly language gives you absolute
    control.

2.  Assembly gives you more options to interface to your operating
    system, service interrupts, and control hardware.  It is true
    I can do all of this in most Cs (even in many Pascals), but there
    is a price.  Often the C compiler forces you to do things in certain
    ways.  For example, to service an interrupt in Turbo C, the Turbo C
    compiler will save all the registers, let you perform the operation,
    then restore all the registers.  In assembly, you can save just the
    registers you need, perform a quick operation, then restore just those
    registers.  There may be ways around this in C, but often these use
    undocumented features of the compiler or other features which are
    subject to change.  I have been bitten by this many times when a
    new upgrade came to a compiler or operating system.  (A minimal
    sketch of such a handler appears after this list.)

3.  Assembly language gives you a better understanding of the architecture
    of the machine you are working on; in fact, it forces you to learn
    the architecture.  If you are going to spend any time working on
    the machine, this kind of knowledge will be of subtle but great benefit.
    This kind of knowledge can improve your high-level language coding
    in a variety of ways (yes, you do want to make sure your coding is
    as machine-independent as possible).  For instance, in C you
    can use register variables.  But what are the practical implications
    if your machine has a limited number of general-purpose registers?  On the
    x86 class of machines there are a variety of memory models.  What are
    the real implications of using small, medium, or large?  Knowing the
    assembly language will give you those insights.  Why can't you
    do a straight compare of pointers in C for equality in most memory models
    of x86 computers (but you can in some other CPU families)?  Knowledge
    of assembly will give you this answer (OK, knowledge of the
    machine architecture gives you this answer, but knowledge of the
    assembly language forces you to learn the details of the architecture).

4.  It is good to know several different classes of languages.  It
    expands a person's thinking and approaches to solving problems.
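
To make point 2 concrete, here is a minimal sketch of a Turbo C interrupt
handler (the UART and interrupt-controller port numbers are example values
only).  Turbo C's "interrupt" keyword makes the compiler save and restore
every register around the body, which is exactly the overhead a hand-written
assembly handler can avoid:

#include <dos.h>

/* Illustrative interrupt service routine; port numbers are examples only.
 * The compiler-generated prologue pushes ALL registers before this body
 * runs and pops them all afterwards, even though only a couple are used;
 * hand-written assembly could save just the registers it touches. */
void interrupt serial_isr(void)
{
    unsigned char c = inportb(0x3F8);   /* read the UART to clear the request */

    (void)c;                            /* value unused in this sketch */
    outportb(0x20, 0x20);               /* end-of-interrupt to the 8259 PIC */
}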

This is a long-winded reply.  I hope it helps.
-- 
Chuck Noren
NET:     ncar!dinl!noren
US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260,
         Denver, CO 80201-1260
Phone:   (303) 971-7930

bpendlet@esunix.UUCP (Bob Pendleton) (11/24/88)

From article <1388@aucs.UUCP>, by 861087p@aucs.UUCP (A N D R E A S):
> 
>      Since these are newsgroups that mainly programmers write to, I
> would like to ask them whether it is really worth it to spend time
> learning assembly language.  I know that when you program in 
> assembly you lose portability, but you gain speed.
> I've also been told that there are some low-level operations
> that you just can't do in C or any other high-level language.
> Apart from this, most commercial programs are written
> in assembly (Turbo Pascal etc.).  Device drivers are also written in assembly.
> So why is everybody against assembly when many programs available are
> written in assembly?
>    I hope I'll get some answers from you guys !!

There are several different questions here. I'll give my opinions on
the ones I think are important. Then I'll sit back and watch the fun.

1) Why should you learn assembly language?

No one should ever learn AN assembly language. You should learn at
least 3 or 4 DIFFERENT assembly languages or you are wasting your time.
By different I mean for different machines. Though looking at several
different assemblers for the same machine can be instructive and
amusing.

To really know what a machine will do you need to learn and program in
its assembly language. Learning one assembly language tends to make
you believe that all machines are like the one you know. This can make
you truly blind to the good points of other machines and the horrible
gotchas of your one and only machine. One-assembly-language folks can
be real bores at parties and conferences. 

I actually met one idiot who thought that the phrase "assembly
language" was just another name for BAL. He didn't see how I could
possibly be writing in assembly language for an 8080 (dates me, I
know) when he KNEW that BAL didn't generate 8080 machine code. Sigh.

The only way to learn an assembly language is to write in it. Writing
a simple assembler or a simple compiler that emits assembly language
is almost as good.

A complete introduction to the software side of computing should cover
everything from transistors to objects. With some understanding of how
each layer in the hierarchy depends on, and is implemented in, the lower
layers. That understanding requires a knowledge of assembly languages
and machine architectures.

Being able to program in assembly languages is also a marketable
skill. What the hey, it's always nice to have a skill to fall back on
:-)

2) Why are some programs written in assembly language?

Economics.

Writing in assembly language is expensive. But, when the possible
market is large, like say the PClone market. And, speed is important,
which it always is. And processor cycles are a precious resource, i.e.
the processor is slow, like on say a 4.77 MHz 8088. And memory is a
precious resource, as it is on every machine in the world, but
especially on PCs. Then the payback from writing in assembler can be
huge.

That is to say, there is a whole class of marketable applications that
won't fit or will be too slow if you don't write them in assembly. So,
if you want to make a buck, you write it in assembly.

For any given machine, there exists some application that is just
barely doable on that machine. And there exists someone who is
willing, for some reason, to jump through hoops of real fire to make
that application run on that machine.

The cost of high quality compilers for unusual processors is
staggering.  $20,000 to $100,000 and up, is not unusual. There are
cases where you can't afford to write in anything but assembly
language. Of course $100,000 is about one man year of software
engineering time, so this cost might look high up front, but over the
life cycle of a product it might be dirt cheap.

Personal preference.

Some people LIKE to program in assembly language. I like to program in
assembly language, but then I like to write haiku sequences. No
accounting for taste.

I don't like to maintain assembly language. And I hate writing the
same program N times. So I don't write much in assembler.

Expediency.

I've written assembly code because I didn't have a compiler and
couldn't get a compiler. Sometimes the compilers don't exist,
sometimes you can't afford to buy them.

I've written small chunks of assembler to get at instructions the
compiler never generates.

3) Why do people object to using assembly language?

Economics.

Assembly code is expensive to write, expensive to maintain, and
nonportable. With modern compilers and modern computers (even PClones)
you can write just about anything in a high level language and get the
size and speed you need. OS dependencies might make it hard to port,
but it can be ported.

You might need a few assembly language routines to access some
hardware or to speed up a few critical parts of the code.  So why pay
the price of writing the whole thing in assembly language?

Esthetics.

Somewhere along the way we all pick up a sense of good and bad, right
and wrong, truth and lie, beauty and ugly... We see something and it
strikes us as good, right, correct, beautiful, it pleases us. A lot of
people see high level code as good, beautiful, and right, and low
level code as bad, ugly, and wrong. Trying to read a piece of assembly
language that you wrote last month might convince you that low level
code is indeed bad, ugly, and wrong. Haiku stands the test of time and
readability much better than assembly language.

In summary I'd say that you should learn several assembly languages
because they will give you needed insight into machine architectures. And
because when you need to use it, nothing else will do. But, when it
isn't needed, it isn't worth your time, or anyone else's, to use it.

All the questions seemed to assume that the PClone world IS the
computer world. It isn't. Keep that in mind.

			Bob P.

P.S.

Please don't think that any of this is intended as a slam against the
PC world. I just bought one, just because of all the software
available for it. Does anyone know how to get BZONE to work with an
Everex Micro Enhancer II and an EGA monitor? Weird machines, truly
weird ....
-- 
              Bob Pendleton, speaking only for myself.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

		Reality is what you make of it.

smryan@garth.UUCP (Steven Ryan) (11/24/88)

>     Since these are newsgroups that mainly programmers write to, I
>would like to ask them whether it is really worth it to spend time
>learning assembly language.  I know that when you program in 

Yes, learn assembly to learn the true nature of the beast.

But don't USE it unless necessary.
-- 
                                                   -- s m ryan
--------------------------------------------------------------------------------
As loners, Ramdoves are ineffective in making intelligent decisions, but in
groups or wings or squadrons or whatever term is used, they respond with an
esprit de corps, precision, and, above all, a ruthlessness...not hatefulness,
that implies a wide ranging emotional pattern, just a blind, unemotional
devotion to doing the job.....

orr@cs.glasgow.ac.uk (Fraser Orr) (11/25/88)

In article <729@convex.UUCP> dodson@mozart.UUCP (Dave Dodson) writes:
>I believe it is important to learn assembly language for some computer
>because that is how you learn how computers _really_ work.  This knowledge
>can help you program better in any language.

Do you think it is important to understand how transistors work as well?
The semantic level of most microprocessors is high enough that
learning a HLL is pretty much sufficient these days (compare 68000 asm
to C for example.) In the good old days, when men were men and transistors
were valves, I think your statement was true. I also think that you needed
a fair understanding of how the electronics worked. As things have developed,
I think the level of abstraction at which it is necessary to understand
computers has been slowly moving upward.

>Also, there are certain pieces of code that may be worth coding in
>assembly language because speed is very important or because they can't
>easily be coded in a machine-independent fashion in a high level language
>anyway.  In the former case, recall that in many programs, most of the
>run time is spent executing only a few lines of source code; thus it may
>be necessary to write only a small amount of assembly language.  In the
>latter case, assembly language is no less portable and may actually be
>more readable, especially if the high level language is not very suitable
>for the task at hand.

I don't agree that there is ever any necessity to code in assembler. We
have languages that produce code just as good as hand crafted assembler
(such as C), so why not use them for this sort of thing.
As to your comments on portability, your implication seems to be that
the reason you use a HLL is to facilitate portability; the reason I use
them is that they are easier to code in, easier to debug, and easier
to maintain (particularly by people that didn't write the code originally).
Although portability is clearly an issue, these things are equally if not
more important.
As to assembler being more readable, I think that assembler is not very
suitable for any task at hand.

==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

kolding@june.cs.washington.edu (Eric Koldinger) (11/27/88)

In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>I don't agree that there is ever any necessity to code in assembler. We
>have languages that produce code just as good as hand crafted assembler
>(such as C), so why not use them for this sort of thing.

Ah, but wouldn't that be nice.  Optimizing compilers that could generate
code as good as we can generate by hand in all cases.  Let me know when
someone writes one.

-- 
	_   /|				Eric Koldinger
	\`o_O'				University of Washington
  	  ( )     "Gag Ack Barf"	Department of Computer Science
       	   U				kolding@cs.washington.edu

cik@l.cc.purdue.edu (Herman Rubin) (11/27/88)

In article <6529@june.cs.washington.edu>, kolding@june.cs.washington.edu (Eric Koldinger) writes:
> In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
> >I don't agree that there is ever any necessity to code in assembler. We
> >have languages that produce code just as good as hand crafted assembler
> >(such as C), so why not use them for this sort of thing.
> 
> Ah, but wouldn't that be nice.  Optimizing compilers that could generate
> code as good as we can generate by hand in all cases.  Let me know when
> someone writes one.

I agree completely.  Also, on the various machines I know, there are operations
I want to use about which the compiler knows nothing.  To SOME extent, SOME
of these can be added to the HLL.  But I have not seen a scheme to do this
which will not mess up attempts at automatic optimization.  

Also, these interesting and useful instructions vary somewhat from machine to
machine.  I am appalled by what the language designers have left out, and also
what has been relegated to subroutines.  What can one think of a compiler 
designer who has relegated to a subroutine an operation whose inline code
is shorter than the caller's code to use the subroutine?  This is rampant.

I recently had the occasion to produce a subroutine for a peculiar function.
In principle, it should have been done in a HLL.  I would prefer to do so.
BUT, the following was needed:

  Take a double floating number, and extract the exponent and the mantissa.
  This involves having a double long type.

  Reverse the above operation.

  Addition, shifting, Boolean operations on double long.

  Checking an "overflow field."  Not the usual overflow.

If these are available, C would do a good job PROVIDED it put everything
in registers, if possible.
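
For the first two items, the standard C library's frexp() and ldexp() come
close, though they keep the mantissa as a double rather than as the double
long integer asked for above; a minimal sketch:

#include <math.h>

/* Split a double into mantissa and binary exponent, then put it back
 * together.  frexp() returns m in [0.5, 1) with x == m * 2^e; ldexp()
 * reverses it.  This stays in floating point -- getting the mantissa as
 * a wide integer, as wanted above, still needs access to the bit pattern. */
void split_and_rebuild(double x, double *m, int *e, double *rebuilt)
{
    *m = frexp(x, e);          /* extract mantissa and exponent */
    *rebuilt = ldexp(*m, *e);  /* reverse the operation: *rebuilt == x */
}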

A few of the compilers I have seen do a fair job; the others would get 
a D--.  (Since D-- = C in C, this might be why C is so bad :-))  But one
of the most amazing things that I have seen in the workings of the 
designers is the assumption that the compiler has all the information
necessary to produce optimized code!  There is no provision for input
as to frequency of branches.  Should the common condition be the branch
or the rare condition?  Does it make a difference in the combination?
Since I have examples where the branches are of comparable frequencies,
examples where the ratio of the frequencies are from 10-200, where the
ratio is a few thousand, and where one branch may never occur, I certainly
feel that I have input.  I think the compilers should be interactive, and
discuss the various possibilities with the programmer.  I can even give
cases where the dictum to remove a calculation from within a loop is wrong.

All of mankind does not know enough to produce a good language, or to 
produce a good implementation of a language.  There are more operations
appropriate to hardware than are dreamed of in all computer scientists'
philosophies.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

composer@bucsb.UUCP (Jeffrey L. Kellem) (11/28/88)

In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>I don't agree that there is ever any necessity to code in assembler. We
>have languages that produce code just as good as hand crafted assembler
>(such as C), so why not use them for this sort of thing.
>
>==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)

Except, of course, in the case of real-time programming, where
speed/efficiency is of greatest importance (after correctness); as in
some robotics development and data sampling.  Now, though optimizing
compilers are becoming more common, we still don't have one that can
produce all possible optimizations.

Jeff Kellem
INTERNET: composer%bucsb.bu.edu@bu-it.bu.edu  (or @bu-cs.bu.edu)
UUCP: ...!harvard!bu-cs!bucsb!composer
Disclaimer: My opinions are someone's...hopefully mine...  :) 

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/28/88)

In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>... on the various machines I know, there are operations
>I want to use about which the compiler knows nothing.

There are many of us who don't feel obligated to make use of every
little whiz-bang feature that hardware designers have so thoughtfully
provided.  Life is too short to spend it in an effort to accommodate
hardware quirks.

frank@zen.co.uk (Frank Wales) (11/28/88)

In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk
(Fraser Orr) writes:
>In article <729@convex.UUCP> dodson@mozart.UUCP (Dave Dodson) writes:
>>I believe it is important to learn assembly language for some computer
>>because that is how you learn how computers _really_ work.  This knowledge
>>can help you program better in any language.
>
>Do you think it is important to understand how transistors work as well?

For some problems, yes.  And caches, pipelines, and all the other 
diddly bits that make computers go these days.  Being a mechanic won't
necessarily improve your driving abilities; but it can't hurt, and you
never know when it might be useful.

>The semantic level of most mircoprocessors is high enough that
>learning a HLL is pretty much sufficient these days (compare 68000 asm
>to C for example.) In the good old days, when men were men and transistors
>were valves, I think your statement was true. I also think that you needed
>a fair understanding of how the electronics worked. As things have developed,
>I think the level of abstraction at which it is necessary to understand
>computers has been slowly moving upward.

The problems being attacked using computers today are bigger than ever,
so the levels of abstraction which need to be used to make them
comprehensible to those who would solve them must also expand to fit 
(and with them, the tools in use).  But this does not mean that the
level of detail at the lowest level has become any less -- indeed, things
have become more complex down there too.  We now have tools
which are sufficiently powerful and flexible that most of the time we can
forget about the low-level, implementation-type details in favour of
the higher-level, problem-solving details.  But only most of the time,
not all of it.  You may never have come across a problem requiring you
to delve into the depths of assembler (or microcode, etc.), but that
doesn't mean such problems don't exist.

>I don't agree that there is ever any necessity to code in assembler. We
>have languages that produce code just as good as hand crafted assembler
>(such as C), so why not use them for this sort of thing.

[Ignoring the wishful thinking in that comment...]
Suppose you don't have a compiler.  Or an assembler.  Suppose you're writing
these for a brand new processor -- what do you write them in?

>As to your comments on portability, your implication seems to be that
>the reason you use a HLL is to facilitate portability; the reason I use
>them is that they are easier to code in, easier to debug, and easier
>to maintain (particularly by people that didn't write the code originally).

That's a subjective assessment which isn't always borne out in practice;
some things can be easier to write in assembler -- again, it depends on
the level of detail required by the problem to hand.  No-one in their
right mind writes an operating system entirely in assembler; but they don't
write them entirely in C (Pascal, Ada, Modula-2, ...) either.

>As to assembler being more readable, I think that assembler is not very
>suitable for any task at hand.

Then you have a fortuitously-positioned hand.

--
Frank Wales, Software Consultant,    [frank@zen.co.uk<->mcvax!zen.co.uk!frank]
Zengrange Ltd., Greenfield Rd., Leeds, ENGLAND, LS9 8DB. (+44) 532 489048 x217 

tom@garth.UUCP (Tom Granvold) (11/29/88)

-

    I must jump in on this discussion with my two cents worth.  There are
definitely cases where assembly language is not only appropriate, but 
necessary!

    I write diagnostics that test the hardware in computer systems, which
makes me a member of a very small minority of programmers.  While reasonable
memory tests can be written in languages such as C or Forth, many other
tests require assembly language.  For example, in newer CPUs, especially
RISC chips, it is becoming common to have pipelining of instruction
execution and register scoreboarding.  In order to reasonably test these
features, one must be able to specify exactly the sequence of instructions
that are to be executed.

     The second need for assembly is in real-time control.  In my previous
job we were using a Z80 to control several stepper motors.  The critical
timing restrictions occurred in the interrupt routines.  While there were
high-level languages available for the Z80, none that we were aware of
had optimizing compilers.  We were therefore able to produce much faster
code in assembler.  This was a case where every machine cycle was of
importance.  The most important comment in the source code was the number
of machine cycles each instruction took.  Yes, we could have used a newer,
faster CPU that has an optimizing compiler available for it, but Z80s are
cheap!

Thomas Granvold

orr@cs.glasgow.ac.uk (Fraser Orr) (11/29/88)

In article <6529@june.cs.washington.edu> kolding@uw-june.UUCP (Eric Koldinger) writes:
>Ah, but wouldn't that be nice.  Optimizing compilers that could generate
>code as good as we can generate by hand in all cases.  Let me know when
>someone writes one.

Ah, but wouldn't it be nice. People that could write code as good
as optimising compilers can in all cases. Let me know when one is
born.

Optimisers are good at some things, people are good at others. It all
comes out in the wash.

===Fraser

orr@cs.glasgow.ac.uk (Fraser Orr) (11/29/88)

In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
<A few of the compilers I have seen do a fair job; the others would get 
<a D--.  (Since D-- = C in C, this might be why C is so bad :-))  But one
<of the most amazing things that I have seen in the workings of the 
<designers is the assumption that the compiler has all the information
<necessary to produce optimized code!  There is no provision for input
<as to frequency of branches.  Should the common condition be the branch
<or the rare condition?  Does it make a difference in the combination?
<Since I have examples where the branches are of comparable frequencies,
<examples where the ratio of the frequencies are from 10-200, where the
<ratio is a few thousand, and where one branch may never occur, I certainly
<feel that I have input.  I think the compilers should be interactive, and
<discuss the various possibilities with the programmer.  I can even give
<cases where the dictum to remove a calculation from within a loop is wrong.

I agree entirely (including with the cheap jibe against C:^>). I think
that there would be great improvements if compilers were interactive.
Perhaps not realisable with low-level languages like C and its ilk,
but certainly realisable in much higher-level programming languages.
Interactive isn't necessarily the best by the way, since it means
you have to repeat yourself, and the decisions are not documented
in the program. There is no reason why you can't have an "advice"
command that advises the compiler of the best implementation. (Doesn't
"advise" sound so much better than "pragma"?)

<Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

seanf@sco.COM (Sean Fagan) (11/29/88)

In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>Do you think it is important to understand how transistors work as well?
>The semantic level of most microprocessors is high enough that
>learning a HLL is pretty much sufficient these days (compare 68000 asm
>to C for example.)

However, not all micros use a 68k.  Compare 8086 asm to C.  Or 88k asm to C.
Or SPARC asm to C.  It is, IMHO, a good idea to learn assembly, or at least
enough to understand the concepts.  This can help people do the ever
unpopular "microoptimizations," and also to understand why the program may
run so slowly.  It also helps when you need to break into adb and find out
why the blasted program is dereferencing through location 12 8-).

-- 
Sean Eric Fagan  | "Engineering without management is *ART*"
seanf@sco.UUCP   |     Jeff Johnson (jeffj@sco)
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

skinner@saturn.ucsc.edu (Robert Skinner) (11/29/88)

In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>In article <729@convex.UUCP> dodson@mozart.UUCP (Dave Dodson) writes:
>>I believe it is important to learn assembly language for some computer
>>because that is how you learn how computers _really_ work.  This knowledge
>>can help you program better in any language.
>
>Do you think it is important to understand how transistors work as well?
>The semantic level of most microprocessors is high enough that
>learning a HLL is pretty much sufficient these days (compare 68000 asm
>to C for example.) In the good old days, when men were men and transistors
>were valves, I think your statement was true. 

I have to agree with Dodson to some extent:  it is often useful to
understand how the *next* lowest level works.
	
	*  When I was designing logic, knowing how the individual
	transistors worked sometimes got me out of trouble.

	*  When I was writing assembly, it was useful to know about
	machine code, and how the processor actually executed the
	instruction.

	*  When I write "High Level" C, knowing how the underlying
	assembly works helps me deduce the problem at times.  I can sometimes
	examine the assembly output and debug faster than by looking at
	the C source.

	*  And now when I program in C++, I occasionally benefit from
	looking at the C output.

I think that they call this having a foundation in the basics.
I have seen this kind of foundation help students I've worked with,
and it has certainly helped me.

Robert Skinner
skinner@saturn.ucsc.edu
-- 
----
this is a test

henry@utzoo.uucp (Henry Spencer) (11/29/88)

In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>... What can one think of a compiler 
>designer who has relegated to a subroutine an operation whose inline code
>is shorter than the caller's code to use the subroutine? ...

If the operation is infrequently used and not efficiency-critical, then
what one can think is "this guy knows his tradeoffs".  Adding an operation
to a compiler is not free.

>... But one
>of the most amazing things that I have seen in the workings of the 
>designers is the assumption that the compiler has all the information
>necessary to produce optimized code!  There is no provision for input
>as to frequency of branches...

Not true; some C implementations will take advice from the profiler on
this.  It's practically a necessity for the VLIW folks.  (Oh, you meant
advice from the *programmer*? :-)  The guy who's wrong 3/4 of the time
about such things?  Silly idea.)  (No, I'm not kidding -- programmer
intuition is vastly inferior to measured data in such matters.  This
has been known for many years.)
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

shawn@pnet51.cts.com (Shawn Stanley) (11/29/88)

cik@l.cc.purdue.edu (Herman Rubin) writes:
>All of mankind does not know enough to produce a good language, or to 
>produce a good implementation of a language.  There are more operations
>appropriate to hardware than are dreamed of in all computer scientists'
>philosophies.

What might be more to the point is that there is no single language that is
perfect for every application.  The right tool for the job, they always say.

UUCP: {rosevax, crash}!orbit!pnet51!shawn
INET: shawn@pnet51.cts.com

cik@l.cc.purdue.edu (Herman Rubin) (11/29/88)

In article <8993@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <1031@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >... on the various machines I know, there are operations
> >I want to use about which the compiler knows nothing.
> 
> There are many of us who don't feel obligated to make use of every
> little whiz-bang feature that hardware designers have so thoughtfully
> provided.  Life is too short to spend it in an effort to accommodate
> hardware quirks.

But these are the natural operations (to me as the algorithm designer).
Which of them are on a given machine varies.  A machine has few operations,
and if the information were presented with a mathematician as the reader, it
would take not a day but 2-3 hours to understand a machine's instructions.

I find that the designers of the various languages have not considered the
type of operations which I would want to use WITHOUT KNOWLEDGE OF THE
MACHINE.  A trivial example is having a list of results to the left of
the replacement operator.  I do not mean a vector or a struct; the items
may be of different types, and should not be stored in adjacent memory
locations.  Most of the time, they should end up in registers.  I have
not seen a language which is claimed to produce even reasonably efficient
code with this property.  Some of these operations even exist in hardware.
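
The closest standard C comes is returning a small struct, and nothing
guarantees its members end up in registers, which is exactly the point;
a sketch:

/* Two results from one operation, approximated with a struct return.
 * Nothing obliges the compiler to keep the members in registers, and
 * the struct layout puts them in adjacent memory if it spills them. */
struct qr { long q; long r; };

struct qr divrem(long a, long b)
{
    struct qr x;
    x.q = a / b;
    x.r = a % b;
    return x;
}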

Another effect of the HLL tyranny is that the operations which are beyond
the ken of the HLL designers are disappearing from the machines.  Other
useful operations are not getting in.  For example, suppose we want to
divide a by b, obtaining an integer result i and a remainder c.  I know
of no machine with this instruction, and this is not that unusual an 
instruction to demand.  It is cheap in hardware, and extremely expensive
in software--at least 4 instructions.

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

mash@mips.COM (John Mashey) (11/29/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...
>Another effect of the HLL tyranny is that the operations which are beyond
>the ken of the HLL designers are disappearing from the machines.  Other
>useful operations are not getting in.  For example, suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an 
>instruction to demand.  It is cheap in hardware, and extremely expensive
>in software--at least 4 instructions.

Although I don't necessarily subscribe to Herman's opinions, R2000 divides
actually do this (leave both results in registers).  Although I shouldn't
have been, I was surprised to find that the following C code, compiled
unoptimized (optimized, it essentially disappears, of course):

main() {
	register int i, j, k;	/* j is uninitialized; this is only a code-generation demo */
	i = j / 7;		/* quotient */
	k = j % 7;		/* remainder */
}

generates one divide instruction to get both of the results.
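
For what it's worth, the draft ANSI C library exposes the pairing directly:
div() hands back quotient and remainder in one structure, which a compiler
for a machine like this is free to map onto a single divide.  A minimal
sketch:

#include <stdio.h>
#include <stdlib.h>

/* div() returns both results of an integer division in one div_t struct;
 * on a machine whose divide leaves both in registers, no second divide
 * (or multiply-and-subtract) is needed to get the remainder. */
int main(void)
{
	div_t d = div(123, 7);

	printf("quot = %d, rem = %d\n", d.quot, d.rem);
	return 0;
}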
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

frank@zen.co.uk (Frank Wales) (11/29/88)

In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk
(Fraser Orr) writes:
>I don't agree that there is ever any necessity to code in assembler. We
>have languages that produce code just as good as hand crafted assembler
>(such as C), so why not use them for this sort of thing.

Then ...

In article <1988@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk
(Fraser Orr) writes:
>In article <6529@june.cs.washington.edu> kolding@uw-june.UUCP
>(Eric Koldinger) writes:
>>Ah, but wouldn't that be nice.  Optimizing compilers that could generate
>>code as good as we can generate by hand in all cases.  Let me know when
>>someone writes one.
>
>Ah, but wouldn't it be nice. People that could write code as good
>as optimising compilers can in all cases. Let me know when one is
>born.
>
>Optimisers are good at some things, people are good at others. 

Which means that, if you always want good work done, you'd better use
people for some of it.  [Sound of foot being shot in background.]

--
Frank Wales, Software Consultant,    [frank@zen.co.uk<->mcvax!zen.co.uk!frank]
Zengrange Ltd., Greenfield Rd., Leeds, ENGLAND, LS9 8DB. (+44) 532 489048 x217 

cjosta@taux01.UUCP (Jonathan Sweedler) (11/29/88)

In article <8938@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
|In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
|...
|> suppose we want to
|>divide a by b, obtaining an integer result i and a remainder c.  I know
|>of no machine with this instruction, and this is not that unusual an 
|>instruction to demand.  It is cheap in hardware, and extremely expensive
|>in software--at least 4 instructions.
|
|Although I don't necessarily subscribe to Herman's opinions, R2000 divides
|actually do this (leave both results in registers).  

The 32000 series has a DEI (Divide Extended Integer) instruction that
also does this. 

-- 
Jonathan Sweedler  ===  National Semiconductor Israel
UUCP:    ...!{amdahl,hplabs,decwrl}!nsc!taux01!cjosta
Domain:  cjosta@taux01.nsc.com

koll@ernie.NECAM.COM (Michael Goldman) (11/30/88)

I have tried it both ways - one project I wrote 90% in C and 10% in ASM,
and the next project I tried 100% in C.  (I also tried 100% in ASM, but
we won't go into that, ahem, cough, cough :) ).  I need ASM.  When you
are working on drivers or hardware specific items, C (or PASCAL) makes
assumptions about hardware that aren't true, and to do the job in C
requires using such convoluted, obscure code that maintainability and
productivity are gone anyway.  This is just as true for non-time
-critical things such as keyboard handlers as for time-critical things
such as communications handling routines.

There is also the question of what happens when a new machine (like the
IBM PC or MAC, or whatever) comes out and the C compilers for it are
late or terribly buggy, or soooooooooo slow, and there are few if
any utility packages for it?  Users are used to (and should have!)
crisp, snappy feedback for window manipulation and I/O, and it takes
years to get the utility packages (often written in ASM) that will
do that for you. 

Only in the academic world can code be written to be 100% machine
independent.  The rest of us have to struggle with those little quirks
of the chips.

In the old days, you might have needed to know about valves/tubes; now
even the hardware designers only work at the IC level, but us softies
should know about it too, for that little weirdness that creeps in on us
from the ICs.

Otherwise, I agree with the arguments for portability, and
maintainability, and productivity over sheer crunch speed.

desnoyer@Apple.COM (Peter Desnoyers) (11/30/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
> For example, suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an 
>instruction to demand.  

I can't think of a machine that doesn't. The only machine language
manuals on my desk (68000 and HPC16083) show this instruction.
If anyone knows of a machine that doesn't leave the remainder in
a register, I would be interested in knowing of it. Such a machine
either throws the remainder away or uses some algorithm besides
shift-and-subtract. (for instance hard-wired.)

				Peter Desnoyers

henry@utzoo.uucp (Henry Spencer) (11/30/88)

In article <2025@garth.UUCP> tom@garth.UUCP (Tom Granvold) writes:
>... The most important comment in the source code was the number
>of machine cycles each instruction took.  Yes, we could have used a newer,
>faster CPU that has an optimizing compiler available for it, but Z80s are
>cheap!

Alas, if you buy your newer faster CPUs from Motorola or Intel, they can't
tell you how many cycles each instruction takes!
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

pardo@june.cs.washington.edu (David Keppel) (11/30/88)

>>cik@l.cc.purdue.edu (Herman Rubin) writes:
>>>divide a by b, obtaining an integer result i and a remainder c.
>>>I know of no machine with this instruction.  It is cheap in hardware,
>>>and extremely expensive in software--at least 4 instructions.

>mash@mips.COM (John Mashey) writes:
>>[R2000 has 1-instruction divide that does this ]

cjosta@taux01.UUCP (Jonathan Sweedler) writes:
>The 32000 series has a DEI (Divide Extended Integer) instruction that
>also does this. 

The rather obscure but still-available VAX series of computers made by
a small Nashua, New Hampshire company (Digital Equipment Corporation)
introduced the EDIV instruction recently, about 1978.  I heard a rumor
that (at least in some early/small VAXen) it was faster to do two
separate instructions than to use EDIV, although I suspect that this
was an artifact of those implementations.

Followups to comp.arch.

	;-D on  ( Now how about that `editpc' opcode? )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

cik@l.cc.purdue.edu (Herman Rubin) (11/30/88)

In article <949@taux01.UUCP>, cjosta@taux01.UUCP (Jonathan Sweedler) writes:
< In article <8938@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> |In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> |...
> |> suppose we want to
> |>divide a by b, obtaining an integer result i and a remainder c.  I know
> |>of no machine with this instruction, and this is not that unusual an 
> |>instruction to demand.  It is cheap in hardware, and extremely expensive
> |>in software--at least 4 instructions.
> |
> |Although I don't necessarily subscribe to Herman's opinions, R2000 divides
> |actually do this (leave both results in registers).  
< 
< The 32000 series has a DEI (Divide Extended Integer) instruction that
< also does this. 

I do not know if I made it clear in my initial posting, but the problem
arises if the types of a, b, and c are floating.  Note that the quote from
my paper specifically has i as an integer.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

gandalf@csli.STANFORD.EDU (Juergen Wagner) (11/30/88)

In article <1793@scolex> seanf@sco.COM (Sean Fagan) writes:
>In article <1961@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>>Do you think it is important to understand how transistors work as well?
>>The semantic level of most mircoprocessors is high enough that
>>learning a HLL is pretty much sufficient these days (compare 68000 asm
>>to C for example.)
>
>However, not all micros use a 68k.  Compare 8086 asm to C.  Or 88k asm to C.
>Or SPARC asm to C.  It is, IMHO, a good idea to learn assembly, or at least
>enough to understand the concepts.
>...

That's the point. My favourite programming language is LISP but nonetheless
it is important to know all the levels below that (C, Assembly language, 
Microcode, Nanocode, Transistors, ...), so you know what's going on.

It may seem to be sufficient to know a HLL (C, Modula, Ada), but as soon as
you are interested in portability and efficiency, you start doing some metering
and optimizing. It is important to know *WHY* that bit manipulation takes so
long on a 68000, and why it doesn't on a 68020. Knowing the architecture can
help understand what the compiler is doing, without having to code your
programs in assembly language. It also is important to know how the operating
system works, although I would not want to write one.

The "why" is more important than the "how".

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

dik@cwi.nl (Dik T. Winter) (11/30/88)

In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
 > In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
 > >
 > > For example, suppose we want to
 > >divide a by b, obtaining an integer result i and a remainder c.  I know
 > >of no machine with this instruction, and this is not that unusual an 
 > >instruction to demand.  
 > 
 > I can't think of a machine that doesn't. The only machine language
 > manuals on my desk (68000 and HPC16083) show this instruction.
 > If anyone knows of a machine that doesn't leave the remainder in
 > a register, I would be interested in knowing of it. Such a machine
 > either throws the remainder away or uses some algorithm besides
 > shift-and-subtract. (for instance hard-wired.)
 > 
Well, if I remember right, the we32000 does that (the thing in a 3b2 and a
tty5620).  It has a divide, quotient, remainder and modulus instruction
like the ns32000, but unlike the ns32000 no instruction that gives you
both.  This is (I think) an example of a machine for which the instructions
are dictated by a HLL (C in this case), which Herman Rubin so deplores.
On the other hand, this need not be wrong if the operations can be
pipelined.  E.g., if i/j and i%j take 23 cycles each, but the second can
start one cycle after the first, you do not lose very much.

In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
 > I do not know if I made it clear in my initial posting, but the problem
 > arises if the types of a, b, and c are floating.  Note that the quote from
 > my paper specifically has i as an integer.

Indeed you did not make that clear (seems evident :-)).
I know of no machine that gives you both quotient and remainder on a
floating-point division; stronger still, before the IEEE standard there
was (I believe) no machine that would give you the remainder at all, only
the quotient.  IEEE changed that, and many machines now have floating-point
division and floating-point remainder, but the result is floating point
for both.  This also explains why both are not returned.  To obtain the
(exact) remainder may require a larger number of steps than the
(inexact) quotient.  I agree however that those machines could return the
quotient when doing a remainder operation.  Note that IEEE did *not*
mandate operations that return two results.  So no machine I know of
returns two results; not even for sine and cosine, which are implemented
on many (as an extension to IEEE?).  The CORDIC algorithm will give you
both sine and cosine quickly, but is only efficient in hardware.  But here
too, pipelining may be a deciding point.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

news@ism780c.isc.com (News system) (11/30/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an
>instruction to demand.  It is cheap in hardware, and extremely expensive
>in software--at least 4 instructions.

Then he clarifies:

>I do not know if I made it clear in my initial posting, but the problem
>arises if the types of a, b, and c are floating.  Note that the quote from
>my paper specifically has i as an integer.
>-- 
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

OK, here is one.  Quoting from the IBM 709 Reference Manual (circa 1959):

FDH - Floating Divide or Halt
    The c(ac) [means contents of the AC register] are divided by c(y).  The
    quotient appears in the MQ and the remainder appears in the AC. ...

I leave it to Herman to find out why more modern machines leave this out.
Hint: try profiling your application to see what percent of the total process
time is used producing the remainder.

    Marv Rubinstein.

ok@quintus.uucp (Richard A. O'Keefe) (11/30/88)

In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
>In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>>For example, suppose we want to
>>divide a by b, obtaining an integer result i and a remainder c.  I know
>>of no machine with this instruction, and this is not that unusual an 
>>instruction to demand.  
>
>I can't think of a machine that doesn't.

Note that Rubin was careful to identify *i* as integer, but NOT the other
variables!  Consider, for example, Burroughs Extended Algol, where
	x DIV y
was defined to be
	SIGN(x/y)*ENTIER(ABS(x/y))
and	x MOD y
was defined to be
	x - (x DIV y)*y

These definitions make perfect sense for any real x, y for which x/y is
defined.  And Burroughs Algol did in fact extend the definitions to real
and double precision x and y as well as integers.  But the B6700 did not
have any instructions with two results that I can recall (discounting
SWAP), so it only meets part of Rubin's requirements.
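
Transcribed into C for concreteness (a sketch only, using the standard
floor() and fabs(); like the B6700, it delivers the two results separately
rather than from one instruction):

#include <math.h>

/* x DIV y = SIGN(x/y)*ENTIER(ABS(x/y)) and x MOD y = x - (x DIV y)*y,
 * following the Burroughs Extended Algol definitions above; they make
 * sense for any real x, y with y != 0. */
double alg_div(double x, double y)
{
    double q = x / y;
    double mag = floor(fabs(q));     /* ENTIER(ABS(x/y)) */
    return (q < 0.0) ? -mag : mag;   /* apply SIGN(x/y)  */
}

double alg_mod(double x, double y)
{
    return x - alg_div(x, y) * y;
}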

In fact, if we define SIGN(x) = if x = 0 then 0 else x/ABS(x), you can
extend DIV and MOD to the complex numbers as well, not that it's quite
as useful...

As for machines which lack an *integer* divide with those properties,
the VAX has an EDIV instruction, but it is awkward to set up.  The 4.1BSD
C compiler would generate the equivalent of x-(x/y)*y for x%y.  For those
with SPARCs, a look at the code generated for x%y might be illuminating.

orr@cs.glasgow.ac.uk (Fraser Orr) (11/30/88)

In article <2025@garth.UUCP> tom@garth.UUCP (Tom Granvold) writes:
>    I write diagnostics that test the hardware in computer systems, which
>makes me a member of a very small minority of programmers.  While reasonable
>memory tests can be written in languages such as C or Forth, many other
>tests require assembly language.  For example, in newer CPUs, especially
>RISC chips, it is becoming common to have pipelining of instruction
>execution and register scoreboarding.  In order to reasonably test these
>features, one must be able to specify exactly the sequence of instructions
>that are to be executed.

I agree, in some very isolated cases asm is necessary; testing hardware is
one such case. Not many of us have to do that though, and again, this is
not using asm for "speed" and "code compactness", neither of which is,
in my ever so humble opinion, a good or valid reason for writing asm.

>     The second need for assembly is in real-time control.  In my previous
>job we were using a Z80 to control several stepper motors.  The critical
>timing restrictions occurred in the interrupt routines.  While there were
>high-level languages available for the Z80, none that we were aware of
>had optimizing compilers.  We were therefore able to produce much faster
>code in assembler.  This was a case where every machine cycle was of
>importance.  The most important comment in the source code was the number
>of machine cycles each instruction took.  Yes, we could have used a newer,
>faster CPU that has an optimizing compiler available for it, but Z80s are
>cheap!

Sorry, I don't entirely grasp what you are saying. Firstly you say speed
is crucial here; I don't agree that programming in asm necessarily gives
significant improvements in speed. But then you seem to imply that what
you're interested in is exact timing information. I used to write programs
in asm and spent a lot of time "optimising" them (in fact I once spent about
2 months, on and off, improving one critical subroutine, cutting it down from
something like 35 microseconds to 26); these made little difference to the
final product. Moreover it meant that the software was always behind
schedule. Anyway, see the above comments.
I would also say that if you ever try to do this on more complicated
processors like the 68k and RISCs, then good luck to you: the timing is
unbelievably complicated, with caches and pipelines and all that.

>Thomas Granvold


==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

orr@cs.glasgow.ac.uk (Fraser Orr) (11/30/88)

In article <1434@zen.UUCP> frank@zen.co.uk (Frank Wales) writes:

[My comments on the relative merits of optimisers and hand crafting]

>Which means that, if you always want good work done, you'd better use
>people for some of it.  [Sound of foot being shot in background.]

Ouch (sound of me being shot in the foot:^)
No, that is not what I'm getting at at all, Frank. What I'm saying is
that programmers are good at some things (like local, domain-specific
optimisations) whereas programs are good at others (such as optimal
register usage, common subexpression elimination, etc). Unfortunately
the two are not compatible; that is, when the optimiser has munged
your code, it is pretty hard for the programmer to further optimise it
because it is so seriously unlike the original.
What I was trying to say is that on balance they are about equal
(he says with no evidence whatsoever to support him), and I'd prefer
to get the compiler to do the dirty work. My time costs money y'know.

>Frank Wales, Software Consultant,    [frank@zen.co.uk<->mcvax!zen.co.uk!frank]


==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

jmacdon@cg-atla.UUCP (Jeff MacDonald) (11/30/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
...
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction, and this is not that unusual an 
>instruction to demand.  It is cheap in hardware, and extremely expensive
>in software--at least 4 instructions.

Both the 80x86 and 680x0 machines have integer divide instructions
which leave quotient and remainder.  So there.

Think we've beat Herman up enough?
-- 
Jeff MacDonald ((decvax||ulowell)!cg-atla!jmacdon)
c/o Compugraphic Corporation   200-3-5F
200 Ballardvale, Wilmington, MA   01887
(508) 658-0200,  extension 5406

barmar@think.COM (Barry Margolin) (12/01/88)

In article <1989@crete.cs.glasgow.ac.uk> orr@cs.glasgow.ac.uk (Fraser Orr) writes:
>I think that there would be great improvements if compilers were interactive.
...
>Interactive isn't necessarily the best by the way, since it means
>you have to repeat yourself, and the decisions are not documented
>in the program.

How about if the compiler saved the answers in the program text as
explicit ADVICE statements?  The compiler becomes a simple form of
Programmer's Apprentice, helping the programmer determine where
potential optimizations can be specified.  The decisions become
documented, and they don't have to be repeated.
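
Purely as a hypothetical illustration (no real compiler defines an "advice"
pragma; the syntax below is invented), such a recorded decision might sit in
the source looking something like this:

#define TABLE_SIZE 1024

static int table[TABLE_SIZE];

int lookup(int key)
{
#pragma advice(branch_taken, 0.999)   /* recorded: bounds check almost never fails */
    if (key >= 0 && key < TABLE_SIZE)
        return table[key];
    return -1;
}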

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

barmar@think.COM (Barry Margolin) (12/01/88)

In article <771@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In fact, if we define SIGN(x) = if x = 0 then 0 else x/ABS(x), you can
>extend DIV and MOD to the complex numbers as well, not that it's quite
>as useful...

Which is, in fact, precisely how Common Lisp (which has complex and
rational numbers built in) defines SIGNUM.  This definition computes
the unit vector collinear with the vector in the complex plane that the
original number specifies.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

desnoyer@Apple.COM (Peter Desnoyers) (12/01/88)

In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>I do not know if I made it clear in my initial posting, but the problem
>arises if the types of a, b, and c are floating.  Note that the quote from
>my paper specifically has i as an integer.

This was only implicitly stated, at best, in the original posting by
use of Fortran variable naming conventions. I read the article in
comp.lang.c. Enough said. Now that I understand the original question,
I have a new answer -

 There is no natural divide-type operation mapping (real,real) ->
(real,integer). The operation mentioned was either poorly defined, or
merely consisted of:
  convert a, b to int
  div a,b -> int quotient, remainder
  convert remainder to float

If what you mean is that for any integer or floating-point operation
op(i1,i2,...in,o1,o2...,om) with n inputs 'i' and m outputs 'o', the
full set of 2^(n+m) combinations of i1 : {int, float}, ... om : {int,
float} should be supported as hardware instructions, then the idea is
patently absurd. 

				Peter Desnoyers

alverson@decwrl.dec.com (Robert Alverson) (12/01/88)

In fact, the Intel 8087 almost computes

     i, r := a/b, a%b;

The floating-point remainder function returns the remainder as the
function result, and the least 3 bits of the integer quotient are
stored in the condition codes.  For some typical uses of the
remainder function (range reduction), this is all you need.
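
A minimal C sketch of that range-reduction use, for illustration only
(fmod() plays the role of the floating remainder here; the quotient bits
that the 8087 leaves in its condition codes have to be recomputed, so this
shows the usage rather than the instruction):

#include <math.h>

/* Reduce x (assumed >= 0) into [0, pi/4) for a sin/cos kernel.  Near an
 * octant boundary the recomputed quotient can be off by one, which a
 * real kernel would have to guard against. */
double reduce(double x, int *octant)
{
    const double p = 3.14159265358979323846 / 4.0;  /* pi/4 */
    double r = fmod(x, p);                          /* floating remainder */
    double q = (x - r) / p;                         /* integer-valued quotient */

    *octant = (int)fmod(q, 8.0);                    /* low 3 quotient bits */
    return r;
}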

Bob

gandalf@csli.STANFORD.EDU (Juergen Wagner) (12/01/88)

In article <32354@think.UUCP> barmar@kulla.think.com.UUCP (Barry Margolin) writes:
>...
>How about if the compiler saved the answers in the program text as
>explicit ADVICE statements?  The compiler becomes a simple form of
>Programmer's Apprentice, helping the programmer determine where
>potential optimizations can be specified.  The decisions become
>documented, and they don't have to be repeated.

If you have such a sophisticated compiler, that would be more a tool for
automated program generation than a compiler in the classical sense. Such a
tool could also be fed with a pure declarative specification of what you
intend to do, and it should have knowledge about all optimizations for the
architectures you're going to run that program on.

Of course, if you learn something really nifty about optimizing code, you would
like to tell that "compiler" to use the new technique with your code. I guess
this comes close to what some LISP environments offer, where you can modify
the compiler if you don't like it the way it is.

In such an environment, the programmer should not only have the opportunity to
choose between alternatives the compiler already knows, but also to introduce
new specialized strategies for certain applications. A major problem would
then be consistency of the resulting system...

If I had to use such a tool in the C environment, I would certainly ask for
incremental compilation, too. It is a pain in the neck if you have to recompile
and relink some of your modules every time you change a single character
in a function.

Hmm... I guess I am shifting to the more general question of suitable
software development environments for C or other languages...

'nuff said.

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

ech@poseidon.ATT.COM (Edward C Horvath) (12/01/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction,...

In article <8938@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> Although I don't necessarily subscribe to Herman's opinions, R2000 divides
> actually do this (leave both results in registers).  

From article <949@taux01.UUCP>, by cjosta@taux01.UUCP (Jonathan Sweedler):
> The 32000 series has a DEI (Divide Extended Integer) instruction that
> also does this. 

Hmm, add the MC68K, the PDP-11, and the IBM s/360 et fils.  Put another way,
does anyone have an example of a common processor that DOESN'T give you the
remainder and quotient at the same time?  I don't know the Intel chips, so
perhaps the original author just knows that the *86 divide doesn't do this.

It's interesting, though, how few languages provide such "two-valued"
functions (all right, I can feel the mathematicians cringing.  So few
languages provide functions with ranges like ZxZ, OK?).  I've seen
implementations of FORTH, by the way, where the expression
	a b /%
for example, divides a by b, leaving a/b and a%b on the stack.  Of
course, if your favorite flavor of forth didn't provide the /% operator
("word") you'd just define it...

=Ned=

ok@quintus.uucp (Richard A. O'Keefe) (12/01/88)

In article <21440@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
>In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>>[about the iq, r := x div y, x mod y instruction]

>The operation mentioned was either poorly defined, or merely consisted of:
>  convert a, b to int
>  div a,b -> int quotient, remainder
>  convert remainder to float

Wrong.  The important thing is that the remainder is
	remainder = a - b*INT(a/b)
I am sure that the IEEE floating-point committee would be interested to
learn that it is not a "natural divide-type operation"; this is
precisely the IEEE drem(a,b) function.  Quoting the SunOS manual:
	drem(x, y) returns the remainder r := x - n*y where n is the
	integer nearest the exact value of x/y; moreover if |n-x/y|=1/2
	then n is even.  Consequently the remainder is computed exactly
	and |r| <= |y|/2.  ... drem(x, 0) returns a NaN.

This is obviously a range reduction operator.  Oddly enough, on most
present-day machines, there is a good excuse for _not_ returning the
quotient (n) as well:  with 64-bit floats and 32-bit integers there
is no reason to expect n to be representable as a machine "integer".
Both results would have to be floats.  And in its use as a range
reduction operation, you normally aren't interested in n.
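For concreteness, a naive rendering of the quoted definition (the real
drem() computes the remainder exactly; doing it with ordinary arithmetic
as below loses precision when x/y is large):

	#include <math.h>

	double naive_drem(double x, double y)
	{
	    double q = x / y;
	    double n = floor(q + 0.5);              /* nearest integer ...   */

	    if (fabs(n - q) == 0.5 && fmod(n, 2.0) != 0.0)
	        n -= 1.0;                           /* ... with ties to even */
	    return x - n * y;                       /* |result| <= |y|/2     */
	}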

friedl@vsi.COM (Stephen J. Friedl) (12/01/88)

In article <1988Nov29.181235.23628@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> Alas, if you buy your newer faster CPUs from Motorola or Intel, they can't
> tell you how many cycles each instruction takes!

Why is this?  When I was hacking on the VAX, nobody could ever
tell me how long anything took, and empirical measurements were
pretty tedious.  Is it laziness on the vendor's part or are there
good reasons for this?

    Steve

-- 
Stephen J. Friedl        3B2-kind-of-guy            friedl@vsi.com
V-Systems, Inc.                                 attmail!vsi!friedl
Santa Ana, CA  USA       +1 714 545 6442    {backbones}!vsi!friedl
---------Nancy Reagan on cutting the grass: "Just say mow"--------

cik@l.cc.purdue.edu (Herman Rubin) (12/01/88)

In article <7740@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
> In article <21390@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
< < In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
< < >
< < > For example, suppose we want to
< < >divide a by b, obtaining an integer result i and a remainder c.  I know
< < >of no machine with this instruction, and this is not that unusual an 
< < >instruction to demand.  
< < 
< < I can't think of a machine that doesn't. The only machine language
< < manuals on my desk (68000 and HPC16083) show this instruction.
< < If anyone knows of a machine that doesn't leave the remainder in
< < a register, I would be interested in knowing of it. Such a machine
< < either throws the remainder away or uses some algorithm besides
< < shift-and-subtract. (for instance hard-wired.)

The CDC 6000s and their CYBER successors and the CYBER 205 (a totally different
architecture) do not even have integer division.  The CRAYs do not even have
division.  Certainly the VAX and the PYRAMID only have the above properties
if an EDIV rather than a DIV is used for division, and that had a few more
problems than one would expect because they do not use sign-magnitude.  It
takes two instructions even to do the sign extension.

> Well, if I remember right, the we32000 does that (the thing in a 3b2 and a
> tty5620).  It has a divide, quotient, remainder and modulus instruction
> like the ns32000, but unlike the ns32000 no instruction that gives you
> both.  This is (I think) an example of a machine for which the instructions
> are dictated by a HLL (C in this case), which Herman Rubin so deplores.
> On the other hand, this need not be wrong if the operations can be
> pipelined.  E.g. if i/j and i%j take 23 cycles each, but the second can
> start one cycle after the first you do not lose very much.

Can division be pipelined?  In scalar mode on the CYBER 205, division is not
subject to pipelining.  I believe that this is the case because too much of
the dividend must be retained throughout.

> In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
< < I do not know if I made it clear in my initial posting, but the problem
< < arises if the types of a, b, and c are floating.  Not that the quote from
< < my paper specifically has i an integer.
> 
> Indeed you did not make that clear (seems evident :-)).
> I know no machines that gives you both quotient and remainder on a floating
> point division, stronger still, before the IEEE standard there was (I
> believe) no machine that would give you the remainder, only quotients
> allowed.  IEEE changed that and many machines now have floating point
> division and floating point remainder, but the result is floating point
> for both.  This explains also why not both are returned.  To obtain the
> (exact) remainder may require a larger number of steps than the
> (inexact) quotient.  I agree however that those machines could return the
> quotient when doing a remainder operation.  Note that IEEE did *not*
> mandate operations that return two results.  So no machine I know does
> return two results; even with sine and cosine, which is implemented on
> many (as an extension to IEEE?).  The CORDIC algorithm will give you fast
> both sine and cosine, but is only efficient in hardware.  But also here,
> pipelining may be a deciding point.
> -- 
> dik t. winter, cwi, amsterdam, nederland
> INTERNET   : dik@cwi.nl
> BITNET/EARN: dik@mcvax

The implementation of quotient and remainder should be a simple modification
of the fixed point algorithm.  Those machines which have floating point
remainder in hardware presumably do this.

Even if CORDIC is not used, returning the sine and cosine would be faster in
software than two function calls.  The corresponding function is in some of
the libraries I know, but I have not seen any other implementation than the
two calls.  This also holds for many other reasonable situations in software.
In software, pipelining is certainly not possible except in vector situations.
The need to allow a list on the left of the = in a replacement statement
should have been obvious to all HLL designers from day 1; mathematicians have
used functions returning a list for centuries.
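The usual C workaround, for what it's worth, is to return the extra
results through pointer arguments (the name below is made up; a real
single-pass evaluation would replace the two library calls):

	#include <math.h>

	void sincos2(double x, double *s, double *c)
	{
	    *s = sin(x);    /* a combined routine would share the   */
	    *c = cos(x);    /* argument reduction between these two */
	}

	/* usage:  double s, c;  sincos2(theta, &s, &c); */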

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

cik@l.cc.purdue.edu (Herman Rubin) (12/01/88)

In article <19848@ism780c.isc.com>, news@ism780c.isc.com (News system) writes:
> In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> ...

 
> Then he clarifies:
> 
> >I do not know if I made it clear in my initial posting, but the problem
> >arises if the types of a, b, and c are floating.  Not that the quote from
> >my paper specifically has i an integer.

 
> Ok here is one.  Quoting from the IBM/709 Reference Manual (Circa 1959)
> 
> FDH - Floating Divide or Halt
>     THe c(ac) [means contents of ac register] are divided by c(y). The
>     quotent appears in the mq and the remainder appears in the ac. ...

The format of this allows the multiple precision division of a single precision
floating point number by a single precision number, and was probably used in
computing double precision quotients, but it has nothing to do with the
question I raised.  The quotient is a truncated floating point quotient,
and the remainder is the remainder from this operation.  Even denormalization,
which was possible on that machine, could not get the quotient as an integer.  
The easiest way to do it on the 709 was first to check if the quotient was 0,
in which case the result is clear.  Otherwise, do the appropriate unpacking
and shifting, do an integer divide, and repack the integer remainder.  This
was more efficient on the 709 than on current machines, because division was
relatively slower and comparisons and transfers were FAST instructions in those
days.  (No, they have not gotten slower; the others have gotten relatively much
faster.)
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering) (12/02/88)

In article <606@poseidon.ATT.COM> ech@poseidon.ATT.COM (Edward C Horvath) writes:
>
>It's interesting, though, how few languages provide such a "two-valued"
>functions (all right, I can feel the mathematicians cringing.  So few
>languages provide functions with ranges like ZxZ, OK?).  I've seen
>implementations of FORTH, by the way, where the expression
>	a b /%
>for example, divides a by b, leaving a/b and a%b on the stack.  Of
>course, if your favorite flavor of forth didn't provide the /% operator
>("word") you'd just define it...
>

F88 does (define a new data type, define a function which returns it,
possibly overload some operator).


Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

dave@onfcanim.UUCP (Dave Martindale) (12/02/88)

In article <1107@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>
>Somewhere along the way we all pick up a sense of good and bad, right
>and wrong, truth and lie, beauty and ugly... We see something and it
>strikes us as good, right, correct, beautiful, it pleases us. A lot of
>people see high level code as good, beautiful, and right, and low
>level code as bad, ugly, and wrong. Trying to read a piece of assembly
>langauge that you wrote last month might convince you that low level
>code is indeed bad, ugly, and wrong. Haiku stands the test of time and
>readability much better than assembly language.

I've seen some elegant assembly code, and it's elegant for the same
reasons that higher-level code is elegant: it does its job precisely
and cleanly and well.  And, in some sense, it does it more cleanly than
the equivalent compiled code, or does something that compiled code
could not do as well.  However, such pieces of elegant assembly code
are seldom more than one screen, or one printer page, in size.

There are people who can write well-structured, clean, readable
assembly code that goes on for hundreds of pages.  I've known
a few of them.  They also write good high-level-language code,
and generally do so when suitable compilers are available.

In my opinion, excellent programmers can write well in both assembly
and compiled languages, and can thus choose the best language for
the job at hand.  (You could argue that this extends to fluency in
interpreted languages as well).

dave@onfcanim.UUCP (Dave Martindale) (12/02/88)

In article <189@ernie.NECAM.COM> koll@ernie.NECAM.COM (Michael Goldman) writes:
>
>There is also the question of what happens when a new machine (like the
>IBM PC or MAC, or whatever) comes out and the C compilers for it are
>late or terribly buggy, or soooooooooo slow, and there are few if
>any utility packages for it ?
>
>Only in the academic world can code be written to be 100% machine
>independent.  The rest of us have to struggle with those little quirks
>of the chips.

Hmm.  Is it only in the academic world that the system (hardware and
software) is chosen for its benefit to the programmer or user?

I think that most people choose hardware and software for its
usefulness to them.  If a particular piece of hardware has too many
"quirks" to be useful, we don't buy it.  So we can write code that is
100% portable (not necessarily the same thing as machine independent)
to all of the machines that we consider sufficiently non-brain-damaged
to be useful to us.

I'll suggest that it's primarily people who write software to sell that
have to deal with new hardware that doesn't yet have decent compilers.
You aren't buying the hardware to use it, but to write stuff to sell to
others who use it, and are thus forced to put up with whatever state
it is in.  The "software vendor" view of the world is at least as narrow
as that of the "academic" view, and certainly not representative of
all non-academic users.

"The rest of us" are end users, who don't have to touch such hardware
if we don't want to.

ecphssrw@solaria.csun.edu (Stephen Walton) (12/02/88)

Can this discussion be ended, or at least moved to ONLY
comp.lang.misc?  This is the second extended language war in the last
few months (the last one was C vs. Fortran), and these two have
convinced me to auto-KILL all comp.lang.fortran messages cross-posted
to comp.lang.c. 
-- 
Stephen Walton, Dept. of Physics & Astronomy, Cal State Univ. Northridge
RCKG01M@CALSTATE.BITNET       ecphssrw@afws.csun.edu
swalton@solar.stanford.edu    ...!csun!afws.csun.edu!bcphssrw

khb%chiba@Sun.COM (Keith Bierman - Sun Tactical Engineering) (12/02/88)

In article <1039@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>
>Can division be pipelined?  In scalar mode on the CYBER 205, division is not
>subject for pipelining.  I believe that this is the case because too much of
>the dividend must be retained throughout.

The Cydra 5 had it pipelined, but the pipe was interlocked during
operation.  This was said to be due to a lack of real estate: it could
have been fully pipelined (a) were it not for gradual underflow (the
Cydra 5 was a fully IEEE-compliant machine), and (b) had there been
more chip space, it could have been done anyway.

>
>Even if CORDIC is not used, returning the sine and cosine would be faster in
>software than two function calls.  The corresponding function is in some of
>the libraries I know, but I have not seen any other implementation than the
>two calls.

DEC's famous math$sincos does this. SunFORTRAN takes two calls (in the
user code) and turns them into one call to the sun sincos routine.
Unless I am mistaken this is a very old and well known optimization
(which is typically invisible to the user).

On many machines the cost of both is only a couple of % more than the
cost of one.


Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

consult@osiris.UUCP (Unix Consultation Mailbox ) (12/02/88)

In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>I do not know if I made it clear in my initial posting, but the problem
>arises if the types of a, b, and c are floating.  Not that the quote from
>my paper specifically has i an integer.

And I, ex-FORTRAN programmer that I am, should have picked up on this
immediately... :-)

                                                                 Phil Kos
                                                      Information Systems
...!uunet!pyrdc!osiris!phil                    The Johns Hopkins Hospital
                                                            Baltimore, MD

mccalpin@loligo.fsu.edu (John McCalpin) (12/02/88)

I notice that no one has discussed supercomputers in this long 
discussion on the merits of learning assembly language.  To achieve
any sort of reasonable performance on a vector supercomputer, you
must know a lot about the architecture.  Some machines are worse
than others --- the CDC/ETA machines and the Cray-2 come to mind as
machines on which it is remarkably easy to get bad performance....

On the other hand, it is almost never necessary for users to _write_
in assembly language on these machines to get good performance - you
just need to know what the vectorizer is able to convert into 
efficient code.  Also on the CDC/ETA machines, the entire instruction
set is available in FTN200 anyway via special calls. (This actually
produces inline code, but they are written as subroutine calls).

John D. McCalpin
Supercomputer Computations Research Institute
The Florida State University
mccalpin@masig1.ocean.fsu.edu

ok@quintus.uucp (Richard A. O'Keefe) (12/02/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> suppose we want to
>divide a by b, obtaining an integer result i and a remainder c.  I know
>of no machine with this instruction,...

Various people have listed machines with integer quotient-and-remainder
instructions.  I seldom agree with Herman Rubin, but he is quite capable
of reading an instruction set manual.  At any rate, he is better at that
than most of the repliers were at reading his message.  He was asking for
the equivalent of
	double a, b, r;
	long int i;
	r = drem(a, b);			/* remainder, computed exactly  */
	i = (long)( (a-r)/b );		/* recover the integer quotient */

Another poster says
>I've implementations of FORTH, by the way, where the expression
>	a b /%
>for example, divides a by b, leaving a/b and a%b on the stack.
	
Pop-2 has a//b which does much the same thing.

Common Lisp has
	(floor number divisor)
plus ceiling, truncate, and round.  All four functions return TWO
values: the quotient and the remainder.  E.g.
	(multiple-value-setq (Q R) (truncate N D))
In particular, Common Lisp's (round - -) function, if given floating-
point arguments, is the function that Rubin wants a single instruction for.
(Well, that's drem() anyway.  He might want one of the other three.)

I don't know, but I wouldn't be surprised if the Symbolics machines had
an instruction for this.

cik@l.cc.purdue.edu (Herman Rubin) (12/02/88)

In article <787@quintus.UUCP>, ok@quintus.uucp (Richard A. O'Keefe) writes:
> In article <21440@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
> >In article <1034@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

		.............................

> Wrong.  The important thing is that the remainder is
> 	remainder = a - b*INT(a/b)
> I am sure that the IEEE floating-point committee would be interested to
> learn that it is not a "natural divide-type operation"; this is
> precisely the IEEE drem(a,b) function.  Quoting the SunOS manual:
> 	drem(x, y) returns the remainder r := x - n*y where n is the
> 	integer nearest the exact value of x/y; moreover if |n-x/y|=1/2
> 	then n is even.  Consequently the remainder is computed exactly
> 	and |r| <= |y|/2.  ... drem(x, 0) returns a NaN.
> 
> This is obviously a range reduction operator.  Oddly enough, on most
> present-day machines, there is a good excuse for _not_ returning the
> quotient (n) as well:  with 64-bit floats and 32-bit integers there
> is no reason to expect n to be representable as a machine "integer".
> Both results would have to be floats.  And in its use as a range
> reduction operation, you normally aren't interested in n.

I disagree.  In the varied situations I have wanted this, one of the 
following happens.

	I know the result is small, so that sometimes even 8 bits is enough.

	I only care about the last few bits.

	I want an overflow trap if it is too large.

BTW, I think there should be alternatives about the range of the remainder.
There are situations in which I want the remainder forced positive.  This is
much cheaper in hardware than in software.
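
As an aside on the cost in software: with C's usual truncating %,
forcing the remainder positive takes an extra test and add per
operation, roughly like this sketch, whereas hardware could deliver
the non-negative remainder directly:

	long pos_mod(long a, long b)
	{
	    long r = a % b;             /* may be negative when a < 0 */

	    if (r < 0)
	        r += (b < 0) ? -b : b;  /* force r into [0, |b|)      */
	    return r;
	}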

Thus we have several instructions wanted, or one can look at it as one
instruction with a "tag field."  Either way, it should be done.  And the
integer is wanted in many situations.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

gandalf@csli.STANFORD.EDU (Juergen Wagner) (12/03/88)

 In article <960@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:
 >In article <1988Nov29.181235.23628@utzoo.uucp>, henry@utzoo.uucp (Henry
 >Spencer) writes:
 >> Alas, if you buy your newer faster CPUs from Motorola or Intel, they can't
 >> tell you how many cycles each instruction takes!
 >
 >Why is this?
 >...

Because of instruction prefetching, overlapping of instruction executions, ...
The time an instruction takes to execute depends on its context.
Strictly speaking, e.g. a 68020 is not a von-Neumann machine!

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

einari@rhi.hi.is (Einar Indridason) (12/03/88)

In article <606@poseidon.ATT.COM> ech@poseidon.ATT.COM (Edward C Horvath) writes:
>
>Hmm, add the MC68K, the PDP-11, and the IBM s/360 et fils.  Put another way,
>does anyone have an example of a common processor that DOESN'T give you the
>remainder and quotient at the same time?  I don't know the Intel chips, so
>perhaps the original author just knows that the *86 divide doesn't do this.


Are we talking about the old 8-bit processors as well, or are we just talking
about the "new" processors?  (Define it for yourself :-)
If we are talking about those 8-bitters, then the MOSTEK-6502 does not have
divide or multiply instructions.  And if memory serves me right, neither
did the Z-80.

Now, one question: does the ARM chip in the Acorn Archimedes include multiply
and/or divide instructions?



-- 
To quote Alfred E. Neuman: "What! Me worry????"

Internet:	einari@rhi.hi.is
UUCP:		..!mcvax!hafro!rhi!einari

dave@lethe.UUCP (David Collier-Brown) (12/03/88)

In article <1032@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
| Another effect of the HLL tyranny is that the operations which are beyond
| the ken of th HLL designers are disappearing from the machines. 

From article <8938@winchester.mips.COM>, by mash@mips.COM (John Mashey):
|  Although I don't necessarily subscribe to Herman's opinions, R2000 divides
|  actually do this (leave both results in registers).  Although I shouldn't
|  have been, I was surprised to find that the following C code, compiled
|  unoptimized (optimized, it essentially disappears, of course):
|  main() {
|  	register i,j,k;
|  	i = j / 7;
|  	k = j % 7;
|  }
|  generates one divide instruction to get both of the results.

  Actually it isn't too surprising: one of the primitives I want
at code-generator-generation time is an n-tuple consisting of:
  instruction name
  input register(s) and content constraints (aka types)
  output register(s) ditto
  pattern to generate instruction
  size
  time (or other figure of merit)
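
Rendered roughly as a C structure (field names made up for illustration):

	#define MAX_OPNDS 3		/* arbitrary for the sketch */

	struct insn_desc {
	    char *name;			/* instruction name                    */
	    int   in_reg[MAX_OPNDS];	/* input registers ...                 */
	    int   in_type[MAX_OPNDS];	/* ... and their content constraints   */
	    int   out_reg[MAX_OPNDS];	/* output registers ditto              */
	    int   out_type[MAX_OPNDS];
	    char *pattern;		/* pattern to generate the instruction */
	    int   size;
	    int   time;			/* or other figure of merit            */
	};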

  If I have these, I claim I can write a code generator that will do
a "good" job [see note below].  If I preserve these, however, I
also get the ability to do useful, if simple, improvements in the
code generated to do setup/teardown before and after the operation.  If I
plan on doing so, I get the ability to take a new instruction,
specified by the programmer, and provide the same kind of results.
  The simple case is old hat [press n if you've seen this one before]:
Define a system call in Ada by defining the representation of a trap
word, provide the types of input and output by declaring it in
functional/procedural form and the register constraints by pragmas.
Then look at the code generated (about 3 years ago on a small
honeybun) and note that no redundant register-register moves were
generated on either call or return.


--dave (this also answers the question of rarely-used 
        instructions being skimped on) c-b

note: No, I **can't** write you a code-generator-generator. I'm not
      a good enough academic.  But I've met people who have... I can
      only write code-generators themselves.

ech@poseidon.ATT.COM (Edward C Horvath) (12/03/88)

In article <1961@crete.cs.glasgow.ac.uk> (Fraser Orr) writes:
>I don't agree that there is ever any necessity to code in assembler. We
>have languages that produce code just as good as hand crafted assembler
>(such as C), so why not use them for this sort of thing.

From article <1432@zen.co.uk>, by frank@zen.co.uk (Frank Wales):
> Suppose you don't have a compiler.  Or an assembler.  Suppose you're writing
> these for a brand new processor -- what do you write them in?

I have to disagree with Mr. Orr, but not for the reason Mr. Wales cites.
Most assemblers are table-driven, as are the code generators for typical
compilers.  So you "teach" those programs about the new machine, compile to
get a cross-system, and compile again to get a "native" system.  My fingers
never leave my hands...

Assembler remains useful where nothing else will do, in performance-critical
operations (it's amazing how slow hashing is in C compared with assembler)
and nasty realities like dealing with MMUs, I/O devices, and other
ugly bits of hardware that have ugly timing constraints.  I have no fear of
assembler, but I only use it AFTER I run the performance measurements (or
when that ugly hardware is an issue).

=Ned Horvath=

henry@utzoo.uucp (Henry Spencer) (12/04/88)

In article <960@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:
>> Alas, if you buy your newer faster CPUs from Motorola or Intel, they can't
>> tell you how many cycles each instruction takes!
>
>Why is this?  When I was hacking on the VAX, nobody could ever
>tell me how long anything took, and empirical measurements were
>pretty tedious.  Is it laziness on the vendor's part or are there
>good reasons for this?

Both.  On simple machines like an 8080, which do one thing at a time and
do not stress their memory systems, it's easy to say how many cycles a
given instruction takes.  Make the instruction set hideously complex,
like the one on the VAX, and timing information gets very bulky.  (Worse,
it becomes heavily model-dependent, because different models of the CPU
implement details differently.)  Boost the clock rate and add a cache, and
all of a sudden memory-access times are effectively non-deterministic:  the
time taken for an instruction is a function of whether its memory fetches
come from the cache or not.  Add prefetch, and execution times can be
a function of the preceding instructions, because their memory-access
patterns determine whether the prefetcher can sneak the instruction fetch
in without stalling the execution unit.  Add pipelining, and this gets
a dozen times worse, because now instruction time is a complex function
of both preceding and following instructions and how they fight each
other for machine resources.  (For example, the register-to-register
move time on a 68020 is often zero, because the instruction completely
disappears in overlap with neighboring instructions.)

All of this means that supplying useful timing information for a cached,
pipelined, prefetching CISC is hard.  Supplying anything halfway accurate
is a lot of work, and it may be encrusted with so many ifs, buts, and
maybes that it isn't very useful.  This encourages manufacturers to be
lazy.  There are also more cynical motives that may be involved, such as
making life harder for third-party compiler suppliers, or a deliberate
policy of discouraging model-specific programming (after all, if the
customer isn't happy with the performance, he can always buy a bigger
machine, and this way there's only one version of the software to maintain).
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

csc21824@unsvax.UUCP (Jay) (12/05/88)

In article <622@krafla.rhi.hi.is> einari@krafla.UUCP (Einar Indridason) writes:
>In article <606@poseidon.ATT.COM> ech@poseidon.ATT.COM (Edward C Horvath) writes:
>>does anyone have an example of a common processor that DOESN'T give you the
>>remainder and quotient at the same time?  I don't know the Intel chips, so
>>perhaps the original author just knows that the *86 divide doesn't do this.

        The 8086 divide doesn't?  Then I'd like to know why my integer-to-
string conversion routine works.  It repeatedly divides the number by 10,
saving the remainder.
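
Something like this sketch (simplified: unsigned only, base 10, and no
claim that it matches the original routine):

	void utoa10(unsigned int n, char *buf)
	{
	    char tmp[12];
	    int  i = 0, j = 0;

	    do {
	        tmp[i++] = (char)('0' + n % 10);    /* remainder -> digit     */
	        n /= 10;                            /* quotient -> next round */
	    } while (n != 0);
	    while (i > 0)
	        buf[j++] = tmp[--i];                /* digits came out backwards */
	    buf[j] = '\0';
	}

On the 8086 both the quotient and the remainder do come out of the same
DIV, which was the point in question.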
---------------------------------------------------------------------

        This space for Rent                  Eric J. Schwertfeger
                                             CIS  [72657,1166]
                                        or   csc21824%unsvax.uns.edu
Disclaimer:These are just the mad ramblings of a man forced to use
           vi once too often, and as such, should be ignored.

tom@garth.UUCP (Tom Granvold) (12/06/88)

I originally wrote:

>>     The second need for assembly is in real time control.  In my previous
>>job we were using a Z80 to control several stepper motors.  The critical
>>timing restrictions occurred in the interrupt routines.  While there were
>>high level languages available for the Z80, none that we were aware of
>>were optimizing compilers.  Therefore we were able to produce much faster
>>code in assembler.  This was a case where every machine cycle was of
>>importance.  The most important comment in the source code was the number
>>of machine cycles each instruction took.  Yes we could have used a newer
>>faster CPU that has an optimizing compiler available for it, but Z80's are
>>cheap!

Then Fraser Orr replied:

>Sorry I don't entirely grasp what you are saying. Firstly you say speed
>is crucial here, I don't agree that programming in asm necessarily gives
>significant improvements in speed, but then you seem to imply that what you're
>interested in is exact timing information. I used to write programs in asm
>and spent a lot of time "optimising" them (in fact I once spent about 2 months
>on and off improving one critical subroutine, cutting it down from something
>like 35 micro secs to 26) these made little difference to the final product.
>Moreover it meant that the software was always behind schedule. Anyway see
>the above comments.

    I wasn't too clear in what I was trying to say.  We had several things
that needed to be serviced periodically, which included four stepper motors and
a pump.  The timing of events was controlled by setting up interrupts to
occur when some action was needed, say when it was time to step one of the
motors.  Therefore the relative timing of events was not the major concern.

    Our concern was that each of the interrupt routines had to execute as
quickly as possible so that it would delay other interrupts as little as
possible.  The number of machine cycles was counted in order to know how
long the interrupt routine took.  Whenever we changed the routine in order
to speed it up, we knew if we made an improvement simply by comparing the
machine cycle totals.  Yes this did take a lot of time, but the improvements
were noticable, for example the motors ran smoother and the positioning
accuracy improved.

>I would also say that if you ever try to do this on more complicated processors
>like the 68k and RISCs, then good luck to you, the timing is unbelievably
>complicated, with caches and pipelines and all that.

Well, it becomes very difficult, if not impossible, to count machine cycles
for the processors you mention.  But if the processor is fast enough, there
is no longer any need to count cycles.

Thomas Granvold

pasche@ethz.UUCP (Stephan Pasche) (12/06/88)

In article <1388@aucs.UUCP> 861087p@aucs.UUCP (A N D R E A S) writes:
>
>I've been told also that there are some low-level operations
>that you just can't do in C or any other high level language.

for example : I don't know a high-level language where you have control of
	      the CPU status flags. These can be very useful for some 
	      applications. One example is an operation you need for an
	      FFT program. There you have to reverse the bit order of a word.
	      In assembly language there's a very elegant and fast solution :
	      You can shift the source operand via the carry bit into the
	      destination operand. For a 680xx CPU :

		Entry: move.l Source,d0  get source operand
		       moveq #31,d2	 init bit counter for a longword
		Loop:  lsr.l #1,d0	 shift source, one bit goes into x-bit 
		       roxl.l #1,d1      shift bit from x-bit into destination
		       dbra d2,Loop	 loop until last bit is shifted
		       move.l d1,Dest    save result (reverse bit-order) 

	      The x-bit of the 680xx has in this example about the same
	      function as the carry bit of other CPUs.
	      If you want to write the same program in a high-level language
	      you have to work with bit-masks and test operations, which
	      is (compared to the program above) very slow and complicated.
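
	      For comparison, a C version using masks and shifts might look
	      like this (one possible rendering, not necessarily the fastest):

		unsigned long bitrev32(unsigned long x)
		{
		    unsigned long r = 0;
		    int i;

		    for (i = 0; i < 32; i++) {
		        r = (r << 1) | (x & 1);  /* test low bit, shift it into result */
		        x >>= 1;
		    }
		    return r;
		}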

==============================================================================
OS/2 & PS/2 : half an operating system for half a computer
 
Stephan Paschedag           pachedag@strati.ethz.ch  or  pasche@bernina.UUCP
Federal Institute Of Technology Zurich
______________________________________________________________________________

nevin1@ihlpb.ATT.COM (Liber) (12/06/88)

[followups to comp.lang.misc]

In article <1432@zen.co.uk> frank@zen.co.uk (Frank Wales) writes:

>Suppose you don't have a compiler.  Or an assembler.  Suppose you're writing
>these for a brand new processor -- what do you write them in?

I would write a cross-assembler (probably table-driven if I had to do
this sort of thing a lot) and an emulator on another machine.  If I
waited for the actual hardware to come out before I did any of my
software, I'd probably be out of business. :-)
-- 
NEVIN ":-)" LIBER  AT&T Bell Laboratories  nevin1@ihlpb.ATT.COM  (312) 979-4751

stachour@umn-cs.CS.UMN.EDU (Paul Stachour) (12/06/88)

In article <6529@june.cs.washington.edu> kolding@uw-june.UUCP (Eric Koldinger) writes:
>
>Ah, but wouldn't that be nice.  Optimizing compilers that could generate
>code as good as we can generate by hand in all cases.  Let me know when
>someone writes one.
>
One of the important parameters is the "size of the program".
When I was working at IBM in the early 1970's, we had an
outstanding "bet" as to whether anyone could outdo the PL/S compiler
on a "non-trivial" program (defined as one that took more than
1K bytes of 360-assembler to write).  We had a number of challenges.
We never lost.  Yes, one could take the compiler-output and find
one thing to optimize by hand, but that was forbidden by the
ground-rules.
  And those who "needed" an instruction or two used "instruction-forms",
such as "CVB(IntVar,PackedStringVar);", whose semantics the compiler
understood (and just generated the CVB + setup if needed).  However,
the win was that the SIL compiler did not have to "dump" its history
because it was unable to see what the user was doing in the assembly-code block.
  That enabled one to write "everything" in a mid-level-language
designed for systems implementation work [such as the PL/S dialect of PL/I]
(I refuse to call languages of the ilk of "C" high-level), and put
in the one or two assembler instructions that one "really needed".
  To their credit, notice that the designers of Ada allowed the
same style in their language definition; I hope the compiler-writers
and vendors know how to take advantage of it.

dik@cwi.nl (Dik T. Winter) (12/07/88)

In article <159@loligo.fsu.edu> mccalpin@loligo.UUCP (John McCalpin) writes:
 > I notice that no one has discussed supercomputers in this long 
 > discussion on the merits of learning assembly language.  To achieve
 > any sort of reasonable performance on a vector supercomputer, you
 > must know a lot about the architecture.  Some machines are worse
 > than others --- the CDC/ETA machines and the Cray-2 come to mind as
 > machines on which it is remarkably easy to get bad performance....

This is very true.
 > 
 > On the other hand, it is almost never necessary for users to _write_
 > is assembly language on these machines to get good performance - you
 > just need to know what the vectorizer is able to convert into 
 > efficient code.  Also on the CDC/ETA machines, the entire instruction
 > set is available in FTN200 anyway via special calls. (This actually
 > produces inline code, but they are written as subroutine calls).

But, but ... Programming special calls in FTN200 is not very different
from programming in assembler (META in this case).  Try to port a
program containing special calls.  (For the readers that do not know
FTN200 and its special calls, it is similar to the C asm statement.)
When you rely on the vectorizer you may find you have bad luck (and
especially FTN200 comes to my mind).
Also, you have to twist your program to get it to compile to an efficient
executable, only to find that it no longer ports very well.
And (at least on Cray-1) on vector machines with vector registers
you may find a general improvement when you do things in assembler.
Not that it is *necessary*, but it may help.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

news@ism780c.isc.com (News system) (12/10/88)

In article <707@ethz.UUCP> pasche@bernina.UUCP (Stephan Paschedag) writes:
>for example : I don't know a high-level language where you have control of
>             the CPU status flags.

The original FORTRAN language had statements like:

      IF SENSE SWITCH(3) 100,200
      IF SENSE LIGHT(1)  500,600   (tests a status flag)

The sense switches were settable by an operator.  The sense lights were
settable (and testable) under program control and were displayed on the
operator's console.  Wonder why these statements were dropped in later
FORTRANs :-).  After all FORTRAN 77 still has:

   HALT 22

In the original FORTRAN the number 22 was displayed and the operator could
resume execution by pressing the start switch.  HALT was used primarily as
a debugging tool.

    Marv Rubinstein