[net.arch] Orphaned Response

ucbesvax.turner@ucbcad.UUCP (06/15/83)

#R:noscvax:-13800:ucbesvax:12800001:37777777600:1049
ucbesvax!turner    May 25 00:41:00 1983

	The multiple-dedicated-processors scheme embodied in the BCC
500 might well have been an outgrowth of frustration with Berkeley's
CDC 6400 system.  From vague recollection of conversations with people
remotely associated with it, CAL TSS (I think it was) was a bomb as
a time-sharing system because Seymour Cray had done too good a job
of designing a batch machine.  Nothing was glaringly wrong with the
software architecture (that I know of), but it was poorly matched to
the available hardware.

	The 6400 also had multiple dedicated processors, and was
a phantasmagorical design.  It was built entirely with discrete
components at a time when everybody was going for ICs.  Cray
just didn't trust what he couldn't see, apparently.  (Even now,
his machines are built up from a very small repertoire of chips
whose design he has overseen--except, perhaps, for the ECL RAMs.)
Notwithstanding, the 6400 was not retired until last year, although
it had been running in the red as an operation for quite some time.

	Michael Turner
	ucbvax!ucbesvax.turner

hal@cornell.UUCP (09/04/83)

#R:rlgvax:0:cornell:-1:37777777600:1213
cornell!hal    Jul  3 09:47:00 1983

One of the articles in this conversation said something like "I don't
understand why they [the 8008-8080 designers] didn't do a better job.
After all, much was known about computer architecture at the time."

Well, yes, but...

How many good chip designers are also good computer architects?  Perhaps
there are some now, but back when the original microprocessors were built,
probably very few people were good at both.  The original microprocessors
remind me a lot of early computer designs before very much was known about
computer architecture.  The hardware folks put together something that
could execute instructions, then left it to the software folks to see
if they could figure out how to use it effectively and come up with good
code generators for compilers in spite of the instruction set.  There are
VERY few examples of good computer design.  The only encouraging thing is
that there is more awareness of how hard it is to do it right, and that a
really good design must take into account lots more than circuits.

Hal Perkins                         uucp:  {decvax|vax135|...}!cornell!hal
Cornell Computer Science            arpa:  hal@cornell
                                  bitnet:  hal@crnlcs

johnl@haddock.UUCP (01/31/84)

#R:parsec:32800003:haddock:9500008:000:616
haddock!johnl    Jan 30 10:31:00 1984

Without trying to fan the flames, I believe that the advantages of two's
complement arithmetic over its main competitors, one's complement and
sign-magnitude, are:

  -- Unique representation of zero (the others have +0 and -0)

  -- Simpler design of adders and subtracters.  The other two require
     tweaks like end-around carry to get the right answer.
     Multiplication and division are consequently simpler, too.

  -- Easier software implementation of multiple-precision arithmetic,
     since signed and unsigned addition are the same except for the
     interpretation of the result.
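
As a rough illustration of the last point, here is a minimal C sketch
(assuming 16-bit words purely for concreteness; the names are made up):

    #include <stdio.h>

    /* Add two 32-bit numbers held as pairs of 16-bit words.  The low-word
       add is the same operation whether the full numbers are considered
       signed or unsigned; only the carry out of the low word has to be
       propagated into the high-word add. */
    int main()
    {
        unsigned short a_lo = 0xFFFF, a_hi = 0x0000;   /* a = 0x0000FFFF */
        unsigned short b_lo = 0x0001, b_hi = 0x0000;   /* b = 0x00000001 */

        unsigned long  lo    = (unsigned long)a_lo + b_lo;  /* 0x10000 */
        unsigned short carry = (unsigned short)(lo >> 16);  /* 1       */
        unsigned short r_lo  = (unsigned short)lo;           /* 0x0000  */
        unsigned short r_hi  = a_hi + b_hi + carry;          /* 0x0001  */

        printf("result = %04x%04x\n", r_hi, r_lo);   /* prints 00010000 */
        return 0;
    }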

John Levine, ima!johnl

coulter@hpbbla.UUCP (05/14/84)

>> A lot depends on the C-2 beyond the survival of Cray Research.

I am intrigued by what the author meant by this.  Would the author 
(or anyone else) please tell me.

perry@hp-dcde.UUCP (perry) (01/10/85)

/***** hp-dcde:net.arch / dartvax!chuck /  8:35 pm  Jan  9, 1985*/
Now suppose we had a cache that was much more under a programmer's control.
To be concrete, suppose we have a cache of, say, 32 elements, each containing
32 words.  And suppose our processor has a load cache instruction with
syntax:

    load cache <cache address> <memory address>

dartvax!chuck
/* ---------- */

I think the biggest problem right now is the compiler technology.  The
compiler would have to recognize instruction locality when it occurs in a
program.  Another intelligent approach is locality lookahead, similar to
the instruction lookahead which has been done on the big machines for years.
This all sounds like a compiler-writer's nightmare to implement.  I have not
yet seen a compiler optimized for cache technology.

Of course, assembly-level programmers could hand-code this instruction,
but I think that would be a step backward (or sideways).

You allude to the idea of a segmented cache.  The idea of RISC-type register
files might be applicable to caches.  Especially in environments which tend to
switch contexts rapidly (such as a program making a lot of OS calls),
this may avoid the problem of invalidating a still-valid cache on a context
switch.

Perry Scott
!hplabs!hpfcla!perry

roy@phri.UUCP (Roy Smith) (01/15/85)

> Now suppose we had a cache that was much more under a programmer's control.
> And suppose our processor has a load cache instruction with syntax:
>     load cache <cache address> <memory address>
> dartvax!chuck
> /* ---------- */
> 
> I think the biggest problem right now is the compiler technology.  The
> compiler would have to recognize instruction locality when it occurs in a
> program.
> Perry Scott, !hplabs!hpfcla!perry

	Maybe the compiler could get some help from the programmer.  We
already have register variables; why not have an extension to C which
allows CACHEON and CACHEOFF keywords?  OK you guys from the C Standards
committee, flame on, I'm ready for you; so we'll call it C++prime-star :-).
These could be used to bracket that section of code you want to make sure
gets cached.

	It is probably a given that anything bigger than a micro will have
some kind of cache (I'm not sure about 68K based systems, anybody know?)
and the variability in the size, update strategy, and organization of
caches should be no worse a problem than the corresponding differences in
register sets. Besides, just as with the REGISTER keyword, the compiler
takes this as a hint which it is free to ignore if it doesn't make sense
for the particular machine architecture.

	Perhaps instead of having bracketed sections of code, we could have
a CACHE keyword which is only valid in a function declaration and makes the
whole function get cached (yeah, I know, what if the function is 2.3Kbytes
of code and you only have 2K of cache?)  This latter version would be
particularly good for things like interrupt routines which run often, but
usually with enough other stuff in-between invocations to flush the cache.
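
	For concreteness, here is a sketch of how the two proposals might
read (purely hypothetical syntax, of course -- none of these keywords
exist in any real C compiler, and the routine names are made up):

	CACHE void intr()		/* whole function kept cached */
	{
		service_device();
	}

	crunch(a, b, n)
	int *a, *b, n;
	{
		int i;

		CACHEON;		/* bracket the hot loop */
		for (i = 0; i < n; i++)
			a[i] += b[i];
		CACHEOFF;
	}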

	Possibly, you would want the load cache instruction to be
kernel-mode-only to keep user code from hogging it???  Perhaps you want 1
part of the cache to be programmer allocatable using the load cache
instruction and part to be up for grabs???  Perhaps even 1 part reserved
for kernel allocation, 1 part for user allocation, 1 part up-for-grabs data
and 1 part up-for-grabs text (hey, it's got to be power-of-2, right?)  Hey,
why not do something really clever and put J-11's in all the device
controllers and give everybody a micro-vax II in their terminal so the cpu
doesn't have to do anything at all :-)

-- 
Don't blame me, I just work here....

{allegra,seismo,ihnp4}!vax135!timeinc\
                 cmcl2!rocky2!cubsvax >!phri!roy  (Roy Smith)
                      philabs!cubsvax/

peterb@pbear.UUCP (01/29/85)

	Personally I think that cache should not be under programmer control
(don't flame me!). All the memory management models assume a cache of fixed
size with random age distribution. If you start loading and unloading the
cache, the approximations for the MM models all go to hell, and you can
inadvertently kill the advantage of cache.

					Peter Barada
					ima!pbear!peterb

barrett@hpcnoe.UUCP (barrett) (03/10/85)

I don't know about the rest of the world, but 24-bit PHYSICAL address
spaces are somewhat marginal.  I have seen single-user machines with 16MB
of physical memory, which is already everything that 24 bits (2^24 =
16,777,216 bytes) can address.  24 bits for VIRTUAL addressing is absurdly
small.

Dave Barrett
hplabs!hp-dcd!barrett or ihnp4!hpfcla!barrett

richardt@orstcs.UUCP (richardt) (07/10/85)

[sacrifice to the line eater]

>Most of the UN*X utilities take less than 64k

ha!  Try doing a `TOP` sometime.  Although many of the standard utilities
do have code in the 1k range, their data spaces tend to grow like crazy.
When you see a 'ls   1k code 117k data'  you'll understand.  Also, think
about what routines/programs use more than 64k.  Emacs, Rogue, Top, CSH,
probably SH and VI as well.  All of those were larger than 100K.  So forget
using 64k segments if you can possibly avoid it.  I will admit that
the resident sizes of those utilities are in the 10k-40k range.  However,
that other 90-100K still has to be addressed!
-------------------------
Is there an assembly-language programmer in the house?
						orstcs!richardt

henry@utzoo.UUCP (Henry Spencer) (07/15/85)

> >Most of the UN*X utilities take less than 64k
>
> ...  Although many of the standard utilities
> do have code in the 1k- range, their data spaces tend to grow like crazy.
> When you see a 'ls   1k code 117k data'  you'll understand.  Also, think
> about what routines/programs use more than 64k.  Emacs, Rogue, Top, CSH,
> probably SH and VI as well.  All of those were larger than 100K. ...

Not "sh", since it was written by people who knew what they were doing.
It's hardly surprising that csh, vi, and rogue are bloated, considering
where they were written.  The Berkloids have forgotten how to make anything
small.  And emacs is well-known to be elephantine.  Try looking more
carefully at /bin sometime; there are *lots* of small programs in there,
and a few big ones.  The original comment was correct.

"What's that you say?  4.3BSD 'echo' is 150KB?  I'm not surprised."
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

peterb@pbear.UUCP (07/25/85)

	It is trivial to determine overflow in a word X word = word
operation. Just do the standard multiply producing 2*word bits and check the
high-order word: for an unsigned multiply it must be zero, and for a signed
multiply it must be a sign extension of the low word.  If it isn't, the
result has overflowed. This check can be built in and executed at the end of
the multiply, as the lower word of the result is placed in the destination.

	It is not that expensive, requiring only a few clocks at most.
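
	A minimal C sketch of the same check done in software (illustrative
only; it assumes 16-bit shorts and at-least-32-bit longs, and the routine
name is made up):

	/* 16 x 16 -> 16 multiply.  Overflow iff the high half of the full
	   32-bit product is not just a sign extension of the low half
	   (for unsigned operands, iff the high half is non-zero). */
	int mul_overflows(a, b)
	short a, b;
	{
		long product = (long)a * (long)b;	/* full product      */
		short low = (short)product;		/* what gets stored  */

		return (long)low != product;
	}

	For example, 300 * 300 = 90000 overflows a 16-bit word, so
mul_overflows(300, 300) returns 1, while mul_overflows(100, 100) returns 0.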

Peter Barada
{ihnp4!inmet|{harvard|cca}!ima}!pbear!peterb

dougp@ISM780.UUCP (11/18/85)

How come this blather seems to get posted every two weeks whether
we need it or not???

allen@uicsrd.CSRD.UIUC.EDU (11/22/85)

   If this blather gets posted every 2 weeks why did you respond to 4
week old blather?  Wouldn't it be more appropriate to flame the most
recent blatherings?

aglew@ccvaxa.UUCP (05/14/86)

>/* Written  7:08 pm  Apr 29, 1986 by mac%uvacs@uvacs.UUCP in net.arch */
>> ....  Losing indexing for array accesses isn't too bad, since
>> just about every expensive array operation can be written using
>> pointers instead of indexes (post-increment doesn't require a
>> carry to perform a memory address).
>
>No semi-random access by subscripts?  Your machine is to be
>programmed exclusively in C or assembler, never Fortran.

The Fortran codes to worry about are the matrix-processing codes, like the
Livermore loops. A hell of a lot of money has been spent figuring out how to
vectorize and otherwise optimize these - you can almost buy the
optimizations off the shelf.
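
For reference, the pointer-vs-index rewriting mentioned in the quoted
article looks like this in C (a minimal sketch, not tied to any particular
machine; the routine names are made up):

    /* Indexed form: each reference needs base + i*sizeof(*a). */
    double sum_indexed(a, n)
    double a[];
    int n;
    {
        double s = 0.0;
        int i;

        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Pointer form: just post-increment the pointer as you go. */
    double sum_pointer(a, n)
    double *a;
    int n;
    {
        double s = 0.0;
        double *end = a + n;

        while (a < end)
            s += *a++;
        return s;
    }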

>
>> So what can you do? Adding an index register in slows down the rest of your
>> instruction set,
>
>Even when you don't index?

Yep. (1) On a short-pipeline machine, each pipeline stage has to be long
enough for an addition. (2) If you add extra pipeline stages to absorb the
addition, then - since there is nearly always a dependence chain from the
front to the back of the pipe - you've slowed it down again.


>> .... and, if memory is cheap, you might be willing to pad out a
>> lot more of your structures, to get a speed increase.
>
>or use a separate add instruction.

Slow. That's the point - if you aren't worried about speed you can use the
ADD, and get all the flexibility you want. If you are worried about speed,
then you could use the OR. 

Now, if a separate register-to-register OR instruction were faster than a
register-to-register ADD... too bad asynchronous systems are out of fashion.
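
To spell out the padding trick (a minimal C sketch with made-up numbers):
if an element is padded out to a power-of-two size and aligned to that
size, the low bits of its base address are zero, so a small field offset
can be OR'd in instead of added.

    #include <stdio.h>

    int main()
    {
        unsigned long elem_base = 0x1000;   /* aligned to element size */
        unsigned long elem_size = 16;       /* padded to a power of 2  */
        unsigned long field_off = 4;        /* field offset, < 16      */
        unsigned long i = 3;

        unsigned long addr = elem_base + i * elem_size; /* low 4 bits 0 */

        /* The OR gives the same address the ADD would. */
        printf("%lx %lx\n", addr | field_off, addr + field_off);
        return 0;
    }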

aglew@ccvaxa.UUCP (05/14/86)

I'm told the Star has pages of 512 and 64K.

Fujitsu's new machine has pages of 4K and 1M.

I believe that Convex's machine has multiple page sizes, although I do not
know if these are different page sizes as seen from the page table, or
different sizes as seen from the point of view of demand paging (i.e.,
variable-size clusters).

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

parafras@uicsrd.UUCP (07/14/86)

The input language to Parafrase is FORTRAN 66 with some extensions.  The 
output is a FORTRAN-like language that cannot be compiled by any FORTRAN
compiler.  The most noticeable differences are:

	1) loop types other than DO, notably DOALL, DOPIPE, DOACROSS, etc.

	2) non-standard variable names.  &, ', lower case, etc. are
	   included in variable names.  This is good in that a user of
	   Parafrase can tell what transformation caused some temporaries
	   to be created.  This is bad because it's not standard.

It would not be much trouble to write a pass that would convert Parafrase
output FORTRAN back to standard serial FORTRAN, but then you really haven't
gained a whole lot.  It would be more useful to convert Parafrase output
to a general standard parallel FORTRAN, but unfortunately such a beast does
not yet exist.