[comp.sys.transputer] Transputer scheduler and documentation

ADRIAN@VAX.OXFORD.AC.UK (04/04/89)

In reply to David Suter: transputer scheduler.

There was a time when Inmos was not very forthcoming about implementation
details and the transputer instruction set, but that *was* a very long time
ago. Even then, if you read between the lines, you could infer a good deal.
I was one of the lucky few (about 200) who attended a seminar in Bristol
at which Steven Brain and Paul Mattos told us a little about this new device.
They even passed around silicon wafers carrying bits of transputers! At that
time the component parts were not yet assembled; but it seemed wafer probing
showed that each component worked.

And we were given a data sheet/book: for the T424. (Yes 4*2*4: the memory
had to be reduced later to give the T414).  In that first book were examples
of the compilation of occam constructs into assembler. Reading between those
lines, it was possible to infer a good deal. I admit that later data sheets
omitted much of the more interesting stuff. And there was a period when
Inmos did seem intent on shooting themselves in the foot: they seemed to be
badly out of contact with the culture of the people who were actually
designing with existing microprocessors. (We were even told that the
transputer *was not* a microprocessor, but that seemed to be a marketing
ploy both to draw attention to its distinctive nature, and to avoid
direct comparison with sales of 68000's or whatever.)  But when you met INMOS
designers, they were always most helpful. Personally, I believe that the INMOS
philosophy was superior, powerful, elegant and efficient, but unless you had
a very strong background in theoretical and mathematical computer science,
there did seem to be a communication gap. Unfortunately, the majority of
people designing hardware did not have such a background, and many simply
*did not understand* the literature. I met many such people at early
Occam User Group meetings.

BUT ALL THIS HAS CHANGED! All the information is now readily available.
There have been many articles in journals, and well as the computing and
electronics magazines. The "Compiler Writers' Guide" (CWG) was available for
the asking : I have something entitled "T2/T4 transputer Instruction Set" dated
July 1986. The preliminary version of the CWG on my desk is dated Feb 19, 1986.
Those two were merged and are now available as a rather expensive Prentice
Hall book: ISBN 0-13-929100-8.


The primary references are now in the Prentice Hall series:-
CWG as above,
The transputer Reference Manual, ISBN 0-13-929001-X.
The Transputer Development System, ISBN 0-13-928995-X,

..and several others. (Yes, I know this has been on the list before, but
clearly many subscribers haven't the faintest idea of what a transputer
*is*!!)

And INMOS have a series of technical notes (I think they have reached at least
number 58.) These are also prime sources of information.


My experience is that you have only to ask: INMOS are most generous with help
and information!

I also think that anyone working with transputers will benefit enormously
by some appreciation of CSP (Communicating Sequential Processes). This is
one of the theoretical underpinnings for occam and the transputer. There is
a vast literature of papers, but start with another Prentice Hill Book :
"Communicating Sequential Processes" by C A R Hoare (who invented CSP along
with so much else!). Sorry, I don't have the reference to hand; published
about 3 years ago.


Some people may not be aware that there is also an occam mailing list: that is
run from Oxford. In UK, the address for registration is
              occam-request @ uk.ac.oxford.prg.
In the US, I have:
		occ-req @ syr-sutcase.


The answers to David Suter's questions are all in the Compiler Writers' Guide.
It would be amazing (well I would be amazed) at any procedural compiler that
did not map its concurrency onto the transputer hardware (microcode if you
think that's different). $ Oh, ok, I do know of some C's that also support
shared memory and semaphores and such, but that can only be there for people
with old code to port, or who don't understand CSP? Tongue in cheek...
So the answers are probably all the same whether you use C, Pascal, Fortran,
occam and very possibly Lisp or Prolog.

Now for a little more detail (but check this):
(surely all this has been said 50 times already on this list, and at least
once in the last week.... but here we go again!)

High priority processes run until :
a) they terminate (hooray!)
b) they communicate.
  If it's to another process on same chip, that will
  involve looking at a memory location to see if the other
  end of the communication is ready: is MOSTNEG INT there....
  Actually, if the other process is on the same chip and is
  ready, it is not descheduled...
  If it's  to a process off chip, then it is all handed over to the link
  engine, and the process is always descheduled. After all, it can't
  continue to execute until the communuication itself has terminated.
c) Someone hits reset, or possibly analyse. (Boo, presumably!)

Notice that no interrupts (Inmos allow us to use that word nowadays, provided
we say it quietly in a dark room with a notice "Do Not Disturb: No EventIns,
please" on the door, and Ian Barron is not in the building) will be serviced
[or rather only type c) above]. Unless of course your high priority process is
precisely waiting on such a communication.

It seems worth saying here that people moving up from conventional sequential
processors often have trouble seeing how to use these mechanisms to implement
the equivalent multiple interrupt servicing. The problem can be that if you
simply hang a high priority process on each interrupt, you can come unstuck
in several ways: if your process is too long, you may miss another signal;
worse you might 'block' on communication. There is no problem: there are well
known and elegant solutions: it is simply a matter of taking a new more general
and powerful viewpoint. I think Geraint Jones and Michael Goldsmith cover this
in "Programming in Occam2": Pretice Hall (yet again): ISBN 0 -13-730334-3.
Personally, I do think that understanding the implementation on the transputer
does help in understanding this area.

And no low priority process will run while a high priority process is eligible.
(that leads to lots of injunctions not to allow high priority processes to
run too long in case timers are missed, particularly timeslices...).

And low priority processes run only if the high priority active list is
empty until:
a) They communicate. Roughly the workspace pointer is stored in a link engine
   or memory register. And they are removed from the active low priority list.
   The next active process is then scheduled....
   (When the communication is complete, the workspace pointer is added onto
   the *end* of the active list, so they are made eligible for execution.)

b) They get timesliced. The period and so on is a function of the silicon:
	as far as I remember, most current silicon uses 1K ticks of 64microsec,
        although I seem to remember that there is one exception with slightly
        different timing. It would be a peculiar program where it mattered
        much! You could always schedule a high priority process waiting on a
	timer to awake a low priority animal...or just read a timer now and
	again and do a dummy communication with a partner to force a shorter
        timeslice if that is what you need.
c) They terminate. (celebration)
d) Various things like timers and alts which are really special cases of
   communication.
e) stopp is executed: this is usually used to avoid pathological conditions
   propagating....


...and so on.

Now David Suter says 'they are suspended .. only when an autonomous link
interface...' . No, processes are suspended (descheduled) and rescheduled
in a wide rich and diverse number of ways, even when there is no external
communication.

In summary, it needs to be said (again):

1) The transputer is a very different animal from conventional sequential
    processors.
2) There is a large literature which *is* easily available, and really does
   need to be read if you are unfamiliar with CSP, occam and formal systems.
3) Inmos does make all the details available: it usually just a matter
   of asking.
4) A transputer is designed so that the logical properties of a program
   (including concurrency) are the same whether executed on a single
   or many transputers.
5) Occam has algebraic properties allowing 'proofs' of correctness. The
   transputer is designed to preserve these properties across any array.
6) Exploring these concepts is exhilarating and liberating: once you have
   caught on, you won't want to go back!
7) Concurrency in this sense is a fairly young technology: there are unsolved
   problems. Things which are simple and straightforward on a single sequential
   machine *are not* simple or clear in a massively parallel machine.
   Transputers are designed for massively parallel systems. You *must*
   examine the transputer design decisions in *that* context. Perhaps Inmos was
   right after all (as so often) in refusing to call the transputer a
   MicroProcessor!

No I don't have shares in Prentice-Hall, or Inmos! Hope this is of some help.


					Adrian Lawrence
----------------------------------------------------------------------------
ADRIAN @ UK.AC.OXFORD.VAX	Microprocessor Unit, Oxford University,
				13,Banbury Road, Oxford, Oxon. OX2 6NN. UK.
----------------------------------------------------------------------------
EARN/Bitnet:		ADRIAN%UK.AC.OXFORD.VAX@AC.UK
			$EARN node UKACRL

ARPA:			ADRIAN%VAX.OXFORD.AC.UK

ACSnet:			adrian@vax.oxford.ac.uk@au.oz.munnari
                        [ean.ac.uk%nummari.cs.mu]
-----------------------------------------------------------------------------

les@unicads.UUCP (Les Milash) (04/05/89)

In article <8904032222.AA02110@tcgould.TN.CORNELL.EDU> ADRIAN@VAX.OXFORD.AC.UK writes:
>Even then, if you read between the lines, you could infer a good deal.

YES YES I agree.  I was out there trying to reverse-engineer the 242
trying to figure what was in it from the cycle-counts from that red
spiral-bound book, remember that one?  I would have been buying those
suckers with every spare cent if they could only multiply.  (I always
wished the T212's had a real multiplier on it, you know, a 1-cycle
multiplier like in a DSP.  now THAT'd be a killer chip; a scalable
DSP.  (the niche is STILL OPEN, and I'd STILL buy them)

>
>I also think that anyone working with transputers will benefit enormously
>by some appreciation of CSP (Communicating Sequential Processes)...

I THINK SO TOO.  I think ANYONE will benefit enormously, (but then I'm a nerd).
You start to see breakfast as a function that maps {coffee, electricity,
a hungover wire-wrapper, and a clean kitchen} onto {coffee grounds, waste
heat, an employable programmer, and a dirty kitchen}.  It'll be good for you
even if you never see a T800 or OCCAM.  It'll taint any infatuation with C;
the "machismo" of trying to get correct results from a language with partly
unspecified semantics will seem more like "rambismo".

>one of the theoretical underpinnings for occam and the transputer. There is
>a vast literature of papers, but start with another Prentice Hill Book :
>"Communicating Sequential Processes" by C A R Hoare (who invented CSP along
>with so much else!). Sorry, I don't have the reference to hand; published
>about 3 years ago.

I HAVE A COPY RIGHT HERE ON THE SHRINE and here is the reference:
CAR Hoare, "Communicating Sequential Processes", Prentice Hall _International_,
1985.  does this stuff "ISBN 0-13-153271-5" help?

Also interesting is to read CARHoare's ACM Turing Award Lecture; as i recall
Mr. Hoare botched his first job pretty well :-), and got interested in 
correctness as a result of that.

w-colinp@microsoft.UUCP (Colin Plumb) (04/06/89)

les@sirius.UUCP (Les Milash) wrote:
> I would have been buying those
> suckers with every spare cent if they could only multiply.  (I always
> wished the T212's had a real multiplier on it, you know, a 1-cycle
> multiplier like in a DSP.  now THAT'd be a killer chip; a scalable
> DSP.  (the niche is STILL OPEN, and I'd STILL buy them)

Well, I've been asked not to say how many nanoseconds, but you're gonna
love the T810.  Let's simply say that fp multiply is over twice as
fast as the T800 and integer multiply is faster still.
-- 
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me.  I never do." - The Doctor