roger@inmos.co.uk (Roger Shepherd) (02/06/90)
There were some criticisms made about the transputer instruction set. These are my responses. I hope they are correct; I was around while these decisions were being made, but time causes one's memory to fade (just like DRAM). I suspect that one of my colleagues has explained the positronics used in the transputers which enables instructions to complete before they have started and thus to go faster than light :-)

> ............ How are memory accesses (both
> internal and external) handled? Is there an independent entity (i.e.,
> a separate process running in parallel to CPU, FPU, and the DMA
> machines) which handles these requests? This would explain why a LDL
> takes two cycles (have to wait for data to arrive), while STL takes
> only one (just get rid of it and go on). Of course, there would be a
> penalty for a LDL following a STL, but maybe not if the load comes from
> internal memory and the store goes to external. What about that?
> (This is no idle speculation. Compiler writers for and users of
> vector machines spend much time avoiding these types of memory access
> conflicts!)

The above description of the operation of LDL and STL is almost correct. LDL takes 2 cycles: the first cycle performs an address calculation, the second makes the memory access. For STL the first cycle is spent performing the same address calculation and during the next cycle the write occurs. The sequence STL; LDL takes 3 cycles, as the memory access for the STL overlaps the address calculation of the LDL. The sequence LDL; STL also takes 3 cycles.

> 4. Lastly, I'd like to know why some instructions have been done the
> way they are. I can understand that cj (conditional jump) behaves as a
> `jump zero' or `jump false', that's a matter of taste - a RISC
> processor just doesn't have the orthogonal instruction set of a CISC.
> But why, for whoever's sake, does it remove the operand if not jumping
> and leave the zero when jumping?
> Most of the time, I find myself
> preceding it with a dup so I have a copy of my laboriously computed
> loop count after I've checked whether it's zero! Well, of course the
> zero is easier to remove (with a diff) than a non-zero, but in that
> case, we could invert the result of the comparison.

The reason that CJ behaves as it does is that it optimises the generation of the occam OR and AND operators (equivalent to the C || and && operators). In retrospect it would have been better to have had the popping behaviour of CJ reversed, so that it threw away a known value and kept an unknown one.

The choice of JUMP IF FALSE or JUMP IF TRUE is non-trivial. The instruction set of the transputer as set up allows easy generation of fairly dense code, and this involves a careful choice for the instruction set. For example, if we assume CJ as it is, then the choice of whether to have an EQC (equals constant) or a NEQC (not equals constant) instruction depends on the frequency of constructs such as "IF x = constant" versus "IF x <> constant".

> In general, I find the selection of direct and short instructions very
> reasonable. Exceptions are the startp and endp instructions: Why do
> they, taking 10 and 11 cycles, get the privilege of a one-byte
> instruction, while the much more useful dup now takes two cycles just
> because it has to be a two-byte instruction? (Why dup wasn't there
> from the start, but was added as an afterthought to the T800, would
> probably make an amusing historical anecdote. Does anybody know the
> reason?)

DUP wasn't in the T424 from the start because it did not seem a very useful instruction. So, you ask, why didn't DUP seem like a useful instruction? Well, the working assumptions that were made in the early 80s when the instruction set was designed included things like: all code would be compiler generated, register allocation is hard (one reason for producing a stack machine), LDL would be 1-cycle, global optimisation is hard.....
When viewed in this light, DUP would not be a common instruction and would, therefore, be 2 cycles. There would be a distinct penalty for using the sequence LDL; DUP rather than LDL; LDL, and there would be no benefit to using DUP; STL rather than STL; LDL.

So why introduce DUP? Well, DUP was introduced to improve various of the double length conversion operations in the T800. The primary reason for its inclusion was that (i) LDL was a two cycle operation, (ii) LDL was often even slower since workspaces were not always on chip.

As to which instructions should be short codes and which long codes, I don't think we did too badly given that we were guessing about the structure of concurrent programs. I now think that the assignment of opcodes to instructions should be driven so as to minimise the bytes of code fetched during program execution. At the end of the day the memory bandwidth taken by instruction fetch will be limiting - this is what will limit the speed of RISCs and what will cause a re-emergence of machines with addressing modes - they will get better code density.

Roger Shepherd, INMOS Ltd   JANET:    roger@uk.co.inmos
1000 Aztec West             UUCP:     ukc!inmos!roger or uunet!inmos-c!roger
Almondsbury                 INTERNET: roger@inmos.com
+44 454 616616              ROW:      roger@inmos.com OR roger@inmos.co.uk
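
A minimal sketch of the CJ behaviour discussed above, and of why it suits short-circuit AND and OR. Everything here is an illustrative assumption rather than INMOS code: the `run` interpreter, the tuple encoding of instructions, the use of an instruction index as the jump target (the real instruction takes a relative offset), and the particular code sequences chosen for AND and OR. Only the CJ rule itself (jump on zero and keep the zero, otherwise pop and fall through) is taken from the discussion.

```python
def run(program, local_vars):
    """Interpret a tiny, hypothetical subset of transputer-like stack code.

    program    : list of (opcode, operand) tuples. For "cj" the operand is
                 the index of the target instruction (a simplification).
    local_vars : dict mapping local variable names to integer values.
    Returns the evaluation stack after execution (top of stack last).
    """
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "ldl":            # push a local variable
            stack.append(local_vars[arg])
        elif op == "eqc":          # replace top with 1 if top == constant, else 0
            stack.append(1 if stack.pop() == arg else 0)
        elif op == "cj":
            if stack[-1] == 0:     # zero: jump, and the zero stays on the stack
                pc = arg
            else:                  # non-zero: fall through, operand is popped
                stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# a AND b: if a is FALSE (0) the jump is taken and the 0 left behind is
# already the result; otherwise a is discarded and b becomes the result.
AND_CODE = [("ldl", "a"), ("cj", 3), ("ldl", "b")]

# a OR b, written as NOT (NOT a AND NOT b), with "eqc 0" acting as NOT.
OR_CODE = [
    ("ldl", "a"), ("eqc", 0), ("cj", 5),
    ("ldl", "b"), ("eqc", 0),
    ("eqc", 0),                    # index 5: final NOT, reached on both paths
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            env = {"a": a, "b": b}
            print(a, b, run(AND_CODE, env), run(OR_CODE, env))
```

In both sequences the value left behind by a taken CJ is exactly the Boolean result, so no extra instruction is needed to materialise it. The loop-count complaint in the quoted question is the other side of the same rule: when the jump is not taken, the non-zero operand is gone.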