wca@oakhill.UUCP (william anderson) (07/09/88)
INTRODUCTION

In a private communication, Landon Dyer <landon@claris.UUCP> asks the
following question in response to the power considerations discussed in
article <1362@oakhill.UUCP>:

-> Is it possible that a worst-case instruction-stream address sequence
-> (from a program designed to maximize address-bit changes while minimizing
-> cache hits) would cause an 88100 to overheat?

That is to say, can a malevolent programmer find an instruction sequence
for the MC88100 which is equivalent to the legendary HCF (Halt and Catch
Fire) instruction?

PATHOLOGICAL EXAMPLE

Assume we have the following register contents:

    r2: 0x55555555
    r3: 0xAAAAAAAA

Consider the following M88K "program", with the above register contents:

    0xFFFFFFFC:
            st      r2,r3,r0
            br.n    -1
            st      r3,r2,r0

On the MC88100, r0 is hardwired to 0. Consider what this loop does:

    1 - The first st stores all 5s at address AAAAAAAA. The word address
        of this instruction is -1 (byte address 0xFFFFFFFC).
    2 - The br.n branches back to the first st instruction while executing
        another store in its delay slot. The word address of this
        instruction is 0.
    3 - The second st (in the delay slot) stores all As at address
        55555555. The word address of this instruction is 1.

A loop in which stores (for which the MC88100 must drive its data pins)
run nearly back to back, with highly uncorrelated states on both the data
and address buses as well as on the instruction address bus, is the
worst case for power dissipation on the MC88100.

Therefore the relevant M88K P bus states in this loop look like (using
hexadecimal notation):

    I-Address   D-Address   Data        Byte Strobe
    (30 bits)   (30 bits)   (32 bits)   (4 bits)
    --------    --------    --------    -
    3FFFFFFF
    00000000
    00000001
    3FFFFFFF
    00000000    AAAAAAAA    55555555    F
    00000001    AAAAAAAA    55555555    0
    3FFFFFFF    55555555    AAAAAAAA    F
    00000000    AAAAAAAA    55555555    F
    00000001    AAAAAAAA    55555555    0
    3FFFFFFF    55555555    AAAAAAAA    F

and so on.

Note that the I-Address pins go from all 0s to all 1s and back to all 0s
every three cycles, and that every D-Address and Data pin alternates
between 1 and 0 at the same frequency.
POWER CONSIDERATIONS FOR PATHOLOGICAL EXAMPLE

Now, the AC power dissipation (as discussed in article <1362@oakhill.UUCP>)
is given by:

    P = .5*C*V**2*F*N,                              [ equation 1 ]

where:

    P = AC power dissipation (W),
    C = load capacitance (F),
    V = voltage swing (V),
    F = frequency (Hz), and
    N = number of pins which make transitions.

(For the MC88100, V = 3.8 volts (TTL logic levels) and F = 20 MHz.)

NOTE: This formula was INCORRECTLY posted as P = 2*C*V**2*F*N in the
aforementioned article. We sincerely apologize for any inconvenience that
this might have caused. However, the derivation of this formula takes
about 60 seconds and is left as an exercise for the reader (use the
formulae P = V*I and I = C*dV/dt, and average over one clock to do the
derivation).

If we plug in the numbers, using:

    C = 85 pF        (70 pF maximum output load capacitance plus 15 pF
                      internal output capacitance)
    V = 3.8 V        (see above)
    N = 2/3 * 96     (96 = 30 I-Address + 30 D-Address + 32 Data + 4 Byte
      = 64            Strobe pins; 2/3 because each pin makes a transition
                      on two of every three cycles)

we get:

    P = .79 W   (F = 20 MHz)
    P = .98 W   (F = 25 MHz)

This result is well below the maximum power dissipation of 1.5 W given
for the MC88100. In general, the AC power dissipation of the MC88100 far
exceeds its DC power dissipation.

Clearly, this program is pathological:

    - No work gets done and the code never terminates; therefore, we can
      make the MC88100 halt (i.e. quit doing useful work), but we cannot
      make it catch fire.
    - The instruction addresses are contrived (in fact, this program won't
      work with the MC88200 CMMUs, since addresses >= FFF00000 are
      hardwired for I/O address space).

REALISTIC EXAMPLE WITH OPTIONAL POP-QUIZ

Perhaps a more interesting example would be one where the M88K is doing
memory-intensive work in a useful manner. Let's consider an example which
might be useful in a Unix(R) kernel: wordmove().
A common example of the C source code for wordmove() might be:

    void wordmove_1(d, s, n)
    long *d, *s;
    unsigned int n;
    {
        while (n--)
            *d++ = *s++;
    }

OR:

    void wordmove_2(d, s, n)
    long *d, *s;
    unsigned int n;
    {
        register int i;

        for (i = -n; i; i++)
            d[n+i] = s[n+i];
    }

OR:

    void wordmove_3(d, s, n)
    long *d, *s;
    unsigned int n;
    {
        register int i;

        for (i = 0; i < n; i++)
            d[i] = s[i];
    }

(Multiple-choice pop-quiz for C programmers or compiler writers: which C
code should generate the fastest loop on the MC88100? Answer follows on
next page.)

In the worst case, this word-move code causes more pin transitions per
cycle, and therefore dissipates more power, than (for example) byte-move
code.

Now, using a 'free source code with restricted redistribution (but not
public domain)' ANSI C compiler, the answer to the above pop-quiz is
wordmove_2() (with wordmove_3() second-best and wordmove_1() worst). The
inner loop code from this compiler, after being scheduled with the
Motorola scheduling filter, is:

    top:    lda.b   r2,r4,r3
            ld      r7,r6[r2]
            lda.b   r3,r3,1
            bcnd.n  ne0,r3,top
            st      r7,r5[r2]

This is quite efficient code for a short (rolled) loop, running at 20
MIPS (for a 20 MHz MC88100) and moving 16 Mbytes/second (five
instructions per 4-byte word: 20 MIPS / 5 = 4 M words/second).

We now assume the worst case for the data being moved (we use 0xAAAAAAAA
and 0x55555555 for alternate words, as above). We also assume worst-case
coherency between the source address (s) and destination address (d).

The MC88100 P bus states in this loop look like this (here, ********
represents the data pins on a load; since the cache memory system [e.g.
MC88200] does the driving on a load, this doesn't represent power loading
on the MC88100):

    I-Address   D-Address   Data        Byte Strobe
    (30 bits)   (30 bits)   (32 bits)   (4 bits)
    --------    --------    --------    -
    top
    top+1
    top+2       s                       F
    top+3       s           ********    0
    top+4       d           55555555    F
    top         d           55555555    0
    top+1       d           55555555    0
    top+2       s+1         55555555    F
    top+3       s+1         ********    0
    top+4       d+1         AAAAAAAA    F
    top         d+1         AAAAAAAA    0
    top+1       d+1         AAAAAAAA    0
    top+2       s+2         AAAAAAAA    F
    top+3       s+2         ********    0
    top+4       d+2         55555555    F

and so on, ad nauseam.

POWER CONSIDERATIONS FOR REALISTIC EXAMPLE

We now reapply equation 1 above, using the same values for C, V, and F,
but adjusting N (the effective number of pins changing per cycle) to get
the AC power dissipation for the more realistic example:

    N = 2 (I-Address) + 4*32/10 (D-Address) + 2*32/10 (Data)
        + 8*4/10 (Byte Strobe)
      = 24.4 (effective pin transitions per clock)

and this gives:

    P = .30 W   (F = 20 MHz)
    P = .37 W   (F = 25 MHz)

or roughly one-fifth of the maximum power rating for the 20 MHz part.

CONCLUSION

In this article, we have analyzed the AC power dissipation of the MC88100
as a function of the code that it runs. A pathological code example was
found to be the worst case with regard to power dissipation: this code
segment dissipated roughly one-half of the rated maximum (at 20 MHz). A
more realistic code example (a move-word routine) was examined, and its
power dissipation was found to be about one-fifth of the rated maximum
(again, at 20 MHz). Clearly, low average power dissipation should have a
highly desirable effect on the reliability and longevity of any design
using a CMOS VLSI microprocessor.

We have also (again) considered the output from an ANSI C compiler and
have implicitly shown that, in the case of the MC88100, pointer
arithmetic is less efficient than array arithmetic, due in part to the
scaled-indexed addressing mode of the part. This may have some impact on
the way system code is most efficiently written for the M88K.
Finally, we have used an AC power dissipation equation (equation 1) which
applies generally to all microprocessors, since it depends not on
internal details of the chip architecture but on the simple physics of
the microprocessor/memory interface. We hope that readers of this
article can use this equation to enlighten us with regard to the power
characteristics of their products.

ACKNOWLEDGMENTS

Thanks to Mitch Alsup for his valuable advice and motivation.

The statements and opinions presented in this article are my own. They
should not be interpreted as being the opinions or policy, official or
otherwise, of Motorola Inc.

   /\    /\       William C. Anderson
  //\\  //\\      Member of the Motorola 88000 Design Group
 ///\\\ ///\\\    Motorola Microprocessor Division
 //  \\ //  \\    Oak Hill, TX
/    \/ \    /  \
andrew@frip.gwd.tek.com (Andrew Klossner) (07/12/88)
A few nits ...

> r2: 0x55555555
> r3: 0xAAAAAAAA
> 0xFFFFFFFC:
>         st      r2,r3,r0
>         br.n    -1
>         st      r3,r2,r0

Under normal circumstances, each store will cause a misaligned data
access exception because the target addresses are not longword-aligned.
You can disable this exception by manipulating a bit in the PSR, but
I've never been able to figure out just what happens in that case,
except that a 68020-style unaligned longword store (straddling two
adjacent aligned longwords) doesn't.

> I-Address   D-Address   Data        Byte Strobe
> (30 bits)   (30 bits)   (32 bits)   (4 bits)
> --------    --------    --------    -
> 3FFFFFFF    55555555    AAAAAAAA    F

You can't get 55555555 in 30 bits. The D-address will be 15555555.
Will the byte strobe really be F for this misaligned access?

-=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                      (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]
wca@oakhill.UUCP (07/12/88)
In article <10157@tekecs.TEK.COM>, andrew@frip.gwd.tek.com (Andrew Klossner) writes:
> A few nits ...

That's OK, Andrew, I deserve it.

> r2: 0x55555555
> r3: 0xAAAAAAAA
> 0xFFFFFFFC:
>         st      r2,r3,r0
>         br.n    -1
>         st      r3,r2,r0
>
> Under normal circumstances, each store will cause a misaligned data
> access exception because the target addresses are not longword-aligned.
> You can disable this exception by manipulating a bit in the PSR, but
> I've never been able to figure out just what happens in that case,
> except that a 68020-style unaligned longword store (straddling two
> adjacent aligned longwords) doesn't.

When the misaligned access exception is disabled (by setting the
appropriate bit in the Processor Status Register) and a misaligned access
is attempted, the MC88100 rounds the address *down* to an aligned
boundary. In the above case, a word access addressed to location
0xAAAAAAAA will access (in this case, store) a full word of data (that
is, byte strobe = 0xF) at location 0xAAAAAAA8. Clearly, this could
create serious problems (!), so the moral of the story is that any
programmer who disables the misaligned access exception had better know
what he/she is doing. Note that one use of this feature is in
tagged-architecture applications.

> I-Address   D-Address   Data        Byte Strobe
> (30 bits)   (30 bits)   (32 bits)   (4 bits)
> --------    --------    --------    -
> 3FFFFFFF    55555555    AAAAAAAA    F
>
> You can't get 55555555 in 30 bits. The D-address will be 15555555.
> Will the byte strobe really be F for this misaligned access?

You're right, Andrew: the D-Address should be 15555555 (and the next
D-Address in the cycle should be 2AAAAAAA). And the byte strobe is 0xF
for this access, as mentioned above. In my enthusiasm to flip as many
bits (and burn as many mW) as possible (and to keep the sample program
as simple as possible), I blithely ignored the misalignment problem and
committed the gaffe.
If I use two more registers for the addresses 0x55555554 and 0xAAAAAAA8
(keeping r2 and r3 for the data as above), then I can write a program
that does flip as many bits as possible and get essentially the same
results as in the previous article (as far as power dissipation goes).
That is to say, if we have:

    r2: 0x55555555
    r3: 0xAAAAAAAA
    r4: 0x55555554
    r5: 0xAAAAAAA8

and we run the code:

    0xFFFFFFFC:
            st      r2,r5,r0
            br.n    -1
            st      r3,r4,r0

then we get P bus states that look like:

    I-Address   D-Address   Data        Byte Strobe
    (30 bits)   (30 bits)   (32 bits)   (4 bits)
    --------    --------    --------    -
    3FFFFFFF
    00000000
    00000001
    3FFFFFFF
    00000000    2AAAAAAA    55555555    F
    00000001    2AAAAAAA    55555555    0
    3FFFFFFF    15555555    AAAAAAAA    F
    00000000    2AAAAAAA    55555555    F
    00000001    2AAAAAAA    55555555    0
    3FFFFFFF    15555555    AAAAAAAA    F

and so on.

Or, I could disable misaligned access exceptions and use the original
code for identical results (as far as the P bus and the MC88100 are
concerned; after all, this was the pathological example!).

The remainder of the article (in particular, the power dissipation
analysis for the M88K) is not affected by this correction in any
substantial way.

> -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
>                       (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

Thanks again for the correction, Andrew.

The statements and opinions presented in this article are my own. They
should not be interpreted as being the opinions or policy, official or
otherwise, of Motorola Inc.

   /\    /\       William C. Anderson
  //\\  //\\      Member of the Motorola 88000 Design Group
 ///\\\ ///\\\    Motorola Microprocessor Division
 //  \\ //  \\    Oak Hill, TX
/    \/ \    /  \